TITLE PAGE
The near-symmetry of proteins
Proteins symmetry
Maayan Bonjack and David Avnir*
Institute of Chemistry and The Lise Meitner Minerva Center for Computational Quantum
Chemistry, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
Key words: symmetry measure, CSM, rotational symmetry, symmetric proteins, symmetry
maps, near-symmetry.
*Correspondence to: David Avnir.
Institute of Chemistry, The Hebrew University, The Edmond J. Safra Campus - Givat Ram,
Jerusalem, IL 9190401.
E-mail: [email protected]
Research Article
Proteins: Structure, Function and Bioinformatics
DOI 10.1002/prot.24706
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as an 'Accepted Article', doi: 10.1002/prot.24706
© 2014 Wiley Periodicals, Inc.
Received: Aug 14, 2014; Revised: Oct 06, 2014; Accepted: Oct 18, 2014
ABSTRACT
The majority of protein oligomers form clusters which are nearly symmetric. Understanding
that imperfection, its origins, and perhaps also its advantages requires converting the
currently used vague, qualitative language of near-symmetry into an accurate
quantitative measure that allows one to answer questions such as: 'What is the degree of
symmetry deviation of the protein?', 'How do these deviations compare within a family of
proteins?', and so on. We developed quantitative methods to answer questions of this type,
which are capable of analyzing the whole protein, its backbone or selected portions of it, down
to the comparison of specific symmetry-related amino-acids, and which are capable of visualizing
the various levels of symmetry deviation in the form of symmetry maps. We applied
these methods to an extensive list of homomers and heteromers and found that apparently no
protein reaches perfect symmetry. Strikingly, even homomeric protein clusters are never
ideally symmetric. We also found that the main burden of symmetry distortion is on the amino-
acids near the symmetry axis; that it is mainly the more hydrophilic amino-acids that take part
in symmetry-distortive interactions; and more. The remarkable ability of heteromers to
preserve near-symmetry, despite the different sequences, was also shown and analyzed. The
comprehensive literature on the suggested advantages of symmetric oligomerization raises a yet-
unsolved key question: If symmetry is so advantageous, why do proteins stop shy of perfect
symmetry? Some tentative answers to be tested in further studies are suggested in a concluding
outlook.
Page 2 of 38
John Wiley & Sons, Inc.
PROTEINS: Structure, Function, and Bioinformatics
INTRODUCTION
The majority of oligomeric proteins appear as clusters of sub-units which are symmetric
at least to some degree1–7. The symmetry point-groups of these clusters are chiral (because the
building blocks are chiral8), and the most prevalent ones are the cyclic Cn and the dihedral Dn
symmetries1,5. Protein symmetry research continues to be a central theme in structural
biochemistry, and some recent examples are collected in Ref. 9. Understanding the abundance
of this structural feature – symmetry – has received much attention, because at first glance it is
not clear why this feature is important for protein function, that is, why symmetric
clustering has evolved at all. It has been proposed1–5 that symmetric oligomerization contributes to
an increase in protein stability because interactions between subunits are maximized; that it
helps to avoid unwanted excessive aggregation; that it enhances the coding efficiency,
because less information is needed when the subunits are identical; that it reduces synthetic
errors, because the larger the protein unit, the more prone it is to such errors; and that
it contributes to more efficient cooperative regulation, because a protein with several identical
subunits may display allosteric effects.
Interestingly, despite these key advantages, the vast majority of protein clusters fall
short of perfect symmetry1–4,7,10. This appears not only in clusters where the subunits are not
identical (hetero-oligomers), but even in homo-oligomers. The settling for imperfect symmetry
even in clusters built of identical units is a key phenomenon which, to the best of our
knowledge, still awaits detailed quantification, exploration of the structural origins of this
imperfection, and exploration of possible advantages of not reaching the ideal symmetry.
We believe that the understanding of these issues can gain from replacing the current qualitative
language that has been used – “quasi-symmetry”, “pseudo-symmetry”, “approximate
symmetry”, “near-symmetry”, etc.1–4,7,10 – by quantitative descriptors of symmetry that will
allow new types of questions such as: how much of a given symmetry is there in a protein
cluster?; by how much is one cluster more symmetric than the other?; what is the distribution
of symmetry values within a family of related proteins?; how did the symmetry content change
with evolution?; which portions of a protein carry most of the burden of the symmetry
deviation?; what is the distribution of symmetry distortion within the protein?; and so on. The
use of a quantitative descriptor of symmetry – the Continuous Symmetry Measure (CSM) – has
already proven very useful in identifying and understanding a host of symmetry-related
phenomena11–14, and some encouraging indications for its usefulness in protein structure
analyses exist. For instance, Keinan et al.15 have shown a correlation between the inhibition
activity of HIV protease and the changes that the near C2-symmetry of the inhibitors undergoes
upon complexation at the active site. Also of relevance for this introduction is the modeling
work of Andre et al., which quantified the symmetry of various modeled assemblies of proteins2.
The CSM quantitative symmetry analysis of proteins suffered for two decades from the
magnitude of the problem: The large number of atoms with all of their potential permutations
(a key step in the calculations) has drastically limited the ability to apply CSM analyses to
proteins. The computational problem was recently solved11, decreasing the dependence of the
computation time on the number of atoms, N, from N! to N²; this has opened the gates for a
detailed symmetry analysis of proteins. Here we describe the development of different
variations on the basic tool to quantify and analyze the symmetry content of protein oligomers,
tailored to the various analysis needs: Symmetry analyses of the whole protein or of selected
fractions of it, using either all atoms or only the backbone atoms, and the formation of
symmetry maps that reveal at a glance the degree of deviation of various regions from perfect
symmetry. Using these methods we show - on a quantitative level - that very often the burden
of deviation from symmetry lies near the symmetry axis and in the border regions of the
clustered oligomers, and propose why this should be so; we find that the hydrophilic amino-
acids are more likely to carry conformational symmetry distortion; and more.
METHODS
The computational tools
The Continuous Symmetry Measure (CSM)16,17 is a method for quantifying the degree
of symmetry of a given object. According to the CSM approach, the G-symmetry point group
content of an object is the minimal distance between two objects: an original structure, Q, and a
G-symmetric structure, P, which consists of the same atoms and connectivity and is the
closest to the original distorted structure. This minimal distance of the object's vertices from
the desired G-symmetry is defined by
(1) $S(G) = \frac{1}{N d^{2}} \sum_{i=1}^{N} \left| \vec{Q}_i - \vec{P}_i \right|^{2}$
where $\vec{Q}_i$ are the coordinates of the ith atom of the original studied molecule, $\vec{P}_i$ are the
coordinates of the ith atom of the nearest structure which has the desired symmetry, the
denominator contains the root mean square size normalization factor of the original centered structure
($d = \sqrt{\frac{1}{N}\sum_{i=1}^{N} |\vec{Q}_i|^{2}}$), and N is the number of analyzed atoms in the structure (see Supporting
Information, Fig. S1, for additional details). It should be emphasized that this measure is
inherently different from rmsd analyses of the degree of similarity – these tools do not
evaluate the symmetry itself as a structural parameter, which is the key issue of this report. The
range of the symmetry scale is 0 ≤ S(G) ≤ 1, and it is expanded by a factor of 100 for
convenience (0 ≤ S(G) ≤ 100). If a structure is of perfect G-symmetry, then S(G)=0, and as
the structure distorts from the perfect symmetry, S(G) increases. The maximal value of 100 is
obtained when the nearest structure with the desired G-symmetry reduces to a single point (for
instance, when one wishes to find the degree of inversion symmetry of HCl). When G is the symmetry
point group Cn, the maximal possible value is 50 because the nearest structure collapses to a
line in space (for instance, when one wishes to find the degree of pentagonality of a tetragon).
S(G) is a global parameter, and therefore allows the comparison of various structures and
various symmetries on the same scale. Eq. (1) is a special distance function in that the target
structure is not known in advance but is searched for. This involves several minimizations which are carried
out both analytically and numerically (see Refs. 16,18,19 for details). As mentioned above, the
most time-consuming minimization is over the possible permutations between the atoms of the
studied structure and the atoms of the target symmetric structure (which is a mathematical
construct, not necessarily a possible molecule) in search for the optimal permutation, namely
the one that minimizes S(G). Ref. 11 describes in detail how this problem is solved, resulting in
calculation times which scale with N². Typically, computing the S(G) value of a
protein of 70 kDa takes less than five minutes on an Intel Core i5-3470 @ 3.20 GHz
processor. The majority of our analyses concentrate on the most abundant oligomeric near-
symmetry, namely Cn. A few examples use the less abundant point-group symmetry Dn, using
a recently developed computational tool for that symmetry20.
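The definition in Eq. (1) can be illustrated for the simplest case, C2, when the pairing of symmetry-related atoms is already known. The following Python sketch is only an illustration and is not the program used in this work: the function name `csm_c2` and the argument `perm` are hypothetical, the pairing is assumed to be given and to be its own inverse, and the search over permutations (the expensive step addressed in Ref. 11) is not attempted. For a fixed pairing, the best C2 axis is taken as the leading eigenvector of a 3x3 matrix built from the paired coordinates, and the nearest symmetric structure is the average of each atom with the rotated image of its partner.

```python
import numpy as np

def csm_c2(coords, perm):
    """S(C2) on the 0-100 scale for a fixed pairing.

    coords: (N, 3) array of atom coordinates.
    perm:   perm[i] is the index of the atom paired with atom i
            (assumed to be an involution: perm[perm[i]] == i).
    Returns (S, axis, nearest_symmetric_coords).
    """
    q = np.asarray(coords, dtype=float)
    q = q - q.mean(axis=0)                  # center the structure
    qp = q[np.asarray(perm)]                # symmetry-related partners
    # For a fixed pairing, the optimal axis maximizes a^T M a with
    # M = sum_i q_i qp_i^T (symmetrized for numerical safety).
    m = q.T @ qp
    m = 0.5 * (m + m.T)
    _, eigvecs = np.linalg.eigh(m)          # eigenvalues in ascending order
    axis = eigvecs[:, -1]                   # eigenvector of the largest one
    rot = 2.0 * np.outer(axis, axis) - np.eye(3)   # 180-degree rotation
    p = 0.5 * (q + qp @ rot.T)              # nearest C2-symmetric structure
    s = 100.0 * np.sum((q - p) ** 2) / np.sum(q ** 2)   # Eq. (1), times 100
    return s, axis, p
```

For a perfectly C2-symmetric set of points the function returns a value of zero (to numerical precision), and any distortion of a single atom gives a positive value.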
The analyzed proteins data
The selection of proteins with near Cn- or Dn-symmetries for analysis was based on a
direct literature search. The selection criterion was the explicit qualitative
description of symmetry or of near-symmetry, approximate symmetry, etc. in
manuscripts1,3,4,10,21-24 and in the proteins' original crystallographic reports. The homodimers
which are mentioned in the last section of the "results and discussion" were selected from the
database "3D Complex: a structural classification of protein complexes"25, and here the
selection criterion was the mention of C2 symmetry. The coordinates of the analyzed
proteins (Q in eq. (1)) were taken from the crystallographic Protein Data Bank (PDB)26. We did
not use any data which were derived by assuming complete symmetry (these structures have by
definition S(G)=0); we took only structures whose crystallographic asymmetric unit contains the full
oligomer. Water molecules were removed, as well as non-protein components such as
phosphate, magnesium or glycerol. Cofactors were included if their number reflected the
protein near-cyclic point group. The number of atoms in each of the subunits might not be
equal, either because the protein is a hetero-oligomer, or because the crystallographic data
are incomplete. Computational methods of addressing these situations are described
below. Regarding the quality of the analyzed data, we followed the recommendations of
Wlodawer et al.27 in selecting most of the structures analyzed below; that is, we chose structures
with a resolution better than 3.0 Å and with an R factor of less than 0.25. In a few cases we used
for explanatory purposes structures of lower resolution if the specific structure was mentioned
as symmetrical in the original report. We estimated the error of the CSM values by using the
crystallographic B factor from the given crystallographic data of the structure. We
translated the B factor into the U factor and checked how U, as well as the number of
collected reflections, affects the confidence in the coordinates of the structure. The rms of the
errors in the coordinates was then translated into a statistical estimate of the error in the symmetry
measure, that is, the error associated with determining the degree of symmetry from X-ray
data (see Ref. 28 for full equations and more details). Since, in general, these estimations
revealed an insignificant error, CSM values are reported here without it.
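The first step of this error estimate, the translation of the B factor into U, uses the standard crystallographic relation B = 8π²U for isotropic displacement. A minimal Python sketch of that conversion follows; the function names are hypothetical, and the subsequent propagation of the coordinate error into the symmetry measure (Ref. 28) is not reproduced here.

```python
import math

def b_to_u(b_factor):
    """Isotropic B factor (in Å^2) -> mean-square displacement U (in Å^2)."""
    return b_factor / (8.0 * math.pi ** 2)

def rms_displacement(b_factor):
    """Root-mean-square displacement along one direction (in Å)."""
    return math.sqrt(b_to_u(b_factor))
```

For example, a B factor of 20 Å² corresponds to an rms displacement of about 0.5 Å.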
RESULTS AND DISCUSSION
Developing various options for the near-symmetry analysis of proteins
The symmetry analysis of proteins, homo- as well as hetero-oligomers, can be carried out at
various levels of resolution, depending on the specific questions at hand: It can
be carried out with either the full set of atoms or a sub-set of it, or with fragments of the
protein. Consequently, our first task has been to develop computational options for four main
categories as follows:
a. Backbone symmetry analysis of the whole protein oligomer: This analysis takes into
account only the coordinates of atoms within the polypeptide backbone, without the residues
(side chains; Fig. 1b,c). The method gives the user a general sense of the symmetry of the protein
- it is analogous to the various basic cartoon presentations of proteins. This method is
particularly relevant for heteromeric proteins, because of the remarkable property that
differences in sequences still preserve the main structural features29, including the symmetry
deviation level. In fact, the symmetry in such heteromeric proteins may reflect the existence of
a homomeric symmetric ancestor oligomer which later specialized by becoming
heteromeric21. By focusing on the polypeptide backbone one can perform comparative analyses
across the board and include both homo- and hetero-oligomers. Because of its generality
we recommend this computational tool at least as a first-approach step.
b. All-atoms symmetry analysis of whole protein oligomers: A more detailed symmetry
analysis includes the whole structure of the protein, that is, the backbone and residues (usually
without the hydrogens, Fig. 1a). This higher-resolution full analysis may be applied only to
homomeric proteins, because in this case the identity of the amino acids is an essential part of
the analyzed information (applying this analysis to heteromeric proteins, where the units are
not identical, would lead to unreasonable permutations and thus to extremely high CSM
values).
c. Symmetry analysis of fragments: It is quite common in protein structural analyses to focus on
selected fragments within the full set of data. Furthermore, it is obvious that CSM values of the
whole protein hide more specific information on the origin of the symmetry deviation of the
whole. For instance, it might be illuminating to know which regions in the protein carry the
burden of the deviation, and which are barely affected by it. This analysis can be done with the
two options described above: The fragment can be of a backbone, or of a full set of atoms. For
instance, one could compare the backbone symmetry of fragments of a homomeric oligomer
with fragments of an analogous heteromeric oligomer.
This option of local symmetry analysis also opens the possibility to analyze quasi-symmetry
related portions within monomeric proteins or within a single subunit of an oligomer. In fact,
these fragments can be as small as one may wish, down to symmetry-matched amino
acids, specific carbonyl groups, and even symmetry-equivalent atoms. For instance,
in a near-C3 trimeric oligomer one could choose a specific carbonyl-bearing carbon and its two
closest equivalent carbons obtained by that operation, and compute S(C3) for these three atoms.
d. Symmetry maps: Continuing with the last example, one can in fact repeat this procedure for
another set of three atoms, and so on for a set of atoms as large as one wishes. For all
practical purposes, there is no need to go to resolutions as high as single atoms - we found that
the relevant minimal, useful unit is, as expected, the amino-acid. The collection of all of the
local CSM values, displayed with a color-code in space, then provides a symmetry map of the
protein or of portions of it – see Fig. 2. This analysis provides a telltale visualization, which
gives at a glance the relative deviations from symmetry within the oligomer structure.
The construction of symmetry maps is carried out as follows: First, the protein is
divided into sets of symmetry-related amino-acids; each set consists of one amino-acid from
each subunit. When the division is completed, the number of the sets is identical to the number
of amino-acids in the subunit. Second, a CSM calculation is carried out for each set. A
graphical decision is then made as to the color code, that is, which amino acids are considered as having
very-high, high, medium, low and very-low CSM values.
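The two construction steps just described (division into sets, then one CSM calculation per set) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the subunits are assumed to list their amino-acids in the same order and to have equal lengths, and the local CSM calculation itself is passed in by the caller as `local_csm` rather than implemented here.

```python
def symmetry_related_sets(subunits):
    """Divide an oligomer into sets of symmetry-related amino-acids.

    subunits: list of subunits, each a list of residues in the same order.
    Returns one tuple per residue position, holding one residue from
    each subunit, so the number of sets equals the subunit length.
    """
    lengths = {len(s) for s in subunits}
    if len(lengths) != 1:
        raise ValueError("subunits must have the same number of residues")
    return [tuple(group) for group in zip(*subunits)]

def symmetry_map(subunits, local_csm):
    """Apply a caller-supplied CSM function to every symmetry-related set."""
    return [local_csm(group) for group in symmetry_related_sets(subunits)]
```

The resulting list of local values, one per residue position, is what would then be translated into colors on the structure.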
Backbone symmetry analysis of the whole protein oligomer
a. Homomers: We begin by applying the backbone symmetry analysis of the whole protein on
homomeric proteins with near Cn or Dn symmetry. The conclusion to be reached in this section
is not trivial and quite surprising: All of the homo-oligomeric proteins we analyzed, even those
which are declared "symmetric", have S(Cn)≠0 values (beyond the error range – see Section
"the analyzed proteins data"); that is, despite the identical composition of the units, the proteins
"choose" to stop short of perfect symmetry. The symmetry deviations are small, but large enough to
justify the qualitative descriptors used in the literature, such as "pseudo" and "approximate". Consider for
instance the structure of human hemoglobin, which is composed of four subunits, arranged as a
dimer of α and β subunits (namely α2β2), which is commonly described as possessing "pseudo-
C2 symmetry"30 (Fig. 3e). The analyzed protein backbone in its T (deoxy) quaternary structure
has a CSM value of S(C2)=0.004. This is a small symmetry distortion (in the following sections
we encounter much larger values) but definitely justifies the qualitative "pseudo" descriptor
used in the literature. Other homomeric proteins were analyzed similarly, and the results are
collected in Table I and in Fig. 3 (see Supporting Information, Table S1 for additional analyzed
proteins). It is seen that none of the proteins is perfectly symmetric. The various S(Cn) values
point to an optimal symmetry deviation at which each protein settles, and it is seen that these
optima are not the same. We recall that the CSM scale is unified and comparable for all
symmetries. That is, any two proteins, regardless of their symmetry point group, can be
compared according to their CSM values. For instance, according to Table I, hemoglobin
(S(C2)=0.004) is more symmetric than purine nucleoside phosphorylase (S(C3)=0.005).
We found the same universality for near Dn-symmetric proteins (Table I and Fig. 3). In
fact, Dn analysis allows one also to compare its two chiral rotational sub-groups, Cn and the
perpendicular C2. For each of those proteins we present the CSM values for Dn, Cn and C2
(Table I and Fig. 3). It is seen that the approximate symmetry shows up also in the two sub-
group symmetries. The relative magnitude of the CSM values is in agreement with the
expected relation S(Dn)≥S(Cn)≥S(C2). This is so, because it follows the order of the number of
elements in each of these symmetry groups: The more elements, the more distant is the nearest
fully symmetric structure.
b. Heteromers: We move on with the backbone analysis to heteromeric proteins. Here there are
basically two general cases:
1. Same length – different sequence: All of the heteromeric units are of the same backbone
length and differ only in their sequences. The analysis of this case is carried out exactly as was
done with the backbone of the homomeric proteins, and examples are provided in Fig. 4 and
Table II. For instance, the heterodimer of disintegrin (a protein from the venom of Echis
carinatus that binds to integrin receptors) is composed of two monomers which have the same
sequence length, with a sequence identity of 84% (Fig. 4a)31. The crystal structure is described
as having a high level of “pseudosymmetry”, which translates quantitatively to a CSM value of
S(C2)=0.03 for the backbone.
2. Different length units: The second case is of proteins in which some or all subunits have
different lengths. Analysis of this case requires additional data processing. Consider, for
instance, the protein 20S proteasome from the archaebacterium Thermoplasma acidophilum
(which is part of a proteasome complex that fills an important role in the degradation of
unneeded or damaged proteins)32. It is composed of four rings, one on top of the other (Fig.
5a), each of which is built from seven subunits (Fig. 5b). As seen in Fig. 5b, the near C7
symmetry is quite clear, and this is so although each subunit has a unique amino acid sequence
and length (the lengths of the seven subunits are 243, 250, 244, 241, 242, 233 and 244 amino-
acids). The question then is how to handle this situation, which is quite common with
heteromeric proteins. If no correction is made for the different lengths, the CSM value of the
structure is extremely high, S(C7)=49 (because of the lack of proper permutations between the
atoms of the oligomer units). That value basically means that there is no C7-symmetry in the
protein 20S proteasome whatsoever, which defies what the eye sees. What is it then that the
eye sees? That the bulk of the backbone atoms are indeed nearly C7-symmetric. Translating this
into practice requires either ignoring the extra atoms that have no symmetry-related
equivalents (the CSM is then S(C7)=0.2, Fig. 5c); or - following the idea of Frid et al.33 for analyzing the
symmetry of branched structures - adding ghost atoms at the missing parts of the shorter
subunits (at the C-terminus of the amino-acid chain); in this case S(C7)=0.5 (Fig. 5d). Each of these
two options provides proper permutations and CSM values which properly translate what the eye
sees. While the first method ignores atoms, and thus reflects the symmetric core of the
oligomer, the second method takes the intrinsic asymmetry of the whole protein structure into
consideration; we thus recommend using the second method.
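The two ways of equalizing subunit lengths can be sketched in Python as below. This is an illustration only, with hypothetical function names. Two assumptions are made that the text does not state: positions are matched simply by their index along the chain (a real analysis may need a sequence or structure alignment), and a ghost atom is placed by default at the position of the last real atom of the shorter chain, which is a placeholder choice and not the placement rule of Ref. 33.

```python
def trim_to_common_length(chains):
    """Option 1: ignore extra positions that have no symmetry-related partner."""
    n = min(len(c) for c in chains)
    return [list(c[:n]) for c in chains]

def pad_with_ghost_atoms(chains, ghost=None):
    """Option 2: append ghost entries at the C-terminal end of shorter chains.

    ghost: coordinate used for every ghost entry; if None, the last real
           entry of each chain is repeated (an assumed placeholder choice).
    """
    n = max(len(c) for c in chains)
    padded = []
    for c in chains:
        c = list(c)
        filler = ghost if ghost is not None else c[-1]
        padded.append(c + [filler] * (n - len(c)))
    return padded
```

With the seven subunit lengths quoted above (243, 250, 244, 241, 242, 233 and 244), trimming leaves 233 positions per subunit and padding gives 250 positions per subunit.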
All-atoms symmetry analyses of whole homomers
Contrary to the backbone analysis, all-atoms analyses take into account many more
conformational distortions, due to the inclusion of the amino acid residues. This residue
conformational freedom is expected to increase the CSM values compared to the corresponding
backbone CSM values. Indeed, returning to hemoglobin for which we obtained S(C2)=0.004
for the backbone, the symmetry measure increases 2.5-fold to S(C2)=0.01 (Table I) when the
full protein is analyzed. As seen in Table I, this trend is true for all of the analyzed proteins.
One should note that the larger CSM values are not the result of the larger number of atoms
considered, because the measure is normalized according to the number of atoms and size. Rather, the
magnitude of the difference in the CSM values between the full and backbone analyses shows
that the residues contribute to the overall distortion much more than the backbone does.
Symmetry analyses of fragments
Quite often, one may wish to concentrate on specific portions of the protein (portions
which are symmetry-related in each of the homo- or hetero-oligomer units, or even in a
monomeric protein structure), rather than on the whole. Such local focus - for instance, on α-
helices, border zones between the oligomer units or specific domains in the structure - provides
higher resolution information compared to the previous sections, because the portions of interest are not
masked by portions which are of less relevance to the question at hand, or which carry only
little of the symmetry distortion. Consider, for example, the homodimer of human β1 alcohol
dehydrogenase (a protein that participates in the metabolism of ethanol and other substrates,
Fig. 6a)30. The protein consists of α-helices and β-sheets, and the question we ask is, where are
the higher symmetry distortions? Calculations of S(C2) of the β-sheets and of the α-helices
separately provide values of S(C2)=0.002 and S(C2)=0.003, respectively. That is, the α-helices
tend to be more distorted than the β-sheets. Yet, since the global CSM value is much higher
(S(C2)=0.01), the burden of the protein distortion must be elsewhere, in less rigid zones.
Indeed, the CSM value of the segments connecting the rigid secondary structures was
found to be relatively high, S(C2)=0.02. This is general: we observed this trend in many other
proteins as well; rigid segments have lower CSM values than flexible ones.
As mentioned above, local symmetry analysis also opens the possibility to analyze
quasi-symmetry related portions within a monomeric protein or within a subunit of an oligomer.
An example of this case is the protein γβ-crystallin (an eye lens protein), which is referred to
as a highly symmetrical structure34, despite the different sequences of the components. It
consists of two domains; each of them contains two 'Greek key' motifs (Fig.
6b). The two domains are related by an approximate 2-fold symmetry, as described in the
literature35. However, the domains differ in their sequence and length. Therefore, addition of
"ghost" atoms is required, as described in Section "backbone symmetry analysis of the whole
protein oligomer". Calculation of the CSM value was carried out only on the relevant motifs,
and the linking segments, which add a lot to the symmetry deviation, as we saw above, were
ignored. The resulting CSM value is S(C2)=0.20, and it indicates that the two domains are
largely related by C2 symmetry, although their sequences and lengths are different.
Symmetry maps
a. Symmetry maps of oligomers: The symmetry map is a high resolution local analysis which
determines the symmetry content of symmetry-matched amino acids within the oligomer. The
analysis produces symmetry maps of the kind shown in Fig. 2 for the near C2-symmetric dimer
of HIV-1 protease30 (Fig. 3m, an aspartic protease; we recall that the near C2-symmetry of the
dimer gave rise to the development of C2-related anti-HIV drugs36). The CSM value of the
protein in the full analysis was found (above) to be S(C2)=0.13; let us now see how this
translates to the symmetry map of Fig. 2. The literature points specifically at two amino-acids in
positions 50 and 51 (Ile50 and Gly51 in each subunit) that are a major source of conformational
symmetry distortion. Indeed, the CSM value of Ile50 is extremely high, S(C2)=4.64. The
CSM value of Gly51 is lower, S(C2)=0.21, but it is still higher than the CSM value
of the full protein. Applying a four-color code map not only confirms that those amino-acids
break the symmetry of the protein, but also reveals other amino acids which cause symmetry
deviations. The most distorted amino acids (in this map, 1<CSM<5) are colored red and
include Ile50, Thr96 and Asn98. The less distorted amino acids (in this map, 0.5<CSM<1) are
colored orange and include Gly49, Ile66, Ile84, Leu97. Yellow is the next color code (in this
map, 0.2<CSM<0.5) and includes Gln2, Ile15, Leu23, Thr26, Glu35, Lys41, Lys45, Gly51,
Ala95; and the green-colored amino acids (in this map, 0.1<CSM<0.2) are Pro1, Gln7, Arg14,
Glu21, Asp25, Glu34, Ile47, Lys55, His69, Lys70, Ile72 and Lys89.
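The color assignment quoted for this particular map can be written as a small lookup. The Python sketch below uses the four ranges given above for the HIV-1 protease map; the function name is hypothetical, and two details are assumptions: values above the upper red limit of 5 are still colored red, and a value falling exactly on a boundary is assigned to the lower class, since the text does not say how such cases are treated.

```python
def color_for_csm(value):
    """Color class of a local CSM value, using the HIV-1 protease map ranges."""
    bins = [(1.0, "red"), (0.5, "orange"), (0.2, "yellow"), (0.1, "green")]
    for lower, color in bins:
        if value > lower:        # strict inequality, as in the quoted ranges
            return color
    return None                  # below the lowest threshold: left uncolored
```

With these ranges, Ile50 (S(C2)=4.64) falls in the red class and Gly51 (S(C2)=0.21) in the yellow class, in agreement with the lists above.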
Significantly, the symmetry map (Fig. 2) indicates that most of the symmetry-distorted,
symmetry-matched pairs of amino-acids are located in the area where the two units coalesce
into one, and along the near Cn symmetry axis (in dimers, those two elements are the same
region of the protein). This again is general: we shall see it repeatedly in more case analyses
(Fig. 7). The area where the components of the oligomer interact is indeed expected to be
active in symmetry distortion. When two similar sub-units interact, not only is there no
reason to expect a similar distortion of each upon interaction, but the contrary is true: conformational
changes that lead to the optimal interaction often require the two components to adjust
to each other, which means that each subunit acquires a different conformation. For
example, the loop in positions 49-51 (bottom of Fig. 2) is very distorted (colored red, orange
and yellow), and this is due to the non-equivalent conformers of Ile50 and Gly51 that are
needed in order to form a direct hydrogen bond between them (dashed line, Fig. 2). The
mechanical approach to protein structure would suggest that the distortions at the units'
interacting zone would also affect more remote zones of the protein. This is also seen in Fig. 2,
as regions that are distant from the interacting zone are distorted as well, although to a lesser
degree (colored green).
Additional symmetry maps are shown in Fig. 7 (Ref. 37). In most symmetry maps there is not only
a quantitative and visual confirmation of the original qualitative statements in the literature about
symmetry-related distorted amino acids, but also indications of other distorted regions that
were not mentioned in the original reports, as seen above in HIV-1 protease.
The various symmetry maps show some main trends: First, the most distorted amino-
acids, which are indicated by the red- and orange-colored areas, are located near the rotation
axis and in the contact zones between the subunits, as was mentioned above. Second,
symmetry-related amino acids located in any type of secondary structure - rigid as well as
flexible (i.e. α-helices or loops) - can diverge from a perfect symmetry relation (this is in
agreement with the finding mentioned above that rigid regions, as a whole, tend to be more
symmetric than flexible ones). Third, the less distorted green-colored regions may be present
all over the protein structure, including the protein-solvent interface. Fourth, it was found that
there is no decaying gradient in symmetry measure from the most symmetry distorted regions
into their near surroundings. However, as was mentioned above, the distortions do have the
potential to affect distant regions.
b. Comparative symmetry maps of different proteins: Another example of the usefulness of
symmetry map analysis of proteins is the comparison of the symmetry of different proteins
which have only partial structure and sequence similarity. Our example is the two
homodimeric proteins, 11β-hydroxysteroid dehydrogenase Type 1 (11β-HSD1) of
guinea pig and of mouse38, which have a sequence homology of 70% and possess a similar
structure; both are referred to in the literature as having a pseudo-2-fold axis (Fig. 8). This
"pseudo-symmetry" translates into global CSM values of S(C2)=0.019 and S(C2)=0.024 for the
murine and the guinea pig proteins, respectively, which in fact is a very interesting result in itself:
Despite the fact that the sequence similarity is only 70%, the two proteins have very
similar CSM values, which might reflect their structural and functional similarity. This agrees
with the identified evolutionary trend29 by which keeping the trait of a specific structure along
evolution is more important than keeping the sequence, signaling that activity is primarily
dependent on the quaternary structure, and indicating the ability to use different building-
blocks (the amino-acids) to form similar structures. Let us now look at these proteins at a
higher resolution (Fig. 8), to find local regions which have much higher S(C2) values (up to
2.8). An interesting observation made possible by this type of analysis is that the two proteins
share the same amino-acids with high CSM values, each in its C2-symmetry-related subunits in
the protein. Out of the 20 most distorted amino-acids in each protein, 7 amino-acids are shared
(Glu26, His134, Arg137, Ile203, Glu239, Asn270, Lys274). Note that in this case too,
significant parts of the distortion are located near the C2-symmetry-axis.
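The per-pair S(C2) values discussed above can be illustrated with a minimal Python sketch. It is not the authors' program (the algorithm they cite is that of Ref. 11). It assumes the commonly used normalized form of the continuous symmetry measure, S = 100 · Σ|Q_k − P_k|² / Σ|Q_k − Q_0|², where Q_k are the atom coordinates, Q_0 is their centroid and P_k is the nearest C2-symmetric structure; it also assumes that the pairing of atoms between the two subunits is already known and that the C2 axis is optimized for the selected atoms only, passing through their centroid. The names `csm_c2` and `pair_csm` and their arguments are hypothetical.

```python
import numpy as np

def csm_c2(coords, perm):
    """Return (S, axis) for an assumed normalised C2 symmetry measure.

    coords : (N, 3) array of atom positions (both subunits stacked).
    perm   : length-N index array; perm[k] is the atom exchanged with
             atom k by the C2 operation (so perm[perm[k]] == k).
    Assumption: S = 100 * sum|Q - P|^2 / sum|Q - Q0|^2, axis through centroid.
    """
    coords = np.asarray(coords, dtype=float)
    perm = np.asarray(perm)
    q = coords - coords.mean(axis=0)          # centre on the centroid Q0
    total = np.sum(q * q)                     # sum_k |Q_k - Q0|^2
    m = q.T @ q[perm]                         # sum_k Q_k Q_perm(k)^T
    m = 0.5 * (m + m.T)                       # symmetrise against round-off
    eigvals, eigvecs = np.linalg.eigh(m)      # eigenvalues in ascending order
    lam, axis = eigvals[-1], eigvecs[:, -1]   # best axis n maximises n^T M n
    # With R = 2 n n^T - I (a 180-degree rotation about n), the nearest
    # symmetric structure is P_k = (Q_k + R Q_perm(k)) / 2, which gives
    # sum|Q - P|^2 = (total + trace(M) - 2 * lam) / 2.
    s = 100.0 * (total + np.trace(m) - 2.0 * lam) / (2.0 * total)
    return max(s, 0.0), axis

def pair_csm(res_a, res_b):
    """S(C2) of one symmetry-related residue pair (same atom order assumed)."""
    res_a = np.asarray(res_a, dtype=float)
    res_b = np.asarray(res_b, dtype=float)
    n = len(res_a)
    # Atom i of subunit A is exchanged with atom i of subunit B.
    perm = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return csm_c2(np.vstack([res_a, res_b]), perm)[0]
```

Ranking the `pair_csm` values of all residue pairs would then give the kind of "most distorted" list discussed in the text. A per-residue value computed against a single global axis of the whole dimer could differ from this locally optimized one.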
Which amino-acids carry the burden of symmetry distortion?
We now use the symmetry maps to answer this key question, and for that purpose we
have concentrated on homodimeric proteins (which, as mentioned above, are classified as
having C2-symmetry in the database of Ref. 25). We applied the symmetry analysis of
fragments to each of these proteins, and chose the individual amino acids as the minimal
fragments in the analysis. That is, each CSM calculation was carried out on a pair of amino
acids, one in each of the two subunits, which are symmetry-related (as in the symmetry maps).
We then focused on the 15 most symmetry-distorted C2-symmetry-related pairs of amino acids
in each protein, in search of what might characterize them.
After examining various possible parameters (such as polarity, acidity, etc.), a
very clear link was found with the hydrophilicity/hydrophobicity balance of the amino acids
(as defined in Ref. 39): the majority of the 15 most distorted residues are hydrophilic,
as can be seen in Table III and in Fig. 9 (Ref. 40). This finding means, perhaps as expected, that a main
source of structural readjustment is multiple asymmetric polar interactions, namely hydrogen
bonds between amino acids or between an amino acid and water molecules. These interactions
can force conformational changes of the amino acids in order to reach a preferred
position, as was already discussed above regarding the HIV protease. Hydrophobic residues,
on the other hand, are engaged in weaker dispersive interactions, which are less demanding
and therefore less distortive. This observation leads us to our final topic of conceptual
discussion:
CONCLUSIONS
The shift from qualitative descriptions of symmetry deviations of proteins into
quantitative descriptions, as proposed in this report, proved to be an eye opener: Not only does
it translate the descriptive language to a meaningful scale, it allows one to obtain a clear picture
of the molecular origin of the asymmetry. Thus, we found that segments located near the
symmetry-axis and in between the subunits consistently carry part of the burden of symmetry
deviations; and we found that hydrophilic amino acids tend to carry the burden of asymmetry
by adopting conformers that differ from those of their symmetry-related amino acids in the protein
structure. The strength, as well as the weakness, of the method is that it is a thermodynamic-like,
global measure: One cannot go back from it to specific structural features, but it does combine
these various features into one value.
Such observations bring us back to the question at the opening of the report: If
symmetry is so advantageous (Refs. 1–5), why do proteins stop short of it? We propose the following list
of reasons for this phenomenon, which is quite universal in the world of proteins. All of the
following must be tested in future studies and should be regarded as a starting point for
discussion. The first important parameter is the free energy, that is, its enthalpic and entropic
components. Enthalpy changes must follow the interactions and conformational readjustments
that optimize the interfacial contact area of the subunits: “insisting” on perfect symmetry would
not minimize the protein's enthalpy, and thus would not minimize its free energy. We also
propose that entropy has a great influence on the symmetry distortion of the protein: A fully
symmetric oligomer structure has a certain number of possible microscopic conformations
of the protein. However, as the protein deviates from its perfectly symmetric structure, that
number increases sharply. That is, the entropy rises, probably quite steeply, as the
protein settles for symmetry that is less than perfect. Second, we consider dynamics as another
cause of asymmetry, since proteins are not static structures but are in constant motion and
thus exist as ensembles of conformations. As dynamics moves any symmetric object away from
symmetry (for most of its conformers), the protein necessarily distorts from perfect symmetry.
In the CSM approach, a distortion is always a non-negative value, S(G)≥0, namely the distance
from perfect symmetry. It follows that the average CSM value of the protein's conformers is
non-negative as well, and is strictly positive as soon as any conformer is distorted, because an
average of non-negative values that are not all zero is positive. It is a delicate
point that while the average geometry of distorted structures can be perfectly symmetric, the
average symmetry measure is still non-zero. The explanation: The average geometry is the
most symmetric structure because the directions of the fluctuations tend to compensate each
other (they are vectorial entities and can have negative values); symmetry deviation values, on
the other hand, are never negative and therefore cannot cancel. Last but not least, one must take into account the
surroundings: proteins are always in an asymmetric environment – certainly in the cell, but
even in the crystalline state. These asymmetric surroundings do not induce the asymmetry of the
protein structure, but stabilize it: the hydration shell stabilizes the structure, mainly by forming
hydrogen bonds. As was discussed before, hydrogen-bond interactions encourage
asymmetry.
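The distinction drawn above between the symmetry measure of the average geometry and the average of the symmetry measures can be illustrated numerically. The following Python sketch is only a toy model under stated assumptions: the conformers are generated by adding independent Gaussian noise, in a fixed frame, to a perfectly C2-symmetric set of points (not a real protein ensemble), and the measure uses the assumed normalization S = 100 · Σ|Q_k − P_k|² / Σ|Q_k − Q_0|². All function names and parameter values are hypothetical.

```python
import numpy as np

def csm_c2(coords, perm):
    """Assumed normalised C2 measure (0 means perfect C2 symmetry)."""
    q = coords - coords.mean(axis=0)           # centre on the centroid
    total = np.sum(q * q)
    m = q.T @ q[perm]
    m = 0.5 * (m + m.T)
    lam = np.linalg.eigvalsh(m)[-1]            # best C2 axis: top eigenvalue
    s = 100.0 * (total + np.trace(m) - 2.0 * lam) / (2.0 * total)
    return max(s, 0.0)

def ensemble_demo(n_atoms=20, n_conformers=500, sigma=0.2, seed=0):
    """Return (mean of S over conformers, S of the mean geometry)."""
    rng = np.random.default_rng(seed)
    half = rng.normal(size=(n_atoms, 3)) * 5.0         # one "subunit"
    c2 = np.diag([-1.0, -1.0, 1.0])                    # 180 degrees about z
    reference = np.vstack([half, half @ c2.T])         # perfectly C2-symmetric
    perm = np.concatenate([np.arange(n_atoms, 2 * n_atoms),
                           np.arange(n_atoms)])
    # Each conformer is the reference plus independent thermal-like noise.
    noise = rng.normal(scale=sigma, size=(n_conformers, 2 * n_atoms, 3))
    conformers = reference + noise
    s_values = np.array([csm_c2(c, perm) for c in conformers])
    s_of_mean = csm_c2(conformers.mean(axis=0), perm)
    return s_values.mean(), s_of_mean

if __name__ == "__main__":
    mean_s, s_of_mean = ensemble_demo()
    print(f"mean S over conformers: {mean_s:.4f}")
    print(f"S of the mean geometry: {s_of_mean:.6f}")
```

In this toy model the fluctuations cancel in the averaged coordinates, so the measure of the mean geometry approaches zero as more conformers are averaged, whereas the mean of the individual measures stays at a level set by the noise amplitude.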
We hope this report will stimulate further exploration of the effects of these parameters
and of their relative weight in determining protein symmetry and the symmetry-related
functionality of proteins; research in this direction is in progress.
ACKNOWLEDGEMENTS
We thank Dr. Chaim Dryzun for useful advice and Amir Zait for the development of the
computational tools and for assistance with them.
REFERENCES
1. Goodsell DS, Olson AJ. Structural symmetry and protein function. Annu Rev Biophys
Biomol Struct 2000;29:105–153.
2. André I, Strauss CEM, Kaplan DB, Bradley P, Baker D. Emergence of symmetry in
homooligomeric biological assemblies. Proc Natl Acad Sci USA 2008;105:16148–
16152.
3. Blundell TL, Srinivasan N. Symmetry, stability, and dynamics of multidomain and
multicomponent protein systems. Proc Natl Acad Sci USA 1996;93:14243–14248.
4. Kojić-Prodić B, Štefanić Z. Symmetry versus Asymmetry in the Molecules of Life:
Homomeric Protein Assemblies. Symmetry 2010;2:884–906.
5. Berchanski A, Segal D, Eisenstein M. Modeling oligomers with Cn or Dn symmetry:
application to CAPRI target 10. Proteins 2005;60:202–206.
6. Taylor WR, May ACW, Brown NP, Aszódi A. Protein structure: geometry, topology
and classification. Rep Prog Phys 2001;64:517–590.
7. Taylor WR, Aszodi A. Protein geometry, classification, topology and symmetry: A
computational analysis of structure (Series in Biophysics). Taylor & Francis; 2004.
8. Alvarez S, Serratosa F. Symmetry guidelines for the design of convergent syntheses. On
narcissistic coupling and la coupe du roi. J Am Chem Soc 1992;114:2623–2630.
9. (a) Abad-Zapatero C. Notes of a protein crystallographer: the beauty of rose windows
and the different meanings of symmetry. Acta Crystallogr D Biol Crystallogr
2014;70:907–911. (b) Schulze B, Sljoka A, Whiteley W. How does symmetry impact
the flexibility of proteins? Philos Trans A Math Phys Eng Sci 2014;372:20120041. (c)
Reisinger B, Kuzmanovic N, Löffler P, Merkl R, König B, Sterner R. Exploiting protein
symmetry to design light-controllable enzyme inhibitors. Angew Chem Int Ed Engl
2014;53:595–598. (d) Williams LS, Levdikov VM, Minakhin L, Severinov K, Antson
AA. 12-Fold symmetry of the putative portal protein from the Thermus thermophilus
bacteriophage G20C determined by X-ray analysis. Acta Crystallogr Sect F Struct Biol
Cryst Commun 2013;69:1239–1241. (e) Hilbert M, Erat MC, Hachet V, Guichard P,
Blank ID, Flückiger I, Slater L, Lowe ED, Hatzopoulos GN, Steinmetz MO, Gönczy P,
Vakonakis I. Caenorhabditis elegans centriolar protein SAS-6 forms a spiral that is
consistent with imparting a ninefold symmetry. Proc Natl Acad Sci U S A
2013;110:11373–11378. (f) Matsunaga Y, Koike R, Ota M, Tame JRH, Kidera A.
Influence of structural symmetry on protein dynamics. PLoS One 2012;7:e50011. (g)
Bayfield OW, Chen C-S, Patterson AR, Luan W, Smits C, Gollnick P, Antson AA. Trp
RNA-binding attenuation protein: modifying symmetry and stability of a circular
oligomer. PLoS One 2012;7:e44309. (h) Gross M. Symmetry and complexity in protein
oligomers. Curr Biol 2012;22:R175–R177. (i) Nava J, Kreinovich V. Towards
symmetry-based explanation of (approximate) shapes of alpha-helices and beta-sheets
(and beta-barrels) in protein structure. Symmetry (Basel) 2012;4:15–25. (j) Broom A,
Doxey AC, Lobsanov YD, Berthin LG, Rose DR, Howell PL, McConkey BJ, Meiering
EM. Modular evolution and the origins of symmetry: reconstruction of a three-fold
symmetric globular protein. Structure 2012;20:161–171.
10. Levy Y, Cho SS, Shen T, Onuchic JN, Wolynes PG. Symmetry and frustration in
protein energy landscapes: a near degeneracy resolves the Rop dimer-folding mystery.
Proc Natl Acad Sci U S A 2005;102:2373–2378.
11. Dryzun C, Zait A, Avnir D. Quantitative symmetry and chirality-a fast computational
algorithm for large structures: proteins, macromolecules, nanotubes, and unit cells. J
Comput Chem 2011;32:2526–2538.
12. Tuvi-Arad I, Blonder R. Continuous symmetry and chemistry teachers: learning
advanced chemistry content through novel visualization tools. Chem Educ Res Pract
2010;11:48–58.
13. Dryzun C. Continuous symmetry measures for complex symmetry group. J Comput
Chem 2014;35:748–755.
14. Tuvi-Arad I, Rozgonyi T, Stirling A. Effect of temperature and substitution on Cope
rearrangement: a symmetry perspective. J Phys Chem A 2013;117:12726–12733.
15. Keinan S, Avnir D. Quantitative symmetry in structure−activity correlations: The near
C2 symmetry of inhibitor/HIV protease complexes. J Am Chem Soc 2000;122:4378–
4384.
16. Zabrodsky H, Peleg S, Avnir D. Continuous symmetry measures. J Am Chem Soc
1992;114:7843–7851.
17. Dryzun C, Avnir D. Generalization of the continuous symmetry measure: the symmetry
of vectors, matrices, operators and functions. Phys Chem Chem Phys 2009;11:9653–
9666.
18. Salomon Y, Avnir D. Continuous symmetry measures: A note in proof of the
folding/unfolding method. J Math Chem 1999;25:295–308.
19. Pinsky M, Avnir D. Continuous Symmetry Measures. 5. The Classical Polyhedra. Inorg
Chem 1998;37:5575–5582.
20. Pinsky M, Zait A, Bonjack M, Avnir D. Continuous symmetry analyses: Cnv and Dn
measures of molecules, complexes, and proteins. J Comput Chem 2013;34:2–9.
21. Blundell TL, Bolanos-Garcia V, Chirgadze DY, Harmer NJ, Lo T, Pellegrini L, Sibanda
BL. Asymmetry in the multiprotein systems of molecular biology. Struct Chem
2002;13:405–412.
22. Guerler A, Wang C, Knapp EW. Symmetric structures in the universe of protein folds. J
Chem Inf Model 2009;49:2147–2151.
23. Duran AM, Meiler J. Inverted topologies in membrane proteins: a mini-review. Comput
Struct Biotechnol J 2013;8:e201308004.
24. Popov P, Ritchie DW, Grudinin S. DockTrina: docking triangular protein trimers.
Proteins 2014;82:34–44.
25. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: a structural
classification of protein complexes. PLoS Comput Biol 2006;2:e155.
26. http://www.rcsb.org/pdb/.
27. Wlodawer A, Minor W, Dauter Z, Jaskolski M. Protein crystallography for non-
crystallographers, or how to get the best (but not more) from published macromolecular
structures. FEBS J 2008;275:1–21.
28. Pinsky M, Yogev-Einot D, Avnir D. Statistical analysis of the estimation of distance
measures. J Comput Chem 2003;24:786–796.
29. Guerler A, Knapp E-W. GIS: a comprehensive source for protein structure similarities.
Nucleic Acids Res 2010;38:W46–52.
30. (a) Lou H, Chen M, Black SS, Bushell SR, Ceccarelli M, Mach T, Beis K, Low AS,
Bamford VA, Booth IR, Bayley H, Naismith JH. Altered antibiotic transport in OmpC
mutants isolated from a series of clinical strains of multi-drug resistant E. coli. PLoS
One 2011;6:e25825. (b) Hohenester E, Hutchinson WL, Pepys MB, Wood SP. Crystal
structure of a decameric complex of human serum amyloid P component with bound
dAMP. J Mol Biol 1997;269:570–578. (c) Dedeo MT, Duderstadt KE, Berger JM,
Francis MB. Nanoscale protein assemblies from a circular permutant of the tobacco
mosaic virus. Nano Lett 2010;10:181–186. (d) Hurley TD, Bosron WF, Stone CL,
Amzel LM. Structures of three human beta alcohol dehydrogenase variants. Correlations
with their functional differences. J Mol Biol 1994;239:415–429. (e) Fermi G, Perutz
MF, Shaanan B, Fourme R. The crystal structure of human deoxyhaemoglobin at 1.74 A
resolution. J Mol Biol 1984;175:159–174. (f) To be published. (g) Antson AA, Otridge
J, Brzozowski AM, Dodson EJ, Dodson GG, Wilson KS, Smith TM, Yang M, Kurecki
T, Gollnick P. The structure of trp RNA-binding attenuation protein. Nature
1995;374:693–700. (h) Kiser PD, Lodowski DT, Palczewski K. Purification,
crystallization and structure determination of native GroEL from Escherichia coli
lacking bound potassium ions. Acta Crystallogr Sect F Struct Biol Cryst Commun
2007;63:457–461. (i) Harata K, Nagahora H, Jigami Y. X-ray structure of wheat germ
agglutinin isolectin 3. Acta Crystallogr Sect D Biol Crystallogr 1995;51:1013–1019. (j)
To be published. (k) Cherezov V, Clogston J, Papiz MZ, Caffrey M. Room to move:
crystallizing membrane proteins in swollen lipidic mesophases. J Mol Biol
2006;357:1605–1618. (l) Sixma TK, Kalk KH, van Zanten BA, Dauter Z, Kingma J,
Witholt B, Hol WG. Refined structure of Escherichia coli heat-labile enterotoxin, a close
relative of cholera toxin. J Mol Biol 1993;230:890–918. (m) Swain AL, Miller MM,
Green J, Rich DH, Schneider J, Kent SB, Wlodawer A. X-ray crystallographic structure
of a complex between a synthetic protease of human immunodeficiency virus 1 and a
substrate-based hydroxyethylamine inhibitor. Proc Natl Acad Sci U S A 1990;87:8805–
8809. (n) Shrive AK, Cheetham GM, Holden D, Myles DA, Turnell WG, Volanakis JE,
Pepys MB, Bloomer AC, Greenhough TJ. Three dimensional structure of human C-
reactive protein. Nat Struct Biol 1996;3:346–354. (o) Tahirov TH, Inagaki E, Ohshima
N, Kitao T, Kuroishi C, Ukita Y, Takio K, Kobayashi M, Kuramitsu S, Yokoyama S,
Miyano M. Crystal structure of purine nucleoside phosphorylase from Thermus
thermophilus. J Mol Biol 2004;337:1149–1160. (p) Vitali J, Colaneri MJ, Kantrowitz E.
Crystal structure of the catalytic trimer of Methanococcus jannaschii aspartate
transcarbamoylase. Proteins 2008;71:1324–1334. (q) Gill HS, Eisenberg D. The crystal
structure of phosphinothricin in the active site of glutamine synthetase illuminates the
mechanism of enzymatic inhibition. Biochemistry 2001;40:1903–1912.
31. (a) Bilgrami S, Yadav S, Kaur P, Sharma S, Perbandt M, Betzel C, Singh TP. Crystal
structure of the disintegrin heterodimer from saw-scaled viper (Echis carinatus) at 1.9 A
resolution. Biochemistry 2005;44:11058–11066. (b) Keating AE, Malashkevich VN,
Tidor B, Kim PS. Side-chain repacking calculations for predicting structures and
stabilities of heterodimeric coiled coils. Proc Natl Acad Sci U S A 2001;98:14825–
14830. (c) Jabeen T, Singh N, Singh RK, Jasti J, Sharma S, Kaur P, Srinivasan A, Singh
TP. Crystal structure of a heterodimer of phospholipase A2 from Naja naja sagittifera at
2.3 A resolution reveals the presence of a new PLA2-like protein with a novel cys 32-
Cys 49 disulphide bridge with a bound sugar at the substrate-binding site. Proteins
2006;62:329–337.
32. Groll M, Ditzel L, Löwe J, Stock D, Bochtler M, Bartunik HD, Huber R. Structure of
20S proteasome from yeast at 2.4 A resolution. Nature 1997;386:463–471.
33. Milner D, Raz S, Hel-Or H, Keren D, Nevo E. A new measure of symmetry and its
application to classification of bifurcating structures. Pattern Recognit 2007;40:2237–
2250.
34. Najmudin S, Nalini V, Driessen HP, Slingsby C, Blundell TL, Moss DS, Lindley PF.
Structure of the bovine eye lens protein γB(γII)-crystallin at 1.47 A. Acta Crystallogr D
Biol Crystallogr 1993;49:223–233.
35. Blundell T, Lindley P, Miller L, Moss D, Slingsby C, Tickle I, Turnell B, Wistow G.
The molecular structure and stability of the eye lens: x-ray analysis of γ-crystallin II.
Nature 1981;289:771–777.
36. Greer J, Erickson JW, Baldwin JJ, Varney MD. Application of the three-dimensional
structures of protein target molecules in structure-based drug design. J Med Chem
1994;37:1035–1054.
37. (a) Baker EN, Blundell TL, Cutfield JF, Cutfield SM, Dodson EJ, Dodson GG, Hodgkin
DM, Hubbard RE, Isaacs NW, Reynolds CD. The structure of 2Zn pig insulin crystals
at 1.5 A resolution. Philos Trans R Soc Lond B Biol Sci 1988;319:369–456 (PDB code:
4INS). (b) Moras D, Olsen KW, Sabesan MN, Buehner M, Ford GC, Rossmann MG.
Studies of asymmetry in the three-dimensional structure of lobster D-glyceraldehyde-3-
phosphate dehydrogenase. J Biol Chem 1975;250:9137–9162 (PDB code: 1GPD). (c)
Jogl G, Rozovsky S, McDermott AE, Tong L. Optimal alignment for enzymatic proton
transfer: structure of the Michaelis complex of triosephosphate isomerase at 1.2-A
resolution. Proc Natl Acad Sci U S A 2003;100:50–55 (PDB code: 1NEY). (d) Sixma
TK, Kalk KH, van Zanten BA, Dauter Z, Kingma J, Witholt B, Hol WG. Refined
structure of Escherichia coli heat-labile enterotoxin, a close relative of cholera toxin. J
Mol Biol 1993;230:890–918 (PDB code: 1LTS).
38. (a) Ogg D, Elleby B, Norström C, Stefansson K, Abrahmsén L, Oppermann U, Svensson
S. The crystal structure of guinea pig 11β-hydroxysteroid dehydrogenase type 1
provides a model for enzyme-lipid bilayer interactions. J Biol Chem 2005;280:3789–
3794 (PDB code: 1XSE). (b) Zhang J, Osslund TD, Plant MH, Clogston CL, Nybo RE,
Xiong F, Delaney JM, Jordan SR. Crystal structure of murine 11 beta-hydroxysteroid
dehydrogenase 1: an important therapeutic target for diabetes. Biochemistry
2005;44:6948–6957 (PDB code: 1Y5M).
39. Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface
protein sequences with the hydrophobic moment plot. J Mol Biol 1984;179:125–142.
40. (a) Chagot B, Diochot S, Pimentel C, Lazdunski M, Darbon H. Solution structure of
APETx1 from the sea anemone Anthopleura elegantissima: a new fold for an HERG
toxin. Proteins 2005;59:380–386. (b) Swan MK, Hansen T, Schönheit P, Davies C. A
novel phosphoglucose isomerase (PGI)/phosphomannose isomerase from the
crenarchaeon Pyrobaculum aerophilum is a member of the PGI superfamily: structural
evidence at 1.16-A resolution. J Biol Chem 2004;279:39838–39845. (c) Jogl G,
Rozovsky S, McDermott AE, Tong L. Optimal alignment for enzymatic proton transfer:
structure of the Michaelis complex of triosephosphate isomerase at 1.2-A resolution.
Proc Natl Acad Sci U S A 2003;100:50–55. (d) Kessler D, Rétey J, Schulz GE. Structure
and action of urocanase. J Mol Biol 2004;342:183–194. (e) Swan MK, Hansen T,
Schönheit P, Davies C. Structural basis for phosphomannose isomerase activity in
phosphoglucose isomerase from Pyrobaculum aerophilum: a subtle difference between
distantly related enzymes. Biochemistry 2004;43:14088–14095. (f) Ziółkowska NE,
O’Keefe BR, Mori T, Zhu C, Giomarelli B, Vojdani F, Palmer KE, McMahon JB,
Wlodawer A. Domain-swapped structure of the potent antiviral protein griffithsin and its
mode of carbohydrate binding. Structure 2006;14:1127–1135. (g) Willis MA, Song F,
Zhuang Z, Krajewski W, Chalamasetty VR, Reddy P, Howard A, Dunaway-Mariano D,
Herzberg O. Structure of YciI from Haemophilus influenzae (HI0828) reveals a
ferredoxin-like alpha/beta-fold with a histidine/aspartate centered catalytic site. Proteins
2005;59:648–652. (h) Adam V, Royant A, Nivière V, Molina-Heredia FP, Bourgeois D.
Structure of superoxide reductase bound to ferrocyanide and active site expansion upon
X-ray-induced photo-reduction. Structure 2004;12:1729–1740. (i) Meijers R, Adolph H-
W, Dauter Z, Wilson KS, Lamzin VS, Cedergren-Zeppezauer ES. Structural evidence
for a ligand coordination switch in liver alcohol dehydrogenase. Biochemistry
2007;46:5446–5454. (j) Liao DI, Calabrese JC, Wawrzak Z, Viitanen P V, Jordan DB.
Crystal structure of 3,4-dihydroxy-2-butanone 4-phosphate synthase of riboflavin
biosynthesis. Structure 2001;9:11–18. (k) Rigsby RE, Rife CL, Fillgrove KL, Newcomer
ME, Armstrong RN. Phosphonoformate: a minimal transition state analogue inhibitor of
the fosfomycin resistance protein, FosA. Biochemistry 2004;43:13666–13673.
Figure legends
Figure 1. Backbone- and full-analysis scheme of the dimeric HIV protease model; each subunit is indicated by
a different color. (a) All atoms (except for hydrogens), including all of the amino acid residues. (b) The peptide
backbone structure only (residues are ignored). (c) Cartoon representation of the backbone structure.
Figure 2. A symmetry map of HIV protease (compare with Fig. 1 and see text for explanation). The indicated
amino acids are discussed below in Section 4.4.
Figure 3. Homomeric protein structures analyzed in this study (see Table I). Each subunit is indicated by a
different color. (a) Porin, (b) Decameric amyloid P, (c) Tobacco mosaic virus disc, (d) Alcohol dehydrogenase, (e)
Hemoglobin, (f) Neuraminidase, (g) GroEL, (h) Wheat germ agglutinin, (i) Triose phosphate isomerase, (j) Light-
harvesting comp.2, (k) Beta-subunit of types 1&2 heat-labile enterotoxin, (l) C-reactive protein, (m) HIV
proteinase, (n) trp RNA-binding attenuation protein, (o) Purine nucleoside phosphorylase, (p) Aspartate
transcarbamoylase, (q) Glutamine synthetase.
Figure 4. Heteromeric protein structures analyzed in this study (see Table II). Each subunit is indicated by
a different color. (a) Disintegrin heterodimer, (b) GCN4 acid/base heterodimer, (c) Phospholipase A2 (PLA2)
heterodimer.
Figure 5. Backbone analysis of the heteromeric protein 20S proteasome (each subunit is indicated by
a different color). (a) The whole structure of the protein is composed of four rings, one on top of the other. (b)
Backbone structure of the top 7-subunit ring. (c) The analyzed structure, ignoring the extra atoms. (d) The
analyzed structure, adding ghost atoms. See text for explanation of (c) and (d).
Figure 6. Symmetry analysis of protein fragments: (a) The symmetry-related secondary structures of β1 alcohol
dehydrogenase. (b) Symmetry-related domains of γB-crystallin. Only coordinates of colored areas were considered
in the calculations.
Figure 7. Symmetry maps of protein structures, each with its own color code, with a focus on the location of the
near-symmetry axes (indicated for each structure). (a) Insulin, (b) D-glyceraldehyde-3-phosphate dehydrogenase,
(c) Triosephosphate isomerase, (d) B subunits of heat-labile enterotoxin. For data sources see Ref. 37.
Figure 8. Comparison of two homodimeric proteins with a sequence homology of 70%: (a) 11 β-hydroxysteroid
dehydrogenase Type 1 (11 β-HSD1) of guinea pig; and (b) of mouse (murine). The 15 most C2-distorted amino acids in
each structure are colored red.
Figure 9. The 15 most symmetry-distorted amino acids in each protein are shown (red and blue) – see Table III
for their hydrophobicity characterization. (a) Urocanate hydratase (1UWK). (b) Griffithsin (2GUD). (c)
Fosfomycin resistance protein A (1NKI). (d) Alcohol dehydrogenase (2JHF).
Table I
CSM values of the homomeric protein structures in Fig. 3.
   Name (a)                                            PDB ID  G-symmetry  Backbone CSM value  Full CSM value
a  Porin*                                              2XE2    C3          0.001               0.006
b  Decameric amyloid P                                 1LGN    C5          0.002               0.006
c  Tobacco mosaic virus disc*                          3KML    C17         0.002               0.002
d  Alcohol dehydrogenase                               1HDX    C2          0.003               0.01
e  Hemoglobin                                          2HHB    C2          0.004               0.01
f  Neuraminidase*                                      1V0Z    C4          0.004               0.01
g  trp RNA-binding attenuation protein*                1WAP    C11         0.006               0.05
h  GroEL*                                              2NWC    C7          0.008               0.01
i  Wheat germ agglutinin*                              1WGT    C2          0.01                0.05
j  Triose phosphate isomerase*                         8TIM    C2          0.02                0.05
k  Light-harvesting comp.2                             2FKW    C9          0.02                0.03
l  Beta-subunit of types 1&2 heat-labile enterotoxin*  1LTS    C5          0.03                0.08
m  HIV proteinase*                                     7HVP    C2          0.04                0.1
n  C-reactive protein*                                 1GNH    C5          0.05                0.08
o  Purine Nucleoside Phosphorylase                     1ODI    C3          0.005               0.02
                                                               C2          0.002               0.01
                                                               D3          0.006               0.02
p  Aspartate transcarbamoylase                         2RGW    C3          0.004               0.01
                                                               C2          0.001               0.006
                                                               D3          0.005               0.01
q  Glutamine synthetase                                1F52    C6          0.0005              0.0006
                                                               C2          0.0001              0.0002
                                                               D6          0.0008              0.0009
(a) For data sources see Ref. 30. Starred proteins are referred to in the literature as symmetric; all others
are referred to as quasi-symmetric etc.
Table II
CSM values of the heteromeric proteins in Fig. 4.
   Name (a)                    PDB ID  G-symmetry  Backbone CSM value
a  Disintegrin heterodimer     1TEJ    C2          0.03
b  GCN4 Acid/Base heterodimer  1KDD    C2          0.06
c  Phospholipase A2 (PLA2)     1Y75    C2          0.09
(a) For data sources see Ref. 31.
Table III
The hydrophilic/hydrophobic ratio of the 15-most symmetry distorted amino-acids in several proteins.
PDB ID (a)  Hydrophilic:hydrophobic ratio
1WKQ        14 : 1
1TZB        14 : 1
1NEY        13 : 2
1UWK        13 : 2
1X9I        13 : 2
2GUD        13 : 2
1MWQ        12 : 3
1VZI        12 : 3
2JHF        12 : 3
1G57        10 : 5
1NKI        10 : 5
(a) For data sources see Ref. 40.
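The hydrophilic:hydrophobic ratios in Table III amount to a simple tally over the most distorted residues of each protein. The Python sketch below shows one way such a tally could be made. The classification used by the authors follows Ref. 39 and is not reproduced here; the `HYDROPHILIC` set is only an illustrative assumption, and the function name is hypothetical. The example input is the list of seven residues reported in the text as shared between the two 11 β-HSD1 structures, used purely to exercise the function.

```python
# Illustrative assumption: residues treated as hydrophilic in this sketch.
# The paper's own classification follows Ref. 39 and may differ.
HYDROPHILIC = {"ARG", "LYS", "ASP", "GLU", "ASN", "GLN", "HIS", "SER", "THR"}

def hydrophilic_ratio(residues, hydrophilic=HYDROPHILIC):
    """Count (hydrophilic, hydrophobic) residues in labels such as 'Glu26'."""
    n_philic = 0
    for label in residues:
        name = label[:3].upper()       # three-letter code precedes the number
        if name in hydrophilic:
            n_philic += 1
    return n_philic, len(residues) - n_philic

# Residues listed in the text as shared by the two 11 beta-HSD1 structures.
shared = ["Glu26", "His134", "Arg137", "Ile203", "Glu239", "Asn270", "Lys274"]
philic, phobic = hydrophilic_ratio(shared)
print(f"{philic} : {phobic}")
```

Passing the classification as an argument keeps the tally independent of any particular hydrophobicity scale, so the set derived from Ref. 39 could be substituted without changing the function.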