TITLE PAGE

The near-symmetry of proteins

Proteins symmetry

Maayan Bonjack and David Avnir*

Institute of Chemistry and The Lise Meitner Minerva Center for Computational Quantum

Chemistry, The Hebrew University of Jerusalem, Jerusalem 91904, Israel

Key words: symmetry measure, CSM, rotational symmetry, symmetric proteins, symmetry

maps, near-symmetry.

*Correspondence to: David Avnir.

Institute of Chemistry, The Hebrew University, The Edmond J. Safra Campus - Givat Ram,

Jerusalem, IL 9190401.

E-mail: [email protected]

Research Article

Proteins: Structure, Function and Bioinformatics

DOI 10.1002/prot.24706

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1002/prot.24706

© 2014 Wiley Periodicals, Inc.

Received: Aug 14, 2014; Revised: Oct 06, 2014; Accepted: Oct 18, 2014

ABSTRACT

The majority of protein oligomers form clusters which are nearly symmetric. Understanding of

that imperfection, its origins, and perhaps also its advantages requires the conversion of the

currently used vague qualitative descriptive language of the near-symmetry into an accurate

quantitative measure that makes it possible to answer questions such as: 'What is the degree of

symmetry deviation of the protein?', 'How do these deviations compare within a family of

proteins?', and so on. We developed quantitative methods to answer questions of this type,

which are capable of analyzing the whole protein, its backbone or selected portions of it, down

to comparison of symmetry-related specific amino-acids, and which are capable of visualizing

the various levels of symmetry deviations in the form of symmetry maps. We have applied

these methods to an extensive list of homomers and heteromers and found that apparently no

protein reaches perfect symmetry. Strikingly, even homomeric protein clusters are never

ideally symmetric. We also found that the main burden of symmetry distortion is on the amino-

acids near the symmetry axis; that it is mainly the more hydrophilic amino-acids that take part

in symmetry-distortive interactions; and more. The remarkable ability of heteromers to

preserve near-symmetry, despite the different sequences, was also shown and analyzed. The

comprehensive literature on the suggested advantages of symmetric oligomerization raises a yet-

unsolved key question: If symmetry is so advantageous, why do proteins stop short of perfect

symmetry? Some tentative answers to be tested in further studies are suggested in a concluding

outlook.

John Wiley & Sons, Inc.

PROTEINS: Structure, Function, and Bioinformatics

INTRODUCTION

The majority of oligomeric proteins appear as clusters of sub-units which are symmetric

at least to some degree1–7. The symmetry point-groups of these clusters are chiral (because the

building blocks are chiral8), and the most prevalent ones are the cyclic Cn and the dihedral Dn

symmetries1,5. Protein symmetry research continues to be a central theme in structural

biochemistry, and some recent examples are collected in Ref. 9. Understanding the abundance

of this structural feature – symmetry – has received much attention, because at first glance it is

not clear why this feature is important for protein function, that is, why symmetric

clustering has evolved at all. It has been proposed1–5 that symmetric oligomerization contributes to

an increase in protein stability because interactions between subunits are maximized; that it

helps to avoid unwanted excessive aggregation; that it enhances the coding efficiency,

because less information is needed when the subunits are identical; that it reduces synthetic

errors, because the larger the protein unit, the more prone it is to such errors; and that

it contributes to more efficient cooperative regulation, because a protein with several identical

subunits may display allosteric effects.

Interestingly, despite these key advantages, the vast majority of protein clusters fall

short of perfect symmetry1–4,7,10. This appears not only in clusters where the subunits are not

identical (hetero-oligomers), but even in homo-oligomers. The settling for imperfect symmetry

in clusters built by even the same units is a key phenomenon which, to the best of our

knowledge, still awaits detailed quantification, exploration of the structural origins of this

imperfection, and exploration of possible advantages of not reaching the ideal symmetry.

We believe that understanding of these issues can gain from replacing the current qualitative

language that has been used – “quasi-symmetry”, “pseudo-symmetry”, “approximate

symmetry”, “near-symmetry”, etc.1–4,7,10 – by quantitative descriptors of symmetry that will

allow new types of questions such as: how much of a given symmetry is there in a protein

cluster?; by how much is one cluster more symmetric than the other?; what is the distribution

of symmetry values within a family of related proteins?; how did the symmetry content change

with evolution?; which portions of a protein carry most of the burden of the symmetry

deviation?; what is the distribution of symmetry distortion within the protein?; and so on. The

use of quantitative descriptors of symmetry – the Continuous Symmetry Measure (CSM) – has

already proven very useful in identifying and understanding a host of symmetry-related

phenomena11–14, and some encouraging indications for its usefulness in protein structure

analyses exist. For instance, Keinan et al15 have shown a correlation between the inhibition

activity of HIV protease and the changes that the near C2-symmetry of the inhibitors undergoes

upon complexation at the active site. Also of relevance for this introduction is the modeling

work of Andre et al which quantified the symmetry of various modeled assemblies of proteins2.

The CSM quantitative symmetry analysis of proteins suffered for two decades from the

magnitude of the problem: The large number of atoms with all of their potential permutations

(a key step in the calculations) has drastically limited the ability to apply CSM analyses to

proteins. The computational problem was recently solved11, decreasing the dependence of the

computation time on the number of atoms, N, from N! to N²; this has opened the gates for a

detailed symmetry analysis of proteins. Here we describe the development of different

variations on the basic tool to quantify and analyze the symmetry content of protein oligomers,

tailored to the various analysis needs: Symmetry analyses of the whole protein or of selected

fractions of it, using either all atoms or only the backbone atoms, and the formation of

symmetry maps that reveal at a glance the deviation degree of various regions from perfect

symmetry. Using these methods we show - on a quantitative level - that very often the burden

of deviation from symmetry lies near the symmetry axis and in the border regions of the

clustered oligomers, and propose why this should be so; we find that the hydrophilic amino-

acids are more likely to carry conformational symmetry distortion; and more.

METHODS

The computational tools

The Continuous Symmetry Measure (CSM) 16,17 is a method for quantifying the degree

of symmetry of a given object. According to the CSM approach, the G-symmetry point group

content of an object is the minimal distance between two objects: an original structure, $Q$, and a

G-symmetric structure, $P$, which consists of the same atoms and connectivity and is the

closest to the original distorted structure. This minimal distance of the object's vertices from

the desired G-symmetry is defined by

(1) $S(G) = \min \frac{1}{N D^2} \sum_{i=1}^{N} \left| Q_i - P_i \right|^2$

where $Q_i$ are the coordinates of the ith atom of the original studied molecule, $P_i$ are the

coordinates of the ith atom of the nearest structure which has the desired symmetry, the

denominator is the root mean square size normalization factor of the original centered structure

($D = \sqrt{\frac{1}{N} \sum_{i=1}^{N} |Q_i|^2}$), and N is the number of analyzed atoms in the structure (see Supporting

Information, Fig. S1, for additional details). It should be emphasized that this measure is

inherently different than the rmsd analyses of the degree of similarity – these tools do not

evaluate the symmetry itself as a structural parameter, which is the key issue of this report. The

range of the symmetry scale is $0 \le S(G) \le 1$ and is expanded by a factor of 100 for

convenience ($0 \le S(G) \le 100$). If a structure is of perfect G-symmetry, then S(G)=0 and as

the structure distorts from the perfect symmetry, S(G) increases. The maximal value of 100 is

obtained when the nearest structure with the desired G-symmetry reduces to a single point (for

instance, when one wishes to find the degree of inversion symmetry of HCl). When G is the symmetry

point group Cn, the maximal possible value is 50 because the nearest structure collapses to a

line in space (for instance, when one wishes to find the degree of pentagonality of a tetragon).

S(G) is a global parameter, and therefore allows the comparison of various structures and

various symmetries on the same scale. Eq. (1) is a special distance function in that the target

structure is not known but is searched. This involves several minimizations which are carried

out both analytically and numerically (see Ref.'s 16,18,19 for details). As mentioned above, the

most time-consuming minimization is over the possible permutations between the atoms of the

studied structure and the atoms of the target symmetric structure (which is a mathematical

construct, not necessarily a possible molecule) in search for the optimal permutation, namely

the one that minimizes S(G). Ref. 11 describes in detail how this problem is solved, resulting in

calculation times which scale reasonably, as N². Typically, computing the S(G) value of a

protein of 70 kDa takes less than five minutes on an Intel Core i5-3470 @ 3.20 GHz

processor. The majority of our analyses concentrate on the most abundant oligomeric near-

symmetry, namely Cn. A few examples use the less abundant point-group symmetry Dn, with

a computational tool recently developed for that symmetry20.
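
To make the definition in Eq. (1) concrete, the following minimal Python sketch evaluates S(Cn) for the simplified case in which the rotation axis and the permutation are already known. It is an illustration under stated assumptions, not the algorithm of Ref. 11: the function name csm_cn_fixed_axis, the choice of the z axis through the centroid, and the perm argument are hypothetical. Because neither the axis nor the permutation is optimized here, the returned value is only an upper bound on the true S(Cn).

```python
import math


def csm_cn_fixed_axis(coords, perm, n):
    """Upper-bound estimate of S(Cn) on the 0-100 scale.

    Assumptions (not taken from the paper): the Cn axis is the z axis
    through the centroid, and perm[i] is the index of the atom onto which
    atom i is mapped by one rotation of 2*pi/n (so applying perm n times
    returns to i). The full method also optimizes the axis and permutation.
    """
    count = len(coords)
    # Center the structure on its centroid.
    centroid = [sum(p[k] for p in coords) / count for k in range(3)]
    q = [[p[k] - centroid[k] for k in range(3)] for p in coords]

    def rotate_z(p, angle):
        c, s = math.cos(angle), math.sin(angle)
        return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]

    distance = 0.0
    for i in range(count):
        # Rotate each symmetry mate of atom i back onto atom i and average;
        # the mean is the position of atom i in the nearest symmetric structure.
        folded = [0.0, 0.0, 0.0]
        j = i
        for k in range(n):
            back = rotate_z(q[j], -2.0 * math.pi * k / n)
            folded = [folded[a] + back[a] / n for a in range(3)]
            j = perm[j]
        distance += sum((q[i][a] - folded[a]) ** 2 for a in range(3))

    size = sum(x * x for p in q for x in p)  # sum of |Q_i|^2, i.e. N * D^2
    return 100.0 * distance / size
```

For a perfect square with its C4 axis along z, csm_cn_fixed_axis([(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)], [1, 2, 3, 0], 4) evaluates to zero up to rounding; displacing any vertex makes the value positive.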

The analyzed proteins data

The selection of proteins for analysis with near Cn- or Dn-symmetries was based on

direct literature search. The selection criterion of the proteins was the explicit qualitative

description of symmetry or of near-symmetry, approximate symmetry, etc. in

manuscripts1,3,4,10,21-24 and in the proteins' original crystallographic reports. The homodimers

which are mentioned in the last section of the "results and discussion" were selected from the

database "3D Complex: a structural classification of protein complexes"25, and here the

selection criterion was the mentioning of C2 symmetry. The coordinates of the analyzed

proteins ($Q_i$ in Eq. (1)) were taken from the crystallographic Protein Data Bank (PDB)26. We did

not use any data which were derived by assuming complete symmetry (these structures have by

definition S(G)=0); only crystallographic asymmetric units that contain the full

oligomer were taken. Water molecules were removed, as well as non-protein components such as

phosphate, magnesium or glycerol. Cofactors were included if their number reflected the

protein near-cyclic point group. The number of atoms in each of the subunits might not be

equivalent, either because the protein is a hetero-oligomer, or because the crystallographic data

is missing information. Computational methods of addressing these situations are described

below. Regarding the quality of the analyzed data, we followed the recommendations of

Wlodawer et al27 in selecting most of the structures analyzed below; that is, we chose structures

with a resolution better than 3.0 Å and with an R factor of less than 0.25. In a few cases we used,

for explanatory purposes, structures of lower resolution if the specific structure was described

as symmetrical in the original report. We estimated the error of the CSM values by using the

crystallographic B factor from the given crystallographic data of the structure. We then

translated the B factor into the U factor and checked how U, as well as the number of

collected reflections, affects the confidence in the coordinates of the structure. The rms of the

errors in the coordinates was then translated into a statistical estimate of the error in the symmetry

measure, that is, the error associated with determining the degree of symmetry from X-

ray data (see Ref. 28 for full equations and more details). Since, in general, these estimates

revealed an insignificant error, CSM values are reported here without error bars.
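
The selection thresholds and the first step of the error estimate can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the thresholds are those quoted above, and the conversion uses the standard crystallographic relation B = 8π²U for isotropic displacement. The subsequent propagation of coordinate errors into the CSM value (Ref. 28) is not reproduced here.

```python
import math

MAX_RESOLUTION = 3.0  # Angstrom; structures must be better (lower) than this
MAX_R_FACTOR = 0.25   # structures must have an R factor below this


def passes_quality_filter(resolution, r_factor):
    """True if a structure meets both selection thresholds quoted in the text."""
    return resolution < MAX_RESOLUTION and r_factor < MAX_R_FACTOR


def b_to_u(b_factor):
    """Convert an isotropic B factor (A^2) to U (A^2) using B = 8 * pi^2 * U."""
    return b_factor / (8.0 * math.pi ** 2)
```

For example, passes_quality_filter(2.1, 0.19) is True, whereas a structure at 3.2 Å resolution is rejected regardless of its R factor.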

RESULTS AND DISCUSSION

Developing various options for the near-symmetry analysis of proteins

The symmetry analysis of proteins, homo- as well as hetero-oligomers, can be carried out at

various levels of resolution, depending on the specific question at hand: It can

be carried out with either the full set of atoms or a sub-set of it, or with fragments of the

protein. Consequently, our first task has been to develop computational options for four main

categories as follows:

a. Backbone symmetry analysis of the whole protein oligomer: This analysis takes into

account only the coordinates of atoms within the polypeptide backbone, without the side chains (referred to below as residues)

(Fig. 1b,c). The method enables the user to have a general sense of the symmetry of the protein

- it is analogous to the various basic cartoon presentations of proteins. This method is

particularly relevant for heteromeric proteins, because of the remarkable property that

differences in sequences still preserve the main structural features29, including the symmetry

deviation level. In fact, the symmetry in such heteromeric-proteins may reflect the existence of

a homomeric-symmetric ancestor oligomer which later-on specialized by becoming

heteromeric21. By focusing on the polypeptide backbone one can perform comparative analyses

across the board and include both homo- and hetero-oligomers. Because of its generality

we recommend this computational tool at least as a first-approach step.

b. All-atoms symmetry analysis of whole protein oligomers: A more detailed symmetry-

analysis includes the whole structure of the protein, that is, the backbone and residues (usually

without the hydrogens, Fig. 1a). This higher-resolution full analysis may be applied only to

homomeric proteins, because in this case the identity of the amino acids is an essential part of

the analyzed information (applying this analysis to heteromeric proteins, where the units are

not identical, would lead to unreasonable permutations and thus to extremely high CSM

values).

c. Symmetry analysis of fragments: It is quite common in protein structural analyses to focus on

selected fragments within the full set of data. Furthermore, it is obvious that CSM values of the

whole protein hide more specific information on the origin of the symmetry deviation of the

whole. For instance, it might be illuminating to know which regions in the protein carry the

burden of the deviation, and which are barely affected by it. This analysis can be done with the

two options described above: The fragment can be of a backbone, or of a full set of atoms. For

instance, one could compare the backbone symmetry of fragments of a homomeric oligomer

with that of fragments of an analogous heteromeric oligomer.

This option of local symmetry analysis opens also the possibility to analyze quasi-symmetry

related portions within monomeric proteins or within a single subunit of an oligomer. In fact,

these fragments can be as small as one may wish to have, down to symmetry-matched amino

acids, specific carbonyl groups, and even as small as symmetry-equivalent atoms. For instance,

in a near-C3 trimeric-oligomer one could choose a specific carbonyl-bearing carbon and its two

closest equivalent carbons obtained by the C3 rotation, and compute S(C3) for these three atoms.

d. Symmetry maps: Continuing with the last example, one can in fact repeat this procedure for

another set of three atoms, and so on for a set of atoms as large as one wishes to have. For all

practical purposes, it is not needed to go to resolutions as high as single atoms - we found that

the relevant minimal, useful unit is, as expected, the amino-acid. The collection of all of the

local CSM values, displayed with a color-code in space, provides then a symmetry-map of the

protein or of portions of it – see Fig. 2. This analysis provides a telltale visualization, which

gives at a glance the relative deviations from symmetry within the oligomer structure.

The construction of symmetry maps is carried out as follows: First, the protein is

divided into sets of symmetry-related amino-acids; each set consists of one amino-acid from

each subunit. When the division is completed, the number of the sets is identical to the number

of amino-acids in the subunit. Second, a CSM calculation is carried out for each set. A

graphical decision is then made as to the color code, namely which amino acids are considered as having

very high, high, medium, low or very low CSM values.
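
The first step of this construction, dividing the oligomer into sets of symmetry-related amino-acids, can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function names are hypothetical, the backbone is assumed to consist of the N, CA, C and O atoms, coordinates are read from the fixed columns of PDB ATOM records, and symmetry-related amino-acids are assumed to carry the same residue number in every chain (reasonable for homomers with consistent numbering). The per-set CSM calculation itself is not shown.

```python
from collections import defaultdict

BACKBONE_ATOMS = ("N", "CA", "C", "O")  # assumed definition of the backbone


def parse_backbone(pdb_lines):
    """Return {chain_id: {residue_number: [(x, y, z), ...]}} for backbone atoms."""
    chains = defaultdict(lambda: defaultdict(list))
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skips HETATM records such as water and other ligands
        if line[12:16].strip() not in BACKBONE_ATOMS:
            continue
        chain_id = line[21]
        residue_number = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        chains[chain_id][residue_number].append(xyz)
    return chains


def symmetry_sets(chains):
    """One set per residue number found in every chain: {number: {chain: atoms}}."""
    common = set.intersection(*(set(residues) for residues in chains.values()))
    return {
        number: {chain: residues[number] for chain, residues in chains.items()}
        for number in sorted(common)
    }
```

Each entry of the returned dictionary holds one amino-acid from each subunit, so the number of entries equals the number of residue positions shared by all subunits; a local CSM value would then be computed for every entry and mapped to a color.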

Backbone symmetry analysis of the whole protein oligomer

a. Homomers: We begin by applying the backbone symmetry analysis of the whole protein to

homomeric proteins with near Cn or Dn symmetry. The conclusion to be reached in this section

is not trivial and quite surprising: All of the homo-oligomeric proteins we analyzed, even those

which are declared "symmetric" have S(Cn)≠0 values (beyond the error range – see Section

"the analyzed proteins data"); that is, despite the identical composition of the units, the proteins

"select" to stop short of perfect symmetry. The symmetry deviations are small, but enough to

let the literature use qualitative descriptors such as "pseudo" and "approximate". Consider for

instance the structure of human hemoglobin, which is composed of four subunits, arranged as a

dimer of αβ pairs (namely α2β2), and which is commonly described as possessing "pseudo-

C2 symmetry"30 (Fig. 3e). The analyzed protein backbone in its T (deoxy) quaternary structure

has a CSM value of S(C2)=0.004. This is a small symmetry distortion (in the following sections

we encounter much larger values) but definitely justifies the qualitative "pseudo" descriptor

used in the literature. Other homomeric proteins were analyzed similarly, and the results are

collected in Table I and in Fig. 3 (see Supporting Information, Table S1 for additional analyzed

proteins). It is seen that none of the proteins is perfectly symmetric. The various S(Cn) values

point to an optimal symmetry deviation at which each protein settles, and it is seen that these

optima are not the same. We recall that the CSM scale is unified and comparable for all

symmetries. That is, any two proteins, regardless of their symmetry point group, can be

compared according to their CSM values. For instance, according to Table I, hemoglobin

(S(C2)=0.004) is more symmetric than purine nucleoside phosphorylase (S(C3)=0.005).

We found the same universality for near Dn-symmetric proteins (Table I and Fig. 3). In

fact, Dn analysis allows one also to compare its two chiral rotational sub-groups, Cn and the

perpendicular C2. For each of those proteins we present the CSM values for Dn, Cn and C2

(Table I and Fig. 3). It is seen that the approximate symmetry shows up also in the two sub-

group symmetries. The relative magnitude of the CSM values is in agreement with the

expected relation S(Dn)≥S(Cn)≥S(C2). This is so, because it follows the order of the number of

elements in each of these symmetry groups: The more elements, the more distant is the nearest

fully symmetric structure.

b. Heteromers: We move on with the backbone analysis to heteromeric proteins. Here there are

basically two general cases:

1. Same length – different sequence: All of the heteromeric units are of the same backbone

length and differ only in their sequences. The analysis of this case is carried out exactly as was

done with the backbone of the homomeric proteins, and examples are provided in Fig. 4 and

Table II. For instance, the heterodimer of disintegrin (a protein from the venom of Echis

carinatus that binds to integrin receptors) is composed of two monomers which have the same

sequence length, and the sequence identity is 84% (Fig. 4a)31. The crystal structure is described

as having a high level of “pseudosymmetry”, which translates quantitatively to a CSM value of

S(C2)=0.03 for the backbone.

2. Different length units: The second case is of proteins in which some or all subunits have

different lengths. Analysis of this case requires additional data processing. Consider, for

instance, the protein 20S proteasome from the archaebacterium Thermoplasma acidophilum

(which is part of a proteasome complex that fills an important role in the degradation of

unneeded or damaged proteins)32. It is composed of four rings, one on top of the other (Fig.

5a), each of which is built from seven subunits (Fig. 5b). As seen in Fig. 5b, the near C7

symmetry is quite clear, and this is so although each subunit has a unique amino acid sequence

and length (the lengths of the seven subunits are 243, 250, 244, 241, 242, 233 and 244 amino-

acids). The question, then, is how to handle this situation, which is quite common in

heteromeric proteins. If no correction is made for the different lengths, the CSM value of the

structure is extremely high, S(C7)=49 (because of lack of proper permutations between the

atoms of the oligomer units). That value basically means that there is no C7-symmetry in the

protein 20S proteasome whatsoever, which defies what the eye sees. What is it then that the

eye sees? That the bulk of the backbone atoms are indeed nearly C7-symmetric. Translating it

into practice requires either ignoring the extra atoms with no equivalent symmetry-related

atoms (the CSM is then S(C7)=0.2, Fig. 5c); or - following the idea of Frid et al33 for analyzing the

symmetry of branched structures - adding ghost atoms at the missing parts of the shorter

subunits (at the C-terminus of the amino-acid chain); in this case S(C7)=0.5 (Fig. 5d). Each of these

two options provides proper permutations and CSM values which properly translate what the eye

sees. While the first method ignores atoms, and thus reflects the symmetric core of the

oligomer, the second method takes the intrinsic asymmetry of the whole protein structure into

consideration; we thus recommend using the second method.
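
The bookkeeping behind the two options for subunits of different lengths can be sketched as below. This is an illustration only: the function names are hypothetical, the chains are assumed to be aligned from the N-terminus, and ghost positions are marked with a placeholder because the way ghost-atom coordinates are assigned is not specified in this text.

```python
GHOST = None  # placeholder marking a ghost position; coordinates are left open


def truncate_to_common_length(chains):
    """Option 1: ignore the extra positions beyond the shortest subunit."""
    shortest = min(len(chain) for chain in chains)
    return [list(chain[:shortest]) for chain in chains]


def pad_with_ghosts(chains):
    """Option 2: extend the shorter subunits at the C-terminus with ghost positions."""
    longest = max(len(chain) for chain in chains)
    return [list(chain) + [GHOST] * (longest - len(chain)) for chain in chains]


# Subunit lengths quoted in the text for the 20S proteasome ring.
lengths = [243, 250, 244, 241, 242, 233, 244]
chains = [list(range(n)) for n in lengths]  # stand-ins for residue records
core = truncate_to_common_length(chains)
padded = pad_with_ghosts(chains)
```

With the quoted lengths, truncation leaves 233 positions per subunit, while padding brings every subunit to 250 positions; either way all subunits end up with the same number of positions, which is what allows proper permutations to be defined.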

All-atoms symmetry analyses of whole homomers

In contrast to the backbone analysis, all-atoms analyses take into account many more

conformational distortions, due to inclusion of the amino acid residues. This residue

conformational freedom is expected to increase the CSM values compared to the corresponding

backbone CSM values. Indeed, returning to hemoglobin for which we obtained S(C2)=0.004

for the backbone, the symmetry measure increases 2.5-fold to S(C2)=0.01 (Table I) when the

full protein is analyzed. As seen in Table I, this trend is true for all of the analyzed proteins.

One should note that the larger CSM values are not the result of the larger number of atoms

considered, because the measure is normalized according to the number of atoms and size. Rather, the

magnitude of the difference in the CSM values between the full and backbone analyses shows

that the residues contribute much more to the overall distortion than the backbone does.

Symmetry analyses of fragments

Quite often, one may wish to concentrate on specific portions of the protein (portions

which are symmetry-related in each of the homo- or hetero-oligomer units, or even in a

monomeric protein structure), rather than on the whole. Such local focus - for instance, α-

helices, border zones between the oligomer units or specific domains in the structure - provides

higher resolution information compared to the previous sections, because such fragments are not

masked by portions which are of less relevance to the question at hand, or which carry only

little of the symmetry distortion. Consider, for example, the homodimer of human β1 alcohol

dehydrogenase (a protein that participates in the metabolism of ethanol and other substrates,

Fig. 6a)30. The protein consists of α-helices and β-sheets, and the question we ask is, where are

the higher symmetry distortions? Calculations of S(C2) of the β-sheets and of the α-helices

separately provide values of S(C2)=0.002 and S(C2)=0.003, respectively. That is, the α-helices

tend to be more distorted than the β-sheets. Yet, since the global CSM value is much higher

(S(C2)=0.01), the burden of the protein distortion must be elsewhere, in less rigid zones.

Indeed, the CSM value of the segments connecting the rigid secondary structures was calculated and

found to be relatively high, S(C2)=0.02. This is general; we observed this trend in many other

proteins as well: rigid segments have lower CSM values than flexible ones.
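
Selecting such fragments amounts to simple residue-range bookkeeping, sketched below under stated assumptions: the function names are hypothetical, residues are keyed by residue number, and the secondary-structure ranges are supplied by the user (they are not derived here). The connecting segments are taken as everything between the first and last residue that is not covered by a rigid range.

```python
def select_fragment(residues, ranges):
    """Keep residues whose number lies in any inclusive (start, end) range."""
    return {
        number: atoms
        for number, atoms in residues.items()
        if any(start <= number <= end for start, end in ranges)
    }


def connecting_segments(rigid_ranges, first, last):
    """Inclusive ranges between first and last not covered by rigid_ranges."""
    gaps = []
    cursor = first
    for start, end in sorted(rigid_ranges):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= last:
        gaps.append((cursor, last))
    return gaps
```

A CSM value would then be computed separately for the atoms returned for the helices, the sheets and the connecting segments, applying the same selection to every subunit.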

As mentioned above, local symmetry analysis opens also the possibility to analyze

quasi-symmetry related portions within a monomeric protein or within a subunit of an oligomer.

An example for this case is the protein γβ-crystallin (an eye lens protein), which is referred to

as a highly symmetrical structure34, despite the different sequences of the components. It

consists of two domains; each of them contains two 'Greek key' motifs (Fig.

6b). The two domains are related by an approximate 2-fold symmetry, as described in

literature35. However, the domains differ in their sequence and length. Therefore, addition of

"ghost" atoms is required, as described in Section "backbone symmetry analysis of the whole

protein oligomer". Calculation of the CSM value was carried out only on the relevant motifs,

and the linking segments which add a lot to the symmetry deviation, as we saw above, were

ignored. The resulting CSM value is S(C2)=0.20, and it indicates that the two domains are

largely related by C2 symmetry, although their sequences and lengths are different.

Symmetry maps

a. Symmetry maps of oligomers: The symmetry map is a high resolution local analysis which

determines the symmetry content of symmetry-matched amino acids within the oligomer. The

analysis produces symmetry maps of the kind shown in Fig. 2 for the near C2-symmetric dimer

of HIV-1 protease30 (Fig. 3m, an aspartic protease; we recall that the near C2-symmetry of the

dimer gave rise to the development of C2-related anti-HIV drugs36). The CSM value of the

protein in the full analysis was found to be (above) S(C2)=0.13; let us now see how this

translates to the symmetry map of Fig. 2. The literature points specifically to two amino-acids, at

positions 50 and 51 (Ile50 and Gly51 in each subunit), as a major source of conformational

symmetry distortion. Indeed, the CSM value of Ile50 is extremely high, S(C2)=4.64. The

CSM value of Gly51 is lower, S(C2)=0.21, but it is nevertheless still higher than the CSM value

of the full protein. Applying a four-color code to the map not only confirms that those amino-acids

break the symmetry of the protein, but also reveals other amino acids which cause symmetry

deviations. The most distorted amino acids (in this map, 1<CSM<5) are colored red and

include Ile50, Thr96 and Asn98. The less distorted amino acids (in this map, 0.5<CSM<1) are

colored orange and include Gly49, Ile66, Ile84, Leu97. Yellow is the next color code (in this

map, 0.2<CSM<0.5) and includes Gln2, Ile15, Leu23, Thr26, Glu35, Lys41, Lys45, Gly51,

Ala95; and the green-colored amino acids (in this map, 0.1<CSM<0.2) are Pro1, Gln7, Arg14,

Glu21, Asp25, Glu34, Ile47, Lys55, His69, Lys70, Ile72 and Lys89.
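
The color assignment described in this paragraph can be written as a small lookup, sketched here in Python. The bin limits and colors are those quoted for the HIV-1 protease map; the function name and the treatment of values falling exactly on a bin boundary (assigned to the lower bin) are assumptions made for the illustration.

```python
# (lower limit, upper limit, color) as quoted for the HIV-1 protease map.
COLOR_BINS = [
    (1.0, 5.0, "red"),
    (0.5, 1.0, "orange"),
    (0.2, 0.5, "yellow"),
    (0.1, 0.2, "green"),
]


def color_for(csm):
    """Return the map color for a local CSM value, or None if it is outside all bins."""
    for low, high, color in COLOR_BINS:
        if low < csm <= high:  # boundary handling is an assumption
            return color
    return None  # left uncolored in this sketch
```

With the values given above, color_for(4.64) places Ile50 in the red bin and color_for(0.21) places Gly51 in the yellow bin, in agreement with the lists in the text.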

Significantly, the symmetry map (Fig. 2) indicates that most of the symmetry distorted

symmetry-matched pairs of amino-acids are located in the area where the two units coalesce

into one, and along the near Cn symmetry axis (in dimers, those two elements are of the same

region in the protein). This again is general: we shall see it repeatedly in more case analyses

(Fig. 7). The area where the components of the oligomer interact is indeed expected to be

active in symmetry distortion. When two similar subunits interact, not only is there no

reason to expect a similar distortion in each; the contrary is true - conformational

changes that lead to the optimal interaction would often require the two components to adjust

to each other, which would mean that each subunit acquires a different conformer. For

example, the loop in position 49-51 (bottom of Fig. 2) is very distorted (colored red, orange

and yellow), and this is due to the non-equivalent conformers of Ile50 and Gly51 that are

needed in order to form a direct hydrogen bond between them (dashed line, Fig. 2). A
mechanical view of protein structure would suggest that distortions in the zone where the
units interact also affect more remote zones of the protein. This is indeed seen in Fig. 2,

as regions that are distant from the interacting zone are distorted as well, although to a lesser

degree (colored green).

Additional symmetry maps are shown in Fig. 7 (Ref. 37). Most of these maps not only
provide a quantitative and visual account of the original qualitative statements in the
literature about symmetry-distorted amino acids, but also point to other distorted regions
that were not mentioned in the original reports, as seen above for HIV-1 protease.

The various symmetry maps show some main trends: First, the most distorted amino
acids, indicated by the red- and orange-colored areas, are located near the rotation
axis and in the contact zones between the subunits, as mentioned above. Second,
symmetry-related amino acids located in any type of secondary structure, rigid as well as
flexible (i.e., α-helices or loops), can diverge from a perfect symmetry relation (this is in
agreement with the finding mentioned above that rigid regions, as a whole, tend to be more
symmetric than flexible ones). Third, the less distorted green-colored regions may be present


all over the protein structure, including the protein-solvent interface. Fourth, the
symmetry measure does not decay gradually from the most symmetry-distorted regions
into their immediate surroundings. However, as mentioned above, the distortions do have
the potential to affect distant regions.

b. Comparative symmetry maps of different proteins: Another example of the usefulness of

symmetry maps analysis of proteins is the comparison of the symmetry of different proteins

which have only partial structure and sequence similarity. Our example is the pair of
homodimeric proteins 11β-hydroxysteroid dehydrogenase type 1 (11β-HSD1) of guinea
pig and of mouse (Ref. 38), which have a sequence homology of 70% and possess a similar
structure; both are referred to in the literature as having a pseudo-2-fold axis (Fig. 8). This
"pseudo-symmetry" translates into global CSM values of S(C2)=0.019 and S(C2)=0.024 for the
murine and the guinea pig proteins, respectively, which is an interesting result in itself:
although the sequence similarity is only 70%, the two proteins have very similar CSM
values, which may reflect their structural and functional similarity. This agrees
with the identified evolutionary trend (Ref. 29) by which conserving a specific structure along
evolution is more important than conserving the sequence, signaling that activity depends
primarily on the quaternary structure, and indicating that different building blocks
(the amino acids) can be used to form similar structures. Let us now look at these proteins at a

higher resolution (Fig. 8), to find local regions which are of much higher S(C2) values (up to

2.8). An interesting observation made possible by this type of analysis is that the two proteins

share the same amino-acids with high CSM values, each in its C2-symmetry-related subunits in

the protein. Out of the 20 most distorted amino-acids in each protein, 7 amino-acids are shared

(Glu26, His134, Arg137, Ile203, Glu239, Asn270, Lys274). Note that in this case too,

significant parts of the distortion are located near the C2-symmetry-axis.
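The comparison of the most distorted residues in two proteins can be sketched as a ranking followed by a set intersection. The function names and all CSM values below are hypothetical placeholders, not data from this study, and the sketch assumes that the two proteins share a residue numbering (in practice a sequence alignment would be needed).

```python
def top_distorted(csm_by_residue, n):
    """Return the n residue labels with the highest per-residue CSM values."""
    ranked = sorted(csm_by_residue, key=csm_by_residue.get, reverse=True)
    return set(ranked[:n])


def shared_top_distorted(protein_a, protein_b, n=20):
    """Residues that are among the n most distorted in both proteins."""
    return top_distorted(protein_a, n) & top_distorted(protein_b, n)


# Hypothetical per-residue S(C2) values, for illustration only.
toy_a = {"Glu26": 2.8, "His134": 1.9, "Ala10": 0.3, "Leu50": 0.1}
toy_b = {"Glu26": 2.1, "Lys274": 1.5, "His134": 0.2, "Leu50": 0.05}

shared = shared_top_distorted(toy_a, toy_b, n=2)
```

With n=2, only Glu26 is among the two most distorted residues of both toy proteins; in the analysis above the same idea is applied with n=20.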


Which amino-acids carry the burden of symmetry distortion?

We now use the symmetry maps to answer this key question, and for that purpose we
concentrated on homodimeric proteins (which, as mentioned above, are classified as
having C2 symmetry in the database of Ref. 25). We applied the symmetry analysis of
fragments to each of these proteins, taking the individual amino acids as the minimal
fragments. That is, each CSM calculation involved a pair of symmetry-related amino
acids, one in each of the two subunits (as in the symmetry maps).
We then focused on the 15 most symmetry-distorted C2-symmetry-related pairs of amino
acids in each protein, in search of what might characterize them. After examining
various possible parameters (such as polarity, acidity, etc.), a
very clear link was found with the hydrophilicity/hydrophobicity balance of the amino acids
(as defined in Ref. 39): the majority of the 15 most distorted residues are hydrophilic,
as can be seen in Table III and in Fig. 9 (Ref. 40). This finding means, perhaps as expected, that a main

source of structural readjustment is the multiple polar, asymmetric hydrogen-bond
interactions between amino acids or between an amino acid and water molecules. These
interactions can force conformational changes on the amino acids so that they reach a
preferred position, as was already discussed above regarding the HIV protease. On the other hand,

hydrophobic residues are engaged in weaker dispersive interactions, which are less demanding

and therefore less distortive. This observation leads us to our final topic, a conceptual
discussion.
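The hydrophilic:hydrophobic ratios of the kind reported in Table III can be sketched as a simple count over a list of residues. The split used below is only an illustrative assumption and is not the hydrophobicity scale of Ref. 39; the function name is hypothetical as well. The input list is the set of seven shared residues named above for 11β-HSD1, used here purely as sample input.

```python
# Illustrative split of residue types; NOT the scale of Ref. 39 (assumption).
HYDROPHOBIC = {"Ala", "Val", "Leu", "Ile", "Met", "Phe", "Trp", "Pro", "Gly", "Cys"}


def hydrophilic_hydrophobic_counts(residues):
    """Count (hydrophilic, hydrophobic) residues in labels such as 'Glu26'."""
    names = [label.rstrip("0123456789") for label in residues]  # drop the position
    n_hydrophobic = sum(name in HYDROPHOBIC for name in names)
    return len(names) - n_hydrophobic, n_hydrophobic


sample = ["Glu26", "His134", "Arg137", "Ile203", "Glu239", "Asn270", "Lys274"]
counts = hydrophilic_hydrophobic_counts(sample)
```

Under this assumed split, only Ile203 in the sample counts as hydrophobic; a different hydrophobicity scale may classify borderline residues differently.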

CONCLUSIONS

The shift from qualitative descriptions of symmetry deviations of proteins into

quantitative descriptions, as proposed in this report, proved to be an eye opener: not only does
it translate the descriptive language into a meaningful scale, it also allows one to obtain a clear
picture of the molecular origin of the asymmetry. Thus, we found that segments located near the


symmetry-axis and in between the subunits consistently carry part of the burden of symmetry

deviations; and we found that hydrophilic amino acids tend to carry the burden of asymmetry

by adopting different conformers relative to their symmetry-related amino acids in the protein

structure. The strength, as well as the weakness, of the method is that it is a thermodynamic

measure: One cannot go back from it to specific structural features, but it does combine such

various features into one value.

Such observations bring us back to the question at the opening of the report: If
symmetry is so advantageous (Refs. 1–5), why do proteins stop short of it? We propose the following list

of reasons for this phenomenon which is quite universal in the world of proteins. All of the

following must be tested in future studies and should be regarded as a starting point for

discussion: The first important parameter is the free energy, that is, the enthalpic and entropic

parameters. Enthalpy changes must follow the interactions and conformational readjustments

optimizing the interfacial contact area of the subunits: “insisting” on perfect symmetry would

not minimize the protein's enthalpy, and thus would not minimize its free energy. We also

propose that entropy has a great influence on the symmetry distortion of the protein: a fully
symmetric oligomer structure has a certain number of possible microscopic conformations.
However, as the protein deviates from its perfectly symmetric structure, that
number increases sharply. That is, the entropy content rises, probably quite steeply, as the
protein settles for symmetry that is less than perfect. Second, we consider dynamics as another

cause for asymmetry, since proteins are not static structures but are in constant motion and
thus have ensembles of conformations. As dynamics moves any symmetric object away from
symmetry (for most of its conformers), the protein necessarily distorts from perfect symmetry.
In the CSM tool, a distortion is always a non-negative value, S(G)≥0, namely the distance
from perfect symmetry. It follows that the average CSM value of the protein's conformers is
non-negative as well, and is strictly positive as soon as any conformer is distorted, because
an average of non-negative values that are not all zero is positive. It is a delicate


point that while the geometric average of distorted structures can be perfectly symmetric, the
average symmetry measure is still non-zero. The explanation is that the average geometry can
be the most symmetric structure because the fluctuations tend to compensate each
other (they are vectorial entities and can have negative components), whereas symmetry
deviation values are never negative. Last but not least, one must take into account the

surroundings: proteins are always in an asymmetric environment – certainly in the cell, but

even in the crystalline state. These asymmetric surroundings do not induce asymmetry in the
protein structure, but stabilize it: the hydration shell stabilizes the structure mainly by forming
hydrogen-bond interactions. As was discussed before, hydrogen-bond interactions encourage

asymmetry.
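The point made above about averaging, namely that the average geometry of an ensemble can be nearly symmetric while the average symmetry measure stays clearly positive, can be illustrated numerically. The sketch below uses a deliberately simplified C2 measure with a fixed rotation axis (z, through the centroid) and a fixed pairing of atoms; the actual CSM procedure also optimizes over the axis and the pairing, so this is a stand-in and not the authors' algorithm. The four-atom structure, the noise level and the scaling by 100 are assumptions chosen for illustration.

```python
import numpy as np


def c2_measure(coords, partner):
    """Simplified C2 deviation: fixed z axis through the centroid, fixed pairing.

    Returns 100 * sum|Q_k - P_k|^2 / sum|Q_k|^2, where P is the C2-symmetric
    structure obtained by averaging Q with its rotated, partner-relabeled copy.
    """
    q = coords - coords.mean(axis=0)           # center the structure
    rot = np.diag([-1.0, -1.0, 1.0])           # 180 degree rotation about z
    image = q[partner] @ rot.T                 # rotate each atom's C2 partner
    nearest_symmetric = 0.5 * (q + image)      # symmetrized structure P
    return 100.0 * np.sum((q - nearest_symmetric) ** 2) / np.sum(q ** 2)


# A perfectly C2-symmetric toy "dimer" of four atoms (hypothetical coordinates).
base = np.array([[1.0, 0.5, 0.3], [-1.0, -0.5, 0.3],
                 [0.2, 2.0, -0.3], [-0.2, -2.0, -0.3]])
partner = np.array([1, 0, 3, 2])               # atom k is paired with partner[k]

# An ensemble of conformers: the symmetric structure plus random fluctuations.
rng = np.random.default_rng(0)
ensemble = base + rng.normal(scale=0.1, size=(1000, 4, 3))

values = np.array([c2_measure(conf, partner) for conf in ensemble])
mean_of_measures = values.mean()                               # average S
measure_of_mean = c2_measure(ensemble.mean(axis=0), partner)   # S of average geometry
```

In this toy ensemble every conformer has a non-negative measure, so their mean is positive, whereas the fluctuations largely cancel in the averaged coordinates and `measure_of_mean` comes out much smaller than `mean_of_measures`.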

We hope this report will stimulate further exploration of the effects of these parameters

and of their relative weight in determining the symmetry of proteins and their symmetry-related

functionality; research in this direction is in progress.

ACKNOWLEDGEMENTS

We thank Dr. Chaim Dryzun for useful advice and Amir Zait for developing the computational
tools and for assistance with them.


REFERENCES

1. Goodsell DS, Olson AJ. Structural symmetry and protein function. Annu Rev Biophys

Biomol Struct 2000;29:105–153.

2. André I, Strauss CEM, Kaplan DB, Bradley P, Baker D. Emergence of symmetry in

homooligomeric biological assemblies. Proc Natl Acad Sci USA 2008;105:16148–

16152.

3. Blundell TL, Srinivasan N. Symmetry, stability, and dynamics of multidomain and

multicomponent protein systems. Proc Natl Acad Sci USA 1996;93:14243–14248.

4. Kojić-Prodić B, Štefanić Z. Symmetry versus Asymmetry in the Molecules of Life:

Homomeric Protein Assemblies. Symmetry 2010;2:884–906.

5. Berchanski A, Segal D, Eisenstein M. Modeling oligomers with Cn or Dn symmetry:

application to CAPRI target 10. Proteins 2005;60:202–206.

6. Taylor WR, May ACW, Brown NP, Aszódi A. Protein structure: geometry, topology

and classification. Rep Prog Phys 2001;64:517–590.

7. Taylor WR, Aszodi A. Protein geometry, classification, topology and symmetry: A

computational analysis of structure (Series in Biophysics). Taylor & Francis; 2004.

8. Alvarez S, Serratosa F. Symmetry guidelines for the design of convergent syntheses. On

narcissistic coupling and la coupe du roi. J Am Chem Soc 1992;114:2623–2630.

9. (a) Abad-Zapatero C. Notes of a protein crystallographer: the beauty of rose windows

and the different meanings of symmetry. Acta Crystallogr D Biol Crystallogr

2014;70:907–911. (b) Schulze B, Sljoka A, Whiteley W. How does symmetry impact

the flexibility of proteins? Philos Trans A Math Phys Eng Sci 2014;372:20120041. (c)

Reisinger B, Kuzmanovic N, Löffler P, Merkl R, König B, Sterner R. Exploiting protein

symmetry to design light-controllable enzyme inhibitors. Angew Chem Int Ed Engl

2014;53:595–598. (d) Williams LS, Levdikov VM, Minakhin L, Severinov K, Antson

AA. 12-Fold symmetry of the putative portal protein from the Thermus thermophilus

bacteriophage G20C determined by X-ray analysis. Acta Crystallogr Sect F Struct Biol

Cryst Commun 2013;69:1239–1241. (e) Hilbert M, Erat MC, Hachet V, Guichard P,

Blank ID, Flückiger I, Slater L, Lowe ED, Hatzopoulos GN, Steinmetz MO, Gönczy P,

Vakonakis I. Caenorhabditis elegans centriolar protein SAS-6 forms a spiral that is

consistent with imparting a ninefold symmetry. Proc Natl Acad Sci U S A

2013;110:11373–11378. (f) Matsunaga Y, Koike R, Ota M, Tame JRH, Kidera A.

Influence of structural symmetry on protein dynamics. PLoS One 2012;7:e50011. (g)


Bayfield OW, Chen C-S, Patterson AR, Luan W, Smits C, Gollnick P, Antson AA. Trp

RNA-binding attenuation protein: modifying symmetry and stability of a circular

oligomer. PLoS One 2012;7:e44309. (h) Gross M. Symmetry and complexity in protein

oligomers. Curr Biol 2012;22:R175–R177. (i) Nava J, Kreinovich V. Towards

symmetry-based explanation of (approximate) shapes of alpha-helices and beta-sheets

(and beta-barrels) in protein structure. Symmetry (Basel) 2012;4:15–25. (j) Broom A,

Doxey AC, Lobsanov YD, Berthin LG, Rose DR, Howell PL, McConkey BJ, Meiering

EM. Modular evolution and the origins of symmetry: reconstruction of a three-fold

symmetric globular protein. Structure 2012;20:161–171.

10. Levy Y, Cho SS, Shen T, Onuchic JN, Wolynes PG. Symmetry and frustration in

protein energy landscapes: a near degeneracy resolves the Rop dimer-folding mystery.

Proc Natl Acad Sci U S A 2005;102:2373–2378.

11. Dryzun C, Zait A, Avnir D. Quantitative symmetry and chirality-a fast computational

algorithm for large structures: proteins, macromolecules, nanotubes, and unit cells. J

Comput Chem 2011;32:2526–2538.

12. Tuvi-Arad I, Blonder R. Continuous symmetry and chemistry teachers: learning

advanced chemistry content through novel visualization tools. Chem Educ Res Pract

2010;11:48–58.

13. Dryzun C. Continuous symmetry measures for complex symmetry group. J Comput

Chem 2014;35:748–755.

14. Tuvi-Arad I, Rozgonyi T, Stirling A. Effect of temperature and substitution on Cope

rearrangement: a symmetry perspective. J Phys Chem A 2013;117:12726–12733.

15. Keinan S, Avnir D. Quantitative symmetry in structure−activity correlations: The near

C2 symmetry of inhibitor/HIV protease complexes. J Am Chem Soc 2000;122:4378–

4384.

16. Zabrodsky H, Peleg S, Avnir D. Continuous symmetry measures. J Am Chem Soc

1992;114:7843–7851.

17. Dryzun C, Avnir D. Generalization of the continuous symmetry measure: the symmetry

of vectors, matrices, operators and functions. Phys Chem Chem Phys 2009;11:9653–

9666.

18. Salomon Y, Avnir D. Continuous symmetry measures: A note in proof of the

folding/unfolding method. J Math Chem 1999;25:295–308.

19. Pinsky M, Avnir D. Continuous Symmetry Measures. 5. The Classical Polyhedra. Inorg

Chem 1998;37:5575–5582.


20. Pinsky M, Zait A, Bonjack M, Avnir D. Continuous symmetry analyses: Cnv and Dn

measures of molecules, complexes, and proteins. J Comput Chem 2013;34:2–9.

21. Blundell TL, Bolanos-Garcia V, Chirgadze DY, Harmer NJ, Lo T, Pellegrini L, Sibanda

BL. Asymmetry in the multiprotein systems of molecular biology. Struct Chem

2002;13:405–412.

22. Guerler A, Wang C, Knapp EW. Symmetric structures in the universe of protein folds. J

Chem Inf Model 2009;49:2147–2151.

23. Duran AM, Meiler J. Inverted topologies in membrane proteins: a mini-review. Comput

Struct Biotechnol J 2013;8:e201308004.

24. Popov P, Ritchie DW, Grudinin S. DockTrina: docking triangular protein trimers.

Proteins 2014;82:34–44.

25. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: a structural

classification of protein complexes. PLoS Comput Biol 2006;2:e155.

26. http://www.rcsb.org/pdb/.

27. Wlodawer A, Minor W, Dauter Z, Jaskolski M. Protein crystallography for non-

crystallographers, or how to get the best (but not more) from published macromolecular

structures. FEBS J 2008;275:1–21.

28. Pinsky M, Yogev-Einot D, Avnir D. Statistical analysis of the estimation of distance

measures. J Comput Chem 2003;24:786–796.

29. Guerler A, Knapp E-W. GIS: a comprehensive source for protein structure similarities.

Nucleic Acids Res 2010;38:W46–52.

30. (a) Lou H, Chen M, Black SS, Bushell SR, Ceccarelli M, Mach T, Beis K, Low AS,

Bamford VA, Booth IR, Bayley H, Naismith JH. Altered antibiotic transport in OmpC

mutants isolated from a series of clinical strains of multi-drug resistant E. coli. PLoS

One 2011;6:e25825. (b) Hohenester E, Hutchinson WL, Pepys MB, Wood SP. Crystal

structure of a decameric complex of human serum amyloid P component with bound

dAMP. J Mol Biol 1997;269:570–578. (c) Dedeo MT, Duderstadt KE, Berger JM,

Francis MB. Nanoscale protein assemblies from a circular permutant of the tobacco

mosaic virus. Nano Lett 2010;10:181–186. (d) Hurley TD, Bosron WF, Stone CL,

Amzel LM. Structures of three human beta alcohol dehydrogenase variants. Correlations

with their functional differences. J Mol Biol 1994;239:415–429. (e) Fermi G, Perutz

MF, Shaanan B, Fourme R. The crystal structure of human deoxyhaemoglobin at 1.74 Å

resolution. J Mol Biol 1984;175:159–174. (f) To be published. (g) Antson AA, Otridge

J, Brzozowski AM, Dodson EJ, Dodson GG, Wilson KS, Smith TM, Yang M, Kurecki


T, Gollnick P. The structure of trp RNA-binding attenuation protein. Nature

1995;374:693–700. (h) Kiser PD, Lodowski DT, Palczewski K. Purification,

crystallization and structure determination of native GroEL from Escherichia coli

lacking bound potassium ions. Acta Crystallogr Sect F Struct Biol Cryst Commun

2007;63:457–461. (i) Harata K, Nagahora H, Jigami Y. X-ray structure of wheat germ

agglutinin isolectin 3. Acta Crystallogr Sect D Biol Crystallogr 1995;51:1013–1019. (j)

To be published. (k) Cherezov V, Clogston J, Papiz MZ, Caffrey M. Room to move:

crystallizing membrane proteins in swollen lipidic mesophases. J Mol Biol

2006;357:1605–1618. (l) Sixma TK, Kalk KH, van Zanten BA, Dauter Z, Kingma J,

Witholt B, Hol WG. Refined structure of Escherichia coli heat-labile enterotoxin, a close

relative of cholera toxin. J Mol Biol 1993;230:890–918. (m) Swain AL, Miller MM,

Green J, Rich DH, Schneider J, Kent SB, Wlodawer A. X-ray crystallographic structure

of a complex between a synthetic protease of human immunodeficiency virus 1 and a

substrate-based hydroxyethylamine inhibitor. Proc Natl Acad Sci U S A 1990;87:8805–

8809. (n) Shrive AK, Cheetham GM, Holden D, Myles DA, Turnell WG, Volanakis JE,

Pepys MB, Bloomer AC, Greenhough TJ. Three dimensional structure of human C-

reactive protein. Nat Struct Biol 1996;3:346–354. (o) Tahirov TH, Inagaki E, Ohshima

N, Kitao T, Kuroishi C, Ukita Y, Takio K, Kobayashi M, Kuramitsu S, Yokoyama S,

Miyano M. Crystal structure of purine nucleoside phosphorylase from Thermus

thermophilus. J Mol Biol 2004;337:1149–1160. (p) Vitali J, Colaneri MJ, Kantrowitz E.

Crystal structure of the catalytic trimer of Methanococcus jannaschii aspartate

transcarbamoylase. Proteins 2008;71:1324–1334. (q) Gill HS, Eisenberg D. The crystal

structure of phosphinothricin in the active site of glutamine synthetase illuminates the

mechanism of enzymatic inhibition. Biochemistry 2001;40:1903–1912.

31. (a) Bilgrami S, Yadav S, Kaur P, Sharma S, Perbandt M, Betzel C, Singh TP. Crystal

structure of the disintegrin heterodimer from saw-scaled viper (Echis carinatus) at 1.9 Å

resolution. Biochemistry 2005;44:11058–11066. (b) Keating AE, Malashkevich VN,

Tidor B, Kim PS. Side-chain repacking calculations for predicting structures and

stabilities of heterodimeric coiled coils. Proc Natl Acad Sci U S A 2001;98:14825–

14830. (c) Jabeen T, Singh N, Singh RK, Jasti J, Sharma S, Kaur P, Srinivasan A, Singh

TP. Crystal structure of a heterodimer of phospholipase A2 from Naja naja sagittifera at

2.3 Å resolution reveals the presence of a new PLA2-like protein with a novel Cys 32-

Cys 49 disulphide bridge with a bound sugar at the substrate-binding site. Proteins

2006;62:329–337.


32. Groll M, Ditzel L, Löwe J, Stock D, Bochtler M, Bartunik HD, Huber R. Structure of

20S proteasome from yeast at 2.4 Å resolution. Nature 1997;386:463–471.

33. Milner D, Raz S, Hel-Or H, Keren D, Nevo E. A new measure of symmetry and its

application to classification of bifurcating structures. Pattern Recognit 2007;40:2237–

2250.

34. Najmudin S, Nalini V, Driessen HP, Slingsby C, Blundell TL, Moss DS, Lindley PF.

Structure of the bovine eye lens protein γB(γII)-crystallin at 1.47 Å. Acta Crystallogr D

Biol Crystallogr 1993;49:223–233.

35. Blundell T, Lindley P, Miller L, Moss D, Slingsby C, Tickle I, Turnell B, Wistow G.

The molecular structure and stability of the eye lens: x-ray analysis of γ-crystallin II.

Nature 1981;289:771–777.

36. Greer J, Erickson JW, Baldwin JJ, Varney MD. Application of the three-dimensional

structures of protein target molecules in structure-based drug design. J Med Chem

1994;37:1035–1054.

37. (a) Baker EN, Blundell TL, Cutfield JF, Cutfield SM, Dodson EJ, Dodson GG, Hodgkin

DM, Hubbard RE, Isaacs NW, Reynolds CD. The structure of 2Zn pig insulin crystals
at 1.5 Å resolution. Philos Trans R Soc Lond B Biol Sci 1988;319:369–456 (PDB code:

4INS). (b) Moras D, Olsen KW, Sabesan MN, Buehner M, Ford GC, Rossmann MG.

Studies of asymmetry in the three-dimensional structure of lobster D-glyceraldehyde-3-

phosphate dehydrogenase. J Biol Chem 1975;250:9137–9162 (PDB code: 1GPD). (c)

Jogl G, Rozovsky S, McDermott AE, Tong L. Optimal alignment for enzymatic proton

transfer: structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å

resolution. Proc Natl Acad Sci U S A 2003;100:50–55 (PDB code: 1NEY). (d) Sixma

TK, Kalk KH, van Zanten BA, Dauter Z, Kingma J, Witholt B, Hol WG. Refined

structure of Escherichia coli heat-labile enterotoxin, a close relative of cholera toxin. J

Mol Biol 1993;230:890–918 (PDB code: 1LTS).

38. (a) Ogg D, Elleby B, Norström C, Stefansson K, Abrahmsén L, Oppermann U, Svensson

S. The crystal structure of guinea pig 11β-hydroxysteroid dehydrogenase type 1

provides a model for enzyme-lipid bilayer interactions. J Biol Chem 2005;280:3789–

3794 (PDB code: 1XSE). (b) Zhang J, Osslund TD, Plant MH, Clogston CL, Nybo RE,

Xiong F, Delaney JM, Jordan SR. Crystal structure of murine 11 beta-hydroxysteroid

dehydrogenase 1: an important therapeutic target for diabetes. Biochemistry

2005;44:6948–6957 (PDB code: 1Y5M).


39. Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface

protein sequences with the hydrophobic moment plot. J Mol Biol 1984;179:125–142.

40. (a) Chagot B, Diochot S, Pimentel C, Lazdunski M, Darbon H. Solution structure of

APETx1 from the sea anemone Anthopleura elegantissima: a new fold for an HERG

toxin. Proteins 2005;59:380–386. (b) Swan MK, Hansen T, Schönheit P, Davies C. A

novel phosphoglucose isomerase (PGI)/phosphomannose isomerase from the

crenarchaeon Pyrobaculum aerophilum is a member of the PGI superfamily: structural

evidence at 1.16-Å resolution. J Biol Chem 2004;279:39838–39845. (c) Jogl G,

Rozovsky S, McDermott AE, Tong L. Optimal alignment for enzymatic proton transfer:

structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å resolution.

Proc Natl Acad Sci U S A 2003;100:50–55. (d) Kessler D, Rétey J, Schulz GE. Structure

and action of urocanase. J Mol Biol 2004;342:183–194. (e) Swan MK, Hansen T,

Schönheit P, Davies C. Structural basis for phosphomannose isomerase activity in

phosphoglucose isomerase from Pyrobaculum aerophilum: a subtle difference between

distantly related enzymes. Biochemistry 2004;43:14088–14095. (f) Ziółkowska NE,

O’Keefe BR, Mori T, Zhu C, Giomarelli B, Vojdani F, Palmer KE, McMahon JB,

Wlodawer A. Domain-swapped structure of the potent antiviral protein griffithsin and its

mode of carbohydrate binding. Structure 2006;14:1127–1135. (g) Willis MA, Song F,

Zhuang Z, Krajewski W, Chalamasetty VR, Reddy P, Howard A, Dunaway-Mariano D,

Herzberg O. Structure of YciI from Haemophilus influenzae (HI0828) reveals a

ferredoxin-like alpha/beta-fold with a histidine/aspartate centered catalytic site. Proteins

2005;59:648–652. (h) Adam V, Royant A, Nivière V, Molina-Heredia FP, Bourgeois D.

Structure of superoxide reductase bound to ferrocyanide and active site expansion upon

X-ray-induced photo-reduction. Structure 2004;12:1729–1740. (i) Meijers R, Adolph H-

W, Dauter Z, Wilson KS, Lamzin VS, Cedergren-Zeppezauer ES. Structural evidence

for a ligand coordination switch in liver alcohol dehydrogenase. Biochemistry

2007;46:5446–5454. (j) Liao DI, Calabrese JC, Wawrzak Z, Viitanen P V, Jordan DB.

Crystal structure of 3,4-dihydroxy-2-butanone 4-phosphate synthase of riboflavin

biosynthesis. Structure 2001;9:11–18. (k) Rigsby RE, Rife CL, Fillgrove KL, Newcomer

ME, Armstrong RN. Phosphonoformate: a minimal transition state analogue inhibitor of

the fosfomycin resistance protein, FosA. Biochemistry 2004;43:13666–13673.


Figure legends

Figure 1. Backbone- and full-analysis scheme of the dimeric HIV protease model; each subunit is indicated by
a different color. (a) All atoms (except for hydrogens), including all of the amino acid residues. (b) The peptide
backbone structure only (residues are ignored). (c) Cartoon representation of the backbone structure.

Figure 2. A symmetry map of HIV protease (compare with Fig. 1 and see text for explanation). The indicated

amino acids are discussed below in Section 4.4.

Figure 3. Homomeric protein structures analyzed in this study (see Table I). Each subunit is indicated by a

different color. (a) Porin, (b) Decameric amyloid P, (c) Tobacco mosaic virus disc, (d) Alcohol dehydrogenase, (e)

Hemoglobin, (f) Neuraminidase, (g) GroEL, (h) Wheat germ agglutinin, (i) Triose phosphate isomerase, (j) Light-

harvesting comp.2, (k) Beta-subunit of types 1&2 heat-labile enterotoxin, (l) C-reactive protein, (m) HIV

proteinase, (n) trp RNA-binding attenuation protein, (o) Purine nucleoside phosphorylase, (p) Aspartate

transcarbamoylase, (q) Glutamine synthetase.

Figure 4. Heteromeric protein structures analyzed in this study (see Table II). Each subunit is indicated by
a different color. (a) Disintegrin heterodimer, (b) GCN4 acid/base heterodimer, (c) Phospholipase A2 (PLA2)

heterodimer.

Figure 5. Backbone analysis of the heteromeric protein 20S proteasome (each subunit is indicated by
a different color). (a) The whole structure of the protein is composed of four rings, one on top of the other. (b)

Backbone structure of the top 7-subunits ring. (c) The analyzed structure, ignoring the extra atoms. (d) The

analyzed structure, adding ghost atoms. See text for explanation of (c) and (d).

Figure 6. Symmetry analysis of protein fragments: (a) The symmetry-related secondary structures of β1 alcohol

dehydrogenase. (b) Symmetry-related domains of γβ-crystallin. Only coordinates of colored areas were considered

in calculations.

Figure 7. Symmetry maps of protein structures, each with its own color code, with a focus on the location of the

near symmetry axes (indicated for each structure). (a) Insulin, (b) D-glyceraldehyde-3-phosphate dehydrogenase,

(c) Triosephosphate isomerase, (d) B subunits of heat-labile enterotoxin. For data sources see Ref. 37.

Figure 8. Comparison of two homodimeric-proteins with a sequence homology of 70%: (a) 11 β-hydroxysteroid

dehydrogenase Type 1 (11 β-HSD1) of a guinea pig; and (b) of a murine. The 15 most C2-distorted amino acids in

each structure are colored red.

Figure 9. The 15 most symmetry-distorted amino acids in each protein are shown (red and blue) – see Table III

for their hydrophobicity characterization. (a) Urocanate hydratase (1UWK). (b) Griffithsin (2GUD). (c)

Fosfomycin resistance protein A (1NKI). (d) Alcohol dehydrogenase (2JHF).


Table I

CSM values of the homomeric proteins structures in Fig. 3.

     Name (a)                                             PDB ID   G-symmetry   Backbone CSM value   Full CSM value
a    Porin*                                               2XE2     C3           0.001                0.006
b    Decameric amyloid P                                  1LGN     C5           0.002                0.006
c    Tobacco mosaic virus disc*                           3KML     C17          0.002                0.002
d    Alcohol dehydrogenase                                1HDX     C2           0.003                0.01
e    Hemoglobin                                           2HHB     C2           0.004                0.01
f    Neuraminidase*                                       1V0Z     C4           0.004                0.01
g    trp RNA-binding attenuation protein*                 1WAP     C11          0.006                0.05
h    GroEL*                                               2NWC     C7           0.008                0.01
i    Wheat germ agglutinin*                               1WGT     C2           0.01                 0.05
j    Triose phosphate isomerase*                          8TIM     C2           0.02                 0.05
k    Light-harvesting comp.2                              2FKW     C9           0.02                 0.03
l    Beta-subunit of types 1&2 heat-labile enterotoxin*   1LTS     C5           0.03                 0.08
m    HIV proteinase*                                      7HVP     C2           0.04                 0.1
n    C-reactive protein*                                  1GNH     C5           0.05                 0.08
o    Purine Nucleoside Phosphorylase                      1ODI     C3           0.005                0.02
                                                                   C2           0.002                0.01
                                                                   D3           0.006                0.02
p    Aspartate transcarbamoylase                          2RGW     C3           0.004                0.01
                                                                   C2           0.001                0.006
                                                                   D3           0.005                0.01
q    Glutamine synthetase                                 1F52     C6           0.0005               0.0006
                                                                   C2           0.0001               0.0002
                                                                   D6           0.0008               0.0009

(a) For data sources see Ref. 30. Starred proteins are referred to in the literature as symmetric; all others

are referred to as quasi-symmetric etc.


Table II

CSM values of the heteromeric proteins in Fig. 4.

     Name (a)                       PDB ID   G-symmetry   Backbone CSM value
a    Disintegrin heterodimer        1TEJ     C2           0.03
b    GCN4 Acid/Base heterodimer     1KDD     C2           0.06
c    Phospholipase A2 (PLA2)        1Y75     C2           0.09

(a) For data sources see Ref. 31.


Table III

The hydrophilic/hydrophobic ratio of the 15-most symmetry distorted amino-acids in several proteins.

PDB ID (a)   Hydrophilic:hydrophobic ratio
1WKQ         14 : 1
1TZB         14 : 1
1NEY         13 : 2
1UWK         13 : 2
1X9I         13 : 2
2GUD         13 : 2
1MWQ         12 : 3
1VZI         12 : 3
2JHF         12 : 3
1G57         10 : 5
1NKI         10 : 5

(a) For data sources see Ref. 40.

Page 29 of 38

John Wiley & Sons, Inc.

PROTEINS: Structure, Function, and Bioinformatics

Page 30: The near-symmetry of proteins

Figure 1. Backbone- and full-analysis scheme of the dimeric HIV protease model; each subunit is indicated by different color. (a) All atoms (except for hydrogens), including all of the amino acids residues. (b) The

peptide backbone structure only (residues are ignored). (c) Cartoon representation of the backbone

structure. 76x48mm (300 x 300 DPI)

Page 30 of 38

John Wiley & Sons, Inc.

PROTEINS: Structure, Function, and Bioinformatics

Page 31: The near-symmetry of proteins

Figure 2. A symmetry map of HIV protease (compare with Fig. 1 and see text for explanation). The indicated amino acids are discussed below in Section 4.4.

82x71mm (300 x 300 DPI)

Page 31 of 38

John Wiley & Sons, Inc.

PROTEINS: Structure, Function, and Bioinformatics

Page 32: The near-symmetry of proteins

Figure 3. Homomeric protein structures analyzed in this study (see Table I). Each subunit is indicated by a different color. (a) Porin, (b) Decameric amyloid P, (c) Tobacco mosaic virus disc, (d) Alcohol

dehydrogenase, (e) Hemoglobin, (f) Neuraminidase, (g) GroEL, (h) Wheat germ agglutinin, (i) Triose phosphate isomerase, (j) Light-harvesting comp.2, (k) Beta-subunit of types 1&2 heat-labile enterotoxin, (l) C-reactive protein, (m) HIV proteinase, (n) trp RNA-binding attenuation protein, (o) Purine nucleoside

phosphorylase, (p) Aspartate transcarbamoylase, (q) Glutamine synthetase. 176x168mm (300 x 300 DPI)

Figure 4. Heteromeric protein structures analyzed in this study (see Table II). Each subunit is indicated by a different color. (a) Disintegrin heterodimer, (b) GCN4 acid/base heterodimer, (c) Phospholipase A2 (PLA2) heterodimer.

Figure 5. Backbone analysis of the heteromeric protein 20S proteasome (each subunit is indicated by a different color). (a) The whole structure of the protein, composed of four rings stacked one on top of the other. (b) Backbone structure of the top seven-subunit ring. (c) The analyzed structure, ignoring the extra atoms. (d) The analyzed structure, with ghost atoms added. See text for explanation of (c) and (d).

Figure 6. Symmetry analysis of protein fragments: (a) The symmetry-related secondary structures of β1 alcohol dehydrogenase. (b) Symmetry-related domains of γβ-crystallin. Only the coordinates of the colored areas were considered in the calculations.

Figure 7. Symmetry maps of protein structures, each with its own color code, with a focus on the location of the near-symmetry axes (indicated for each structure). (a) Insulin, (b) D-glyceraldehyde-3-phosphate dehydrogenase, (c) Triosephosphate isomerase, (d) B subunits of heat-labile enterotoxin. For data sources see Ref. 37.

Figure 8. Comparison of two homodimeric proteins with a sequence homology of 70%: 11β-hydroxysteroid dehydrogenase type 1 (11β-HSD1) from (a) a guinea pig and (b) a murine source. The 15 most C2-distorted amino acids in each structure are colored red.

Figure 9. The 15 most symmetry-distorted amino acids in each protein are shown (red and blue); see Table III for their hydrophobicity characterization. (a) Urocanate hydratase (1UWK). (b) Griffithsin (2GUD). (c) Fosfomycin resistance protein A (1NKI). (d) Alcohol dehydrogenase (2JHF).
