A New Perspective on Analysis of Helix–Helix PackingPreferences in Globular ProteinsAntonio Trovato* and Flavio SenoINFM-Dipartimento di Fisica ‘G. Galilei,’ Universita di Padova, Via Marzolo 8, 35131 Padova, Italy
ABSTRACT For many years, statistical analy-sis of protein databanks has led to the belief that thesteric compatibility of helix interfaces may be thesource of observed preferences for particular anglesbetween neighboring helices. Several elegant mod-els describing how side chains on helices can inter-digitate without steric clashes were able to accountquite reasonably for the observed distributions.However, it was later recognized that the ‘bare’measured angle distribution should be corrected toavoid statistical bias.1,2 Disappointingly, the rescaleddistributions dramatically lost their similarity withtheoretical predictions, casting doubts on the valid-ity of the geometrical assumptions and models. Inthis article, we elucidate a few points concerningthe proper choice of a random reference distribu-tion. In particular we demonstrate the need forcorrections induced by unavoidable uncertaintiesin determining whether two helices are in face-to-face contact or not and their relative orientations.By using this new rescaling, we show that ‘true’packing angle preferences are well described byregular packing models, thus proving that preferen-tial angles between contacting helices do exist.Proteins 2004;55:1014–1022. © 2004 Wiley-Liss, Inc.
Key words: protein structure; secondary structures;helix packing
INTRODUCTION
The native state of a protein is determined by theprimary sequence, but exactly how is the subject of muchcurrent research.3 Tertiary structures can be thought of asassemblies of elements of secondary structure: alpha heli-ces, beta strands and turns.4 Therefore it is of paramountimportance to verify whether, beyond energetic consider-ations,5,6 there are steric and geometrical rules thatgovern the packing of secondary structures. Any suchassessment must rely on a statistical analysis able toreveal preferential packing patterns with respect to thebehavior of a suitable reference system.
The issue of the pairwise packing between helices inproteins was addressed soon after helical structures weresuggested. A number of models were developed, mostlydevoted to surface complementarities upon packing. The‘knobs into holes’ model, first introduced by Crick7 andelaborated by Richmond and Richards,8 aimed to find thebest steric fit between regular helices. Chothia, Levitt andRichardson9,10 recognized the importance of ‘ridges’ and
‘grooves’ formed by residues with different sequentialdistances (‘ridges into grooves’ model). Efimov11 tried torelate the packing angle between the two helices to thepreferred rotational states of the side-chains along them. Acomprehensive analysis of the geometrical models waseventually carried out by Walther et al.,12 who modeledhelix packing as the superposition of the two regularlattices that result from unrolling the helix cylinders ontoa plane and contain points representing each residue. Thesix ‘preferred’ angles predicted by this model are consis-tent with earlier results and with histograms of experimen-tally observed packing angles (see Fig. 7 of ref. 12). Theagreement between theoretical modeling and experimen-tal data was remarkable, although not perfect (see ref. 12for a more detailed discussion).
However, the success of steric models in providing anexplanation for the most prevalent packing angles was putunder discussion after the observation made by Bowie1
that statistical corrections must be applied to the valuescollected from experimentally determined structures be-fore true interaxial angle preferences can be revealed.Indeed, the helix–helix packing is defined to occur onlywhen (1) the segment of closest approach (SCA), of lengthdr, between the two finite helix axes is shorter than aprefixed threshold dc and (2) this segment intersects bothhelix axes at a perpendicular angle.
In ref. 12, two helices were considered to be in contact ifthey were satisfying both conditions (1) and (2). Note thatWalther et al.12 selected interacting helical pairs fromtheir database of native protein conformations by consider-ing the distances between all possible inter-helical heavyatom pairs. We used a different selection procedure,involving distances between inter-helical C� atom pairs(see Methods for details). None of these particular choicesreally affect our geometrical analysis, and condition (1), asdefined above, is just the simplest way of defining anequivalent distance constraint.
Condition (2) is needed to ensure face-to-face packing ofthe two helices, justifying thus the use of theoreticalmodeling based on the steric interdigitation of the helices.
*Correspondence to: A. Trovato, INFM-Dipartimento di Fisica ‘G.Galilei,’ Universita di Padova, Via Marzolo 8, 35131 Padova, Italy.E-mail: [email protected].
Received 10 September 2003; 27 November 2003; Accepted 30November 2003
Published online 14 April 2004 in Wiley InterScience(www.interscience.wiley.com). DOI: 10.1002/prot.20083
PROTEINS: Structure, Function, and Bioinformatics 55:1014–1022 (2004)
© 2004 WILEY-LISS, INC.
If the SCA is coincident with the global segment ofclosest approach (GSCA), of length d, between the twostraight lines which are obtained by indefinitely prolong-ing the helix axes, condition (2) is then automaticallysatisfied. In such a situation, which corresponds to effec-tively dealing with helices of infinite length [see Fig. 1(a)],the reference probability distribution of interaxial anglesP(�) is simply the spherical-polar distribution betweenany two random vectors, namely P(�) � sin �.1
Walther and co-workers2 realized that the finiteness ofhelix axes is crucial in modifying the angular dependenceof P(�), because requiring that the SCA intersects bothaxes at a perpendicular angle introduces new restrictionsdepending on the packing angle �. This can be understoodby looking at Figure 2(a), where it is shown that fixing onehelix position, and assuming the GSCA to be orthogonal tothe page, the second finite helix axis may then be placedonly within a plane parallel to the paper plane such that
its starting point lies inside the darkly shaded parallelo-gram A.
The probability of placing the starting point of thesecond helix in A is proportional to its area and therefore tosin �. Consequently the probability P(�) for selecting aparticular packing angle is proportional to the product ofthe spherical-polar contribution and of the sin �-effect dueto the finite length condition:
P(�)�sin2� (1)
With this new random reference distribution the actualangular propensities need to be reconsidered. As is clearfrom Fig. 2 of ref. 2, the normalized frequencies reveal aprominent representation of packing angles near 0ˆ0 and180ˆ0 (not expected by theoretical models) whereas thepredicted optimal steric packing angles manifest them-selves, at most as shoulders in the distribution of propensi-
Fig. 2. (a) In this figure, the position of helix 1 (represented by a vector of length h1) and a vector orthogonalto the paper plane, determining the direction of the GSCA to a second helix, are fixed. Given a particularpacking angle �, in order to satisfy condition (2) without any tolerance �max, a second finite helix axis (of lengthh2) may then be placed only within a plane parallel to the paper plane and such that its starting point is insidethe darkly shaded parallelogram A. Otherwise the SCA between the two finite axes would no longer beperpendicular to both of them. The area of parallelogram A is h1h2 sin �. (b) The tolerance �max with which thecondition of perpendicularity between the SCA and the two helices is implemented enlarges the portion ofspace where the starting point of helix 2 can be placed. The region G formed by the four lightly shadedparallelograms and the four white corner regions, which together surround parallelogram A, can now beexploited. The two parallelograms parallel to helix 2, whose sides are h2 and � � �lmax sin �, are due to thegeometric arrangement described in Figure 1(b) with �1 � 0 and �2 � 0. (The vector with the dashed startingpoint represents the possible location of helix 2 in such a context.) The two other parallelograms, parallel tohelix 1, with sides h1 and �, correspond to the similar case when �1 � 0 and �2 � 0. The four remaining whiteregions, corresponding to the case in which �1 � 0 and �2 � 0, are all limited by hyperbola-like curves. The twoof them subtending an obtuse angle are due to the geometric arrangement described in Figure 1(c),corresponding to the two helices being placed on opposite sides with respect to the GSCA. (The vector with thewhite starting point represents the possible location of helix 2 in such a context.) The two of them subtending anacute angle correspond to the two helices being placed on the same side with respect to the GSCA. In bothcases, it is possible to find the equation describing such boundary curves, which we omit here for the sake ofsimplicity. The area of G is 2(h1 � h2)� plus a non-trivial contribution from the four corner regions. (c) Thedistance constraint (1) limits by itself the portion of plane surrounding parallelogram A where the starting pointof the second helix can be placed. This region, G, is simply the locus of points whose Euclidean distance fromA in the paper plane is less than � �dc
2 d2. For given values of d and the distance threshold dc, this regionis thus formed by the four lightly shaded parallelograms and the four white circular sectors. The area of G is2(h1 � h2) � �2.
HELIX–HELIX PACKING PREFERENCES 1015
Fig
.1.
(a)T
hegl
obal
segm
ento
fclo
sest
appr
oach
(GS
CA
)ofl
engt
hd
betw
een
two
stra
ight
lines
isby
defin
ition
perp
endi
cula
rto
both
ofth
em.T
hese
gmen
tofc
lose
stap
proa
ch(S
CA
),of
leng
thd r
,be
twee
ntw
ohe
lices
(sch
emat
ical
lyre
pres
ente
dby
cylin
ders
)coi
ncid
esw
ithth
egl
obal
segm
enti
fiti
nter
sect
sbo
thhe
lices
with
inth
eira
xis
leng
ths.
Thi
sfa
ctis
alw
ays
guar
ante
edif
the
two
helic
esar
eas
sum
edto
have
infin
itele
ngth
s(h
ypot
hesi
sus
edin
ref.
1).(
b)T
hetw
ofin
itehe
lices
have
aS
CA
that
isno
tcoi
ncid
entw
ithth
eG
CS
A.T
hela
tteri
nter
sect
she
lix1
ata
dist
ance
�l 1
from
itsen
dan
dm
ay(a
sin
the
pict
ure)
orm
ayno
tint
erse
cthe
lix2.
The
form
erjo
ins
the
endp
oint
ofhe
lix1
toa
poin
tins
ide
helix
2.T
hean
gle
form
edw
ithhe
lix1
(�1)d
evia
tes
from
� 2by
anam
ount
� 1.I
nste
ad� 2
�� 2
and
� 2�
0.T
hetr
igon
omet
ricre
latio
nex
pres
sing
�l 1
asan
incr
easi
ngfu
nctio
nof
� 1ca
nbe
easi
lyob
tain
ed:�
l 1(�
1)�
(dsi
n� 1
)/(s
in�
�si
n2�
si
n2� 1
).W
hen
� 1�
� ma
x,c
ondi
tion
(2),
defin
edin
the
text
,is
still
fulfi
lled.
Thi
sis
equi
vale
ntto
requ
iring
that
�l 1
��
l ma
x�
�l 1
(�m
ax).
Not
ice
that
�l 1
(�1)d
iver
ges
whe
n� 1
appr
oach
es�
from
belo
w,a
ndco
nseq
uent
ly,w
hen
�sin
���
sin
� ma
x,�
l ma
xdi
verg
es.I
nsu
cha
regi
me,
cond
ition
(2)
[but
obvi
ousl
yno
t(1)
beca
use
d rm
ayal
sodi
verg
e]is
alw
ays
satis
fied
with
inth
eal
low
edto
lera
nce.
Asi
mila
rsi
tuat
ion
aris
esan
da
sim
ilar
anal
ysis
can
beap
plie
dfo
r� 1
�0
and
� 2�
0.(c
)T
heG
SC
Ais
inte
rsec
ting
neith
erhe
lix1
nor
helix
2an
dth
eS
CA
isjo
inin
gtw
oen
dpoi
nts.
Inth
eca
sede
pict
edin
the
figur
e,th
etw
ohe
lices
are
plac
edon
oppo
site
side
sw
ithre
spec
tto
the
GS
CA
.The
thre
shol
dco
nditi
on� 1
�� 2
�� m
ax
may
betr
ansl
ated
into
the
follo
win
gon
ein
volv
ing
�l 1
and
�l 2
:
��l 1
��
l 2co
s�
���
l 2�
�l 1
cos
��
���
d2
��
l 12 sin
2 ���
d2
��
l 22 sin
2 ��
��l 1
��
l 2co
s�
��d
2�
�l 12 si
n2 �
���
l 2�
�l 1
cos
���
d2
��
l 22 sin
2 ��
cot
� max
.
(d)S
imila
rto
Fig
ure
1(c)
,but
inth
isca
seth
etw
ohe
lices
are
plac
edon
the
sam
esi
dew
ithre
spec
tto
the
GS
CA
.The
thre
shol
dco
nditi
on� 1
�� 2
�� m
ax
beco
mes
��l 1
��
l 2co
s�
���
l 2�
�l 1
cos
��
���
d2
��
l 12 sin
2 ���
d2
��
l 22 sin
2 ��
��l 1
��
l 2co
s�
��d
2�
�l 12 si
n2 �
���
l 2�
�l 1
cos
���
d2
��
l 22 sin
2 ��
cot
� max
.
Not
eth
atfo
rcl
arity
,in
this
figur
eas
wel
las
inF
igur
es1(
b,c)
,th
eac
tual
valu
es� 1
,� 2
,w
hich
wou
ldbe
allo
wed
with
inou
ran
alys
is(�
1�
� 2�
� ma
x)
are
inde
edsm
alle
rth
anth
ose
repr
esen
ted
grap
hica
lly.
ties. The predictive ability of geometrical models is castinto serious doubts after this analysis.
Our first observation is that the random distributionP(�) � sin2 � is correct if and only if the condition ofmutual perpendicularity between the SCA and the twoaxes is strictly fulfilled. But is this the real situation whenstatistical histograms are derived?
As a matter of fact, condition (2) was relaxed by Waltheret al., by admitting a small tolerance: a total deviation � ��1 � �2 (�1 and �2 are the complementary angles to theangles �1 and �2 formed between the SCA and helix 1 andhelix 2, respectively [see Fig. 1(b–d)] was accepted up to athreshold �max (�max � 1° in ref. 2 and �max � 5° in ref. 12).
At first, one might wonder why such a threshold needs tobe used since it does not effectively increase the number ofdata contributing to the histograms. However, we believethat this choice is indeed necessary because the ambiguityin the definition of the axis direction in natural helices,which is due both to experimental error in structuredetermination and to the arbitrariness of the axis recon-struction algorithm, introduces an intrinsic uncertainty inthe computation of the �, �, � angles (see Methods for adetailed explanation of how the axis is reconstructed in thetypical case of a non-ideal helix and for an estimation ofsuch uncertainty ��). It is thus possible to determinewhether � is 0 only within the uncertainty threshold ��,implying that under some circumstances the differentcases represented in Figure 1 cannot really be distin-guished from each other. A relaxation of condition (2) isthus crucial, if we want to correctly analyze data extractedfrom the database of real native protein structures.
We will show in this article, by means of partly semi-analytical geometrical arguments and eventually numeri-cal simulations, that such threshold effect does drasticallychange the random reference distribution. To demonstratethe necessity of considering this effect, we will presentnumerical evidence that the introduction of some degree ofuncertainty in an ideally constructed ensemble of helixpairs generates an interaxial angle distribution that canbe described only if the face-to-face packing condition (2) isrelaxed.
Having measured helix–helix packing angles from a setof 600 proteins representative of the Protein Data Bank(PDB) native structures, we will eventually reanalyze thepacking angles distribution after their proper rescalingwith the newfound reference distribution.
MATERIALS AND METHODSDatabank and Helix Pair Selection
We employed the same ensemble of 600 proteins consid-ered by Chang et al.,13 which consisted of sequencesvarying in length from 44 to 1017, with low sequencehomology and covering many different three-dimensionalfolds according to the Structural Classification of Proteins(SCOP) scheme.14 The structures were monomeric anddetermined using X-ray crystallography. We collected4397 helices, each with at least four consecutive residuesclassified as helical in the PDB files. The average numberof residues of these helices was 11.5. Two helices were
defined to be in close contact if at least one interhelicalcontact between C� atoms was present, with a maximalthreshold distance of 5.5 Å [analogous to condition (1) inthe text]. Only helix pairs separated in sequence by atleast 20 intervening residues were considered in order toeliminate possible correlations induced by short loops.Finally, we considered only helix pairs whose axes weresufficiently straight, according to a definition given in thefollowing subsection. The resulting data set consisted of627 closely-packed helix pairs.
Helix Axis Reconstruction
Reconstructing the helix axis from the coordinates of theC� atoms of the corresponding residues is a critical step inthe determination of packing angle preferences. Eventhough we eventually removed bent and supercoiled heli-ces from the data set, we adopted the procedure describedby Walther et al. in ref. 12, in which a local axis isassociated with every consecutive residue pair along thehelix, since local distortions could occur even for straighthelices, due to circumstances such as experimental uncer-tainties in the protein structure determination. The over-all axis is thus a broken line consisting of short segments.
A good starting approximation for the local axis ai,based on the four C� atom positions ri 1, ri, ri�1, ri�2, wasintroduced by Chothia et al.10 We employed a slightlymodified definition, in which we first defined the normal-ized bond vectors bi � (ri�1 r� i)/�ri�1 r� i�. In this way,the set of three orthonormal vectors ti, vi, ui, the naturalreference system associated with the C� atom trace, wasdefined without distortions due to the fluctuations of thebond length between consecutive C� atoms in the followingway:15 ti � (bi � bi 1)/�bi � bi 1�, vi � bi � bi 1/�bi � bi 1�,ui � ti � vi. If the C� atom trace followed a perfect idealhelix, the vector ai � ui � ui�1 would be parallel to thehelix axis. We then initially determined the local axisbetween residues i, i � 1, as the vector parallel to ai, oflength 1.45 Å, which has the geometric center of the closestfour consecutive C� atom positions, ri 1, ri, ri�1, ri�2, asits midpoint. The two local axes at the helix termini wereobtained by simply prolonging the neighboring ones.
We then applied a smoothing procedure, similar tomethods described by Walther et al.12 The ‘new’ local axesai were obtained by averaging the direction of the closestthree ‘old’ ones, ai 1, ai, ai�1, while conserving its mid-points. For the two local axes at the helix termini, weaveraged over the direction of the closest two availableones. To ensure the continuity of the overall axis, the innerhinges of the broken axial line were computed as themidpoints between extremities of consecutive local vectorsobtained from the running average. After repeating thewhole procedure twice, the standard deviation of thedistances of each C� atom from the reconstructed helixaxis was 0.16 Å.
In order to detect and remove curved helices from thedata set, we proceeded as follows. For any given local axissegment, say the i-th, we computed the average angle���i � ¥j�i�ij /(n 2) formed between the segment and allother ones forming the reconstructed axis, where n is the
HELIX–HELIX PACKING PREFERENCES 1017
helix number of residues. Then, after selecting the mini-mum value � � mini���i, we removed helices for which� � 14° from the data set. The threshold was chosen asroughly twice the average angle formed between consecu-tive local segments (see below).
The interaxial packing angle between two helices wascomputed between the local axis pair for which the mini-mum distance of closest approach was achieved. In casethe SCA intersected an inner hinge of the broken globalaxis, the local axis direction was defined as the average ofthe two corresponding local vectors. The whole discussionconcerning the perpendicularity of both local axes to theSCA applies only to cases involving a terminal local axis,since face-to-face packing is anyway ensured in case ofcontact between inner local axes. Packing angles arepositive if the background helix is rotated clockwise withrespect to the frontal helix when facing them. The angle� � 0° (� � �180°) corresponds to parallel (antiparallel)helices with respect to their sequence direction.
Imposing the further requirement that the SCA inter-sect both local axis directions at a perpendicular anglewithin a threshold �max played a critical role, as discussedbelow. For the three different �max values considered inthis work, �max � 7°, 11°, 15°, we collected data sets of 396,429, 454 closely packed helix pairs, respectively.
Finally, we report the mean value of the angle ��formed between consecutive local directions averaged overall inner hinges of all helical axes, which were recon-structed using the procedure discussed above. We foundthat �� � 6.6° � 4.0°, strongly supporting the necessity ofallowing a similar threshold when imposing angular con-straints as in condition (2).
RESULTS AND DISCUSSIONGeometrical Analysis
We will now sketch the geometric consequences ofrelaxing condition (2) within a given threshold, in order tounderstand by means of a simple argument how thisaffects the random reference distribution. In principle, acomprehensive analytical treatment would be feasible butcumbersome. Numerical simulation will ultimately be thepreferential approach of extracting the corrected randomreference distribution for interaxial angles.
Within the admitted threshold �max, cases similar to theone described in Figure 1(b) may occur. The SCA betweenthe two helices, which intersects helix 1 at one of its axisends and helix 2 at some internal point, forms an angle �1,which deviates from �⁄2 by less than �max, whereas theGSCA does not intersect both helices. (In this situation,�1 � 0, �2 � 0).
If d � dc, the two helices are considered to be in contact.The condition �1 � �max is equivalent to the condition that theGSCA intersects helix 1 within a distance �lmax from the endof its axis. This limit is reached when �1 � �min � �
2 �max.
There are also cases in which �1 � 0 and �2 � 0, when theSCA intersects helix 2 at one of its axis ends and helix 1 atsome internal point, and cases in which both �1 and �2 arenot zero, and the SCA intersects each helix at one axis end[Fig. 1(c,d)]. In this last case, the geometrical condition
implied by the fact that �1 � �2 � �max, involving thedistances �l1 and �l2 from the ends of both helix axes totheir intersection with the GSCA, is more complicated(and is discussed in the legend to Fig. 1).
Using the visual representation of Figure 2(b), we cansay that allowing condition (2) to be satisfied within thethreshold � � �1 � �2 � �max, opens up the possibility forthe starting point of helix 2 to lie in the portion of space (G)formed by the four lightly shaded parallelograms (twoparallel to helix 1 with sides h1 and � � �lmax sin �, andtwo parallel to helix 2 with sides h2 and �) and by the fourremaining white corner regions (boundaries defined byhyperbolas; see Figure 2 caption for details) that surroundparallelogram A.
Computing the area of G is not trivial, because of thehyperbola-shaped regions, but the result is obviously�-dependent. For example a simple trigonometric calcula-tion shows that
� � � d sin �max
�sin2� � sin2�max
�sin �� � sin �max
� �sin �� � sin �max
(2)
This implies that the area of G diverges for �sin �� � sin�max, which already indicates that relevant corrections arepossible even for small values of �max.
To conclude our analysis, we need to apply condition (1).By using the Pythagorean theorem, it is easy to see that forany fixed d � dc, condition (1) is satisfied for any point(thought of as the starting point of helix 2) whose distance�s(dr) � �dr
2 d2 from parallelogram A is less than � �s(dc) � �dc
2 d2. These points belong to the region G,shown in Figure 2(c), formed by four parallelograms (twowith sides h1 and , and two with sides h2 and ) and the fourcircular sectors (of radius ) surrounding A. The area of G isindependent on � (see Fig. 2c legend).
All of the points in the portion of space H � G � G,resulting from the intersection of G and G satisfy bothconditions (1) and (2).
Therefore, the probability P�max(d) of selecting a particu-lar angle � with a tolerance of �max, at a given distance d �dc, is given by the product of the spherical polar term sin�1 and a term proportional to the area of parallelogram Aplus the area of H.
When �sin �� � sin �max, eq. (2) shows that � divergesand thus:
P�max�d� � sin ��h1h2sin � � 2�h1 � h2� � B� (3)
where B is the area of the corner regions in H.Since does not depend on �, the second term in the
bracket will eventually dominate the small �sin �� behaviorof P�max(d). In practice, the actual relevance of this effectcan be appreciated only after integrating P�max(d) over d,since the relative weights of the different terms in thebracket vary with d. Such computation is quite cumber-some and we have instead chosen to obtain P�max bysimulating random helices that satisfy the contact condi-tions within a threshold �max and to numerically extractthe normalized histograms.
1018 TROVATO AND SENO
Numerical Simulations
In order to compute the reference distribution P�max(�)for interaxial packing angles we generated random helixpairs by means of computer simulations and then selectedthem with the same conditions, (1) and (2), used to extracthistograms from real helices in native protein structures.Note that when computing the reference distribution wedid not take steric effects into account; in other words,random helices might overlap.
More specifically, we constructed ideal, discrete heliceswith the same geometrical properties as �-helices in realproteins, i.e. twist per residue 99.1°, rise per residue 1.45Å, radius 2.3 Å (as is the case when considering C� atoms).We chose to generate helices consisting of 11 residues, theaverage length in the data set of real helices that wecollected from the PDB. Keeping the position of the firsthelix fixed, the second helix was placed by first choosingrandomly the midpoint of its axis within a sphere of radius15 Å centered at the midpoint of the first helix axis andthen selecting, again randomly, both the direction of itsaxis and the twist of its first residue.
Boundary effects might be relevant when the radius ofthe sphere in which the second helix is generated is toosmall with respect to the helix length and the helixfiniteness is effectively reduced. We made sure that ourresults did not change when increasing the radius of thesphere in which the second helix axis was placed.
We generated 5 � 107 random helix pairs, 29,915,472 ofwhich satisfied condition (1) of having at least one pair ofresidues separated by less then 5.5 Å. Condition (2) wasthen applied, in the same way as explained in Methods forthe data set of real helices, with the whole axis beingtreated as one single segment in the ideal case. In thisway, we generated 11,467,456, 13,033,815 and 14,553,626helix pairs, respectively, for the three different �max valuesconsidered in this work, �max � 7°, 11°, 15°.
In Figure 3, we plot the corresponding random referencedistributions P�max(�), comparing them to the ideal (�max �0) case: P(�) � sin2 �. The difference, although due to avery subtle effect, is substantial, and it is clearly seen thata new regime is present for �sin �� � sin �max, as expectedfrom the previous discussion.
In order to verify the main point of this article, that dueto the presence of an intrinsic uncertainty in helix axisdetermination it is necessary to relax the face-to-facepacking condition (2) for a proper statistical analysis, wecarried out the following numerical experiment.
In the same way as described above, we constructed adatabase of 106 random, overlapping, discrete helix pairs(each helix consisting of 11 residues), 598,099 of whichsatisfy the distance constraint (1) of having at least onepair of residues within 5.5 Å of each other. All residueswere first placed according to the same idealized geometri-cal rules described above, but each of them was thenrandomly displaced within a cube of side 2 Å centered in itsoriginal location.
Starting from this database, we reconstructed the helixaxes using the same algorithm employed also for realproteins (see Methods). Based on the reconstructed axes,
we selected the helix pairs satisfying the face-to-facepacking condition (2) in a strict manner, obtaining 172,685helix pairs. As shown in the upper panel of Figure 4, theresulting distribution is quite different from the expectedP0°(�) � sin2 � behavior and instead is very similar to thedistribution P7°(�). The value of the threshold �max to beconsidered in order to match the ‘experimental’ datadepends on the mean size of the random displacementused to introduce uncertainty in the database, but weverified that this effect is always present. Therefore athreshold strictly greater than zero has to be introduced.
We then used the same data set to check whether it ispossible to select a subset of unambiguous face-to-facepacking helix pairs, which would then be distributedaccording to P0°(�), even in the presence of some degree ofuncertainty. This can be done in a natural way by usingthe local segments introduced to reconstruct the helix axisand by retaining only pairs of contacting helices in whichthe GSCA is connecting ‘inner’ local segments and therelative number of such ‘inner’ segments is varied (see Fig.4 legend).
As shown in the lower panel of Figure 4, on decreasingsuch numbers, the resulting distribution gets closer toP0°(�), without reaching it, and the number of dataanalyzed is dramatically reduced to 8,361 in the extremecase, corresponding to retaining only two central innersegments. If such rules were applied to the databaseconstructed from native protein structures, only four datawould be left. Moreover, in an intermediate case, for which
Fig. 3. Ideal ‘random’ distribution of interaxial packing angles for fourdifferent values of the angle threshold �max used in applying condition (2):�max � 0° (upper left panel), �max � 7° (upper right panel), �max �11°(lower left panel), �max � 15° (lower right panel). The four distributionswere computed by means of numerical simulations described in the text.Since ideal helices do not have a preferential direction, the histogramsshown in this figure have been restricted to the 90° � � � 90° region.The 14,553,626 packing angle values collected in the �max � 15°histogram were then successively filtered out by the more and morerestrictive �max � 11°, �max � 7°, �max � 0° threshold conditions togenerate the 13,033,815, 11,467,456 and 8,646,994 data, respectively,collected in the corresponding histograms. The dashed line is the fit of the�max � 0° data, obtained by enforcing condition (2) in a strict way, to theexpected P(�) � sin2 � distribution, which is then reported for compari-son in all other histograms. All histograms were constructed with a binwidth of 1.5°. Histograms were normalized in such a way that a flatdistribution would correspond to a constant height of 1.
HELIX–HELIX PACKING PREFERENCES 1019
one would hope to retain a large enough data set, 80,650 inour simulation, the resulting distribution is much moresimilar to the one obtained without any special distinctionbetween ‘inner’ and ‘outer’ segments (and thus to the�max � 7° case) than to the ideal P0°(�).
Discussion
In the previous section, we gave numerical evidence thatthe only accurate way to analyze experimental data con-sists of rescaling them with a random reference distribu-tion computed by relaxing the face-to-face packing condi-tion (2).
To test the effect of this new reference distribution, werecomputed an experimental distribution of interaxialangles (see Methods for details) by analyzing a databank of600 proteins and using a contact threshold dc � 5.5 Å and atolerance �max � 7°, corresponding to the intrinsic uncer-tainty �� � 6.6° estimated when determining local axisdirections (see Methods). Since the reference distributionwas derived with the assumption that helix axes can berepresented by straight, finite segments, curved or super-coiled helices were removed from the set of contactinghelix pairs (see Methods). The histogram is reported in the
upper panel of Figure 5: in order to obtain good statisticsfor each single bin, we gathered results every 15°. Thehistogram is consistent with previous analysis.1,2,12 In themiddle panel of Figure 5, we plot P7°(�) and, for reference,P0°(�), normalized with the same binning as in theprevious panel. Finally, in the lower panel of Figure 5 wepresent the histogram both rescaled with P7°(�) and withP0°(�). The results are clear and striking: the correctreference distribution P7°(�) removes the spurious peaksat 0° and �180°, leaving four rather clean peaks located at 157.5°, 37.5°, 22.5° and 157.5° (rough estimates for thecenters of the bins corresponding to the histogram localmaxima). Note that four of the six packing angles pre-dicted by steric models12 are indeed consistent with suchpeaks within binning uncertainty, or are at most found inneighboring bins. Only two of the predicted packing anglesare clearly not favored according to our analysis. We alsonote that a residual preference for parallel and antiparal-lel alignment of two contacting helices cannot be ruled outeven after proper rescaling. Because of the relatively smallnumber of entries in the bins around � � 0°, 180°, it isdifficult to ascertain whether this is a genuine effect or justa spurious one induced by the presence of nearby peaks.
In order to further test both the robustness of the resultsand the full procedure we have obtained statistical histo-grams from the experimental data using three differentthreshold values for �max. Since we can compare them withthe corresponding random distribution, we should expect
Fig. 4. Upper panel: ‘experimental’ reference distribution of interaxialpacking angles obtained by enforcing condition (2) with �max � 0° aftersome degree of uncertainty was introduced in a data set consisting ofrandom overlapping helix pairs (thick grey histogram), compared to thetheoretical reference distribution P0°(�) � sin2 � expected in the samecase (light grey shaded histogram) and to the theoretical referencedistribution P7°(�) expected when condition (2) is relaxed with a threshold�max � 7° (black histogram). Lower panel: ‘experimental’ referencedistribution of interaxial packing angles obtained by enforcing condition(2) with �max � 0° after some degree of uncertainty was introduced in adata set consisting of random overlapping helix pairs, with the furthercondition that a helix pair would be discarded if none of the local segments(introduced in the helix axis reconstruction procedure) connected by theGSCA were ‘inner.’ The local i-th segment, 1 � i � ns, is considered to be‘inner’ depending on the parameter nend, when nend � i � ns nend � 1,and ‘outer’ otherwise, where ns is the number of local segments used toreconstruct the helix axis (ns � 10 in our simulations). Distributionsobtained with nend � 2 and nend � 4 are shown as thick grey and blackhistograms, respectively, and are compared to the theoretical referencedistribution P0°(�) � sin2 � expected in both cases (light grey shadedhistogram). All histograms were constructed with a bin width of 15°, andnormalized in such a way that a flat distribution would correspond to aconstant height of 1. ‘Theoretical’ distributions are the same as shown inFigure 3 but binned in a different way.
Fig. 5. Upper panel: Experimental ‘bare’ un-rescaled distribution ofinteraxial packing angles between contacting helices. Data were obtainedwith a distance threshold d � 5.5 Å for interhelical C� atom pairs and anangular threshold �max � 7° for condition (2). The histogram wasnormalized in such a way that a flat distribution would correspond to aconstant height of 1. Middle panel: ideal ‘random’ distribution that must beused to rescale the ‘bare’ experimental distribution in order to reveal truepacking angle preferences. The two different distributions obtained witheither �max � 0° (thick line) or �max � 7° (filled columns) are shown.Histograms were normalized in such a way that a flat distribution wouldcorrespond to a constant height of 1. Lower panel: rescaled distribution ofinteraxial packing angles. The experimental ‘bare’ distribution in the upperpanel was divided by the ideal ‘random’ one in the middle panel, obtainedwith either �max � 0° (thick line) or �max � 7° (filled columns). Histogramheights greater than 1 correspond to preferential packing angles, whereasheights smaller than 1 correspond to disfavored packing angles. Allhistograms were constructed with a bin width of 15°. Arrows in the upperand lower panels mark the values of the six preferred packing anglespredicted by Walther et al.;12 �abc � 37.1°, 97.4°, 22.0°, eachrepresented twice with a periodicity of 180°.
1020 TROVATO AND SENO
three similar, rescaled histograms. This is nicely con-firmed by Figure 6, in which the three rescaled histogramsshow a very good overlap within statistical uncertainty.Generally, the greater the threshold used to relax condi-tion (2), the more blurred the peaks corresponding to thesterically preferred angles. Note that this is expected ifsuch peaks are due to steric mechanisms. We are in factforced to introduce the threshold �max � 0 in order tocorrectly analyze the data, but this implies the unavoid-able inclusion in the data set of non-face-to-face packinginstances, for which steric mechanisms definitely do notoccur. This effect is more and more evident as the thresh-old �max value is increased. Our results demonstrate thatsteric models do successfully predict the ‘true’ preferentialinteraxial packing angles, which are observed after properrescaling, at least for the very restrictive contact thresholdused in this work. The relevance of steric mechanisms indetermining the packing angle distribution should ofcourse decrease with the distance between interactinghelices.16 The fact that two of the preferential anglespredicted by steric models are not actually observed mightbe explained by means of energetic considerations.17
Finally, we would like to remark that, given the intrinsicuncertainty of the data at our disposal, some degree ofarbitrariness in their statistical treatment is unfortu-nately unavoidable. We could have chosen different valuesfor the threshold �max in our analysis, or different bound-ary conditions and/or helix length in our simulations, sincethere is no easy way of ascertaining the values mostconsistent with the specific features of the particularproteins that compose the database. Moreover, we notethat, in order to rescale data selected with some given
threshold, a reference distribution computed with a higherthreshold would in principle be necessary, again becausethe intrinsic uncertainty in the measure of �max effectivelyenlarges the statistical sampling. The important conclu-sion is that, as long as a threshold �max � 5° is used, nosuch considerations crucially change the scenario reportedhere, and the results shown in Figures 5 and 6 areconfirmed in all cases. We believe this is further confirma-tion of the correctness of our approach.
CONCLUSIONS
In this article, we have shown that, due to the approxima-tions introduced to ensure face-to-face packing betweencontacting helices, calculation of the probability distribu-tion of interaxial angles between random finite helices isnot a trivial geometric problem. Such approximations areunavoidable, due to the imperfect shape of naturallyoccurring helices, which do not have well defined axes, andto experimental uncertainties in the determination ofprotein structures. Although analytical results can befound to estimate the correct reference distribution, thesimplest way to obtain it is through numerical simula-tions. We have presented a re-analysis of the distributionof packing angles rescaled with our new reference distribu-tion, finding remarkable agreement with the packingangles predicted by steric models.10
Our results differ from those presented in previouswork,2 wherein the simpler reference distribution P0°(�) �sin2 �, corresponding to the ideal case (thus neglecting anymistake in selecting face-to-face packing instances), wasemployed. We conclude by mentioning that a differentexplanation was very recently proposed,16 which sug-gested that the strong preference for parallel and antipar-allel helix alignment found in ref. 2 is an artifact due tobinning data for histograms before application of therescaling procedure. Indeed, the order in which binningand rescaling are performed might be important, espe-cially when the amount of available experimental data islimited. However, in all of the histograms reported inFigures 5 and 6, which were obtained with both ideal,P0°(�), and corrected, P7°(�), P11°(�), P15°(�), rescalingfactors, we did not observe any significant shift in the peaklocations when first rescaling and then binning data. Evenwhen the sensitivity to single measurement error wasproperly taken into account, in the way suggested in ref.16, by using the simple reference distribution P0°(�) �sin2 � we still obtained the strong preference for paralleland antiparallel helix alignment found in previous works.1,2
To our understanding, such preference is indeed an arti-fact, but it is due to neglecting the possibility of mistakesin selecting face-to-face packing instances. It is thuscrucial to select the correct reference distribution, asshown in this work, whereas sensitivity to measurementerror by itself only provides minor corrections.
ACKNOWLEDGMENTS
We are indebted to Amos Maritan for stimulating discus-sions and advice, Giuseppe Zanotti for helpful suggestionsand Iksoo Chang for his invaluable help in providing us
Fig. 6. Rescaled distribution of interaxial packing angles betweencontacting helices for three different values of the angle threshold �max
used in enforcing condition (2): �max � 7° (light brown filled columns),�max � 11° (red line), �max � 15° (black line). Each histogram was obtainedby dividing the experimental unrescaled distribution by the correspondingideal reference one (we used the three distributions represented in Figure3 with a different binning). All histograms were constructed with a binwidth of 15°. Note that the histograms were not normalized, as they werecomputed as the ratio of two normalized histograms. The arrows mark thevalues of the six optimal packing angles predicted by Walther et al.12
HELIX–HELIX PACKING PREFERENCES 1021
with the protein database used in ref. 13. We thankHarvey Dobbs for a critical reading of the manuscript,Greg Chirikjian for sharing and discussing with us hismanuscript, and an anonymous referee for helpful sugges-tions. This work was supported by INFM, MIUR COFIN-2003 and FISR 2002.
REFERENCES
1. Bowie JU. Helix packing angle preferences. Nat Struct Biol1997;4:915–917.
2. Walther D, Springer C, Cohen FE. Helix–helix packing anglepreferences for finite helix axes. Proteins 1998;33:457–459.
3. Schonbrun J, Wedemeyer WJ, Baker D. Protein structure predic-tion in 2002. Curr Opin Struc Biol 2002;12:348–354.
4. Chothia C, Finkelstein AV. The classification and origin of proteinfolding patterns. Annu Rev Biochem 1990;59:1007–1039.
5. Chou K-C, Nemethy G, Scheraga HA. Energetic approach to thepacking of �-helices. 1. Equivalent helices. J Phys Chem 1983;87:2869–2881.
6. Chou K-C, Nemethy G, Scheraga HA. Energetic approach to thepacking of �-helices. 2. General treatment of non-equivalent andnonregular helices. J Am Chem Soc 1984;106:3161– 3170.
7. Crick FHC. The packing of �-helices: simple coiled coils. ActaCrystallog 1953;6:689–697.
8. Richmond TJ, Richards FM. Packing of �-helices: geometricconstraints and contact area. J Mol Biol 1978;119:537–555.
9. Chothia C, Levitt M, Richardson D. Structure of proteins: packingof �-helices and pleated sheets. Proc Natl Acad Sci USA 1977;74:4130–4134.
10. Chothia C, Levitt M, Richardson D. Helix to helix packing inproteins. J Mol Biol 1981;145:215–250.
11. Efimov AV. Packing of �-helices in globular proteins. Layer-structure of globin hydrophobic cores. J Mol Biol 1979;134:23–40.
12. Walther D, Eisenhaber F, Argos P. Principles of helix–helixpacking in proteins: the helical lattice superposition model. J MolBiol 1996;255:536–553.
13. Chang I, Cieplak M, Dima RI, Maritan A, Banavar JR. Proteinthreading by learning. Proc Natl Acad Sci USA 2001;98:14350–14355.
14. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP - Astructural classification of proteins database for the investigationof sequences and structures. J Mol Biol 1995;247:536–540.
15. Rey A, Skolnick J. Efficient algorithm for the reconstruction of aprotein backbone from the �-carbon coordinates. J Comput Chem1992;13:443–456.
16. Lee S, Chirikjian GS. Interhelical angle and distance preferencesin globular proteins. Biophys J 2004;86:1118–1123.
17. Chou K-C, Nemethy G, Scheraga HA. Energetics of interaction ofregular structural elements in proteins. Acc Chem Res 1990;23:134–141.
1022 TROVATO AND SENO