Download pdf - A new perspective on analysis of helix–helix packing preferences in globular proteins

A New Perspective on Analysis of Helix–Helix PackingPreferences in Globular ProteinsAntonio Trovato* and Flavio SenoINFM-Dipartimento di Fisica ‘G. Galilei,’ Universita di Padova, Via Marzolo 8, 35131 Padova, Italy

ABSTRACT For many years, statistical analy-sis of protein databanks has led to the belief that thesteric compatibility of helix interfaces may be thesource of observed preferences for particular anglesbetween neighboring helices. Several elegant mod-els describing how side chains on helices can inter-digitate without steric clashes were able to accountquite reasonably for the observed distributions.However, it was later recognized that the ‘bare’measured angle distribution should be corrected toavoid statistical bias.1,2 Disappointingly, the rescaleddistributions dramatically lost their similarity withtheoretical predictions, casting doubts on the valid-ity of the geometrical assumptions and models. Inthis article, we elucidate a few points concerningthe proper choice of a random reference distribu-tion. In particular we demonstrate the need forcorrections induced by unavoidable uncertaintiesin determining whether two helices are in face-to-face contact or not and their relative orientations.By using this new rescaling, we show that ‘true’packing angle preferences are well described byregular packing models, thus proving that preferen-tial angles between contacting helices do exist.Proteins 2004;55:1014–1022. © 2004 Wiley-Liss, Inc.

Key words: protein structure; secondary structures;helix packing

INTRODUCTION

The native state of a protein is determined by theprimary sequence, but exactly how is the subject of muchcurrent research.3 Tertiary structures can be thought of asassemblies of elements of secondary structure: alpha heli-ces, beta strands and turns.4 Therefore it is of paramountimportance to verify whether, beyond energetic consider-ations,5,6 there are steric and geometrical rules thatgovern the packing of secondary structures. Any suchassessment must rely on a statistical analysis able toreveal preferential packing patterns with respect to thebehavior of a suitable reference system.

The issue of the pairwise packing between helices inproteins was addressed soon after helical structures weresuggested. A number of models were developed, mostlydevoted to surface complementarities upon packing. The‘knobs into holes’ model, first introduced by Crick7 andelaborated by Richmond and Richards,8 aimed to find thebest steric fit between regular helices. Chothia, Levitt andRichardson9,10 recognized the importance of ‘ridges’ and

‘grooves’ formed by residues with different sequentialdistances (‘ridges into grooves’ model). Efimov11 tried torelate the packing angle between the two helices to thepreferred rotational states of the side-chains along them. Acomprehensive analysis of the geometrical models waseventually carried out by Walther et al.,12 who modeledhelix packing as the superposition of the two regularlattices that result from unrolling the helix cylinders ontoa plane and contain points representing each residue. Thesix ‘preferred’ angles predicted by this model are consis-tent with earlier results and with histograms of experimen-tally observed packing angles (see Fig. 7 of ref. 12). Theagreement between theoretical modeling and experimen-tal data was remarkable, although not perfect (see ref. 12for a more detailed discussion).

However, the success of steric models in providing anexplanation for the most prevalent packing angles was putunder discussion after the observation made by Bowie1

that statistical corrections must be applied to the valuescollected from experimentally determined structures be-fore true interaxial angle preferences can be revealed.Indeed, the helix–helix packing is defined to occur onlywhen (1) the segment of closest approach (SCA), of lengthdr, between the two finite helix axes is shorter than aprefixed threshold dc and (2) this segment intersects bothhelix axes at a perpendicular angle.

In ref. 12, two helices were considered to be in contact ifthey were satisfying both conditions (1) and (2). Note thatWalther et al.12 selected interacting helical pairs fromtheir database of native protein conformations by consider-ing the distances between all possible inter-helical heavyatom pairs. We used a different selection procedure,involving distances between inter-helical C� atom pairs(see Methods for details). None of these particular choicesreally affect our geometrical analysis, and condition (1), asdefined above, is just the simplest way of defining anequivalent distance constraint.

Condition (2) is needed to ensure face-to-face packing ofthe two helices, justifying thus the use of theoreticalmodeling based on the steric interdigitation of the helices.

*Correspondence to: A. Trovato, INFM-Dipartimento di Fisica ‘G.Galilei,’ Universita di Padova, Via Marzolo 8, 35131 Padova, Italy.E-mail: [email protected].

Received 10 September 2003; 27 November 2003; Accepted 30November 2003

Published online 14 April 2004 in Wiley InterScience(www.interscience.wiley.com). DOI: 10.1002/prot.20083

PROTEINS: Structure, Function, and Bioinformatics 55:1014–1022 (2004)

© 2004 WILEY-LISS, INC.

If the SCA is coincident with the global segment ofclosest approach (GSCA), of length d, between the twostraight lines which are obtained by indefinitely prolong-ing the helix axes, condition (2) is then automaticallysatisfied. In such a situation, which corresponds to effec-tively dealing with helices of infinite length [see Fig. 1(a)],the reference probability distribution of interaxial anglesP(�) is simply the spherical-polar distribution betweenany two random vectors, namely P(�) � sin �.1

Walther and co-workers2 realized that the finiteness ofhelix axes is crucial in modifying the angular dependenceof P(�), because requiring that the SCA intersects bothaxes at a perpendicular angle introduces new restrictionsdepending on the packing angle �. This can be understoodby looking at Figure 2(a), where it is shown that fixing onehelix position, and assuming the GSCA to be orthogonal tothe page, the second finite helix axis may then be placedonly within a plane parallel to the paper plane such that

its starting point lies inside the darkly shaded parallelo-gram A.

The probability of placing the starting point of thesecond helix in A is proportional to its area and therefore tosin �. Consequently the probability P(�) for selecting aparticular packing angle is proportional to the product ofthe spherical-polar contribution and of the sin �-effect dueto the finite length condition:

P(�)�sin2� (1)

With this new random reference distribution the actualangular propensities need to be reconsidered. As is clearfrom Fig. 2 of ref. 2, the normalized frequencies reveal aprominent representation of packing angles near 0ˆ0 and180ˆ0 (not expected by theoretical models) whereas thepredicted optimal steric packing angles manifest them-selves, at most as shoulders in the distribution of propensi-

Fig. 2. (a) In this figure, the position of helix 1 (represented by a vector of length h1) and a vector orthogonalto the paper plane, determining the direction of the GSCA to a second helix, are fixed. Given a particularpacking angle �, in order to satisfy condition (2) without any tolerance �max, a second finite helix axis (of lengthh2) may then be placed only within a plane parallel to the paper plane and such that its starting point is insidethe darkly shaded parallelogram A. Otherwise the SCA between the two finite axes would no longer beperpendicular to both of them. The area of parallelogram A is h1h2 sin �. (b) The tolerance �max with which thecondition of perpendicularity between the SCA and the two helices is implemented enlarges the portion ofspace where the starting point of helix 2 can be placed. The region G formed by the four lightly shadedparallelograms and the four white corner regions, which together surround parallelogram A, can now beexploited. The two parallelograms parallel to helix 2, whose sides are h2 and � � �lmax sin �, are due to thegeometric arrangement described in Figure 1(b) with �1 � 0 and �2 � 0. (The vector with the dashed startingpoint represents the possible location of helix 2 in such a context.) The two other parallelograms, parallel tohelix 1, with sides h1 and �, correspond to the similar case when �1 � 0 and �2 � 0. The four remaining whiteregions, corresponding to the case in which �1 � 0 and �2 � 0, are all limited by hyperbola-like curves. The twoof them subtending an obtuse angle are due to the geometric arrangement described in Figure 1(c),corresponding to the two helices being placed on opposite sides with respect to the GSCA. (The vector with thewhite starting point represents the possible location of helix 2 in such a context.) The two of them subtending anacute angle correspond to the two helices being placed on the same side with respect to the GSCA. In bothcases, it is possible to find the equation describing such boundary curves, which we omit here for the sake ofsimplicity. The area of G is 2(h1 � h2)� plus a non-trivial contribution from the four corner regions. (c) Thedistance constraint (1) limits by itself the portion of plane surrounding parallelogram A where the starting pointof the second helix can be placed. This region, G, is simply the locus of points whose Euclidean distance fromA in the paper plane is less than � �dc

2 d2. For given values of d and the distance threshold dc, this regionis thus formed by the four lightly shaded parallelograms and the four white circular sectors. The area of G is2(h1 � h2) � �2.

HELIX–HELIX PACKING PREFERENCES 1015

Fig

.1.

(a)T

hegl

obal

segm

ento

fclo

sest

appr

oach

(GS

CA

)ofl

engt

hd

betw

een

two

stra

ight

lines

isby

defin

ition

perp

endi

cula

rto

both

ofth

em.T

hese

gmen

tofc

lose

stap

proa

ch(S

CA

),of

leng

thd r

,be

twee

ntw

ohe

lices

(sch

emat

ical

lyre

pres

ente

dby

cylin

ders

)coi

ncid

esw

ithth

egl

obal

segm

enti

fiti

nter

sect

sbo

thhe

lices

with

inth

eira

xis

leng

ths.

Thi

sfa

ctis

alw

ays

guar

ante

edif

the

two

helic

esar

eas

sum

edto

have

infin

itele

ngth

s(h

ypot

hesi

sus

edin

ref.

1).(

b)T

hetw

ofin

itehe

lices

have

aS

CA

that

isno

tcoi

ncid

entw

ithth

eG

CS

A.T

hela

tteri

nter

sect

she

lix1

ata

dist

ance

�l 1

from

itsen

dan

dm

ay(a

sin

the

pict

ure)

orm

ayno

tint

erse

cthe

lix2.

The

form

erjo

ins

the

endp

oint

ofhe

lix1

toa

poin

tins

ide

helix

2.T

hean

gle

form

edw

ithhe

lix1

(�1)d

evia

tes

from

� 2by

anam

ount

� 1.I

nste

ad� 2

�� 2

and

� 2�

0.T

hetr

igon

omet

ricre

latio

nex

pres

sing

�l 1

asan

incr

easi

ngfu

nctio

nof

� 1ca

nbe

easi

lyob

tain

ed:�

l 1(�

1)�

(dsi

n� 1

)/(s

in�

�si

n2�

si

n2� 1

).W

hen

� 1�

� ma

x,c

ondi

tion

(2),

defin

edin

the

text

,is

still

fulfi

lled.

Thi

sis

equi

vale

ntto

requ

iring

that

�l 1

��

l ma

x�

�l 1

(�m

ax).

Not

ice

that

�l 1

(�1)d

iver

ges

whe

n� 1

appr

oach

es�

from

belo

w,a

ndco

nseq

uent

ly,w

hen

�sin

��

sin

� ma

x,�

l ma

xdi

verg

es.I

nsu

cha

regi

me,

cond

ition

(2)

[but

obvi

ousl

yno

t(1)

beca

use

d rm

ayal

sodi

verg

e]is

alw

ays

satis

fied

with

inth

eal

low

edto

lera

nce.

Asi

mila

rsi

tuat

ion

aris

esan

da

sim

ilar

anal

ysis

can

beap

plie

dfo

r� 1

�0

and

� 2�

0.(c

)T

heG

SC

Ais

inte

rsec

ting

neith

erhe

lix1

nor

helix

2an

dth

eS

CA

isjo

inin

gtw

oen

dpoi

nts.

Inth

eca

sede

pict

edin

the

figur

e,th

etw

ohe

lices

are

plac

edon

oppo

site

side

sw

ithre

spec

tto

the

GS

CA

.The

thre

shol

dco

nditi

on� 1

�� 2

�� m

ax

may

betr

ansl

ated

into

the

follo

win

gon

ein

volv

ing

�l 1

and

�l 2

:

��l 1

��

l 2co

s�

��

l 2�

�l 1

cos

��

��

d2

��

l 12 sin

2 ��

d2

��

l 22 sin

2 ��

��l 1

��

l 2co

s�

��d

2�

�l 12 si

n2 �

��

l 2�

�l 1

cos

��

d2

��

l 22 sin

2 ��

cot

� max

.

(d)S

imila

rto

Fig

ure

1(c)

,but

inth

isca

seth

etw

ohe

lices

are

plac

edon

the

sam

esi

dew

ithre

spec

tto

the

GS

CA

.The

thre

shol

dco

nditi

on� 1

�� 2

�� m

ax

beco

mes

��l 1

��

l 2co

s�

��

l 2�

�l 1

cos

��

��

d2

��

l 12 sin

2 ��

d2

��

l 22 sin

2 ��

��l 1

��

l 2co

s�

��d

2�

�l 12 si

n2 �

��

l 2�

�l 1

cos

��

d2

��

l 22 sin

2 ��

cot

� max

.

Not

eth

atfo

rcl

arity

,in

this

figur

eas

wel

las

inF

igur

es1(

b,c)

,th

eac

tual

valu

es� 1

,� 2

,w

hich

wou

ldbe

allo

wed

with

inou

ran

alys

is(�

1�

� 2�

� ma

x)

are

inde

edsm

alle

rth

anth

ose

repr

esen

ted

grap

hica

lly.

ties. The predictive ability of geometrical models is castinto serious doubts after this analysis.

Our first observation is that the random distributionP(�) � sin2 � is correct if and only if the condition ofmutual perpendicularity between the SCA and the twoaxes is strictly fulfilled. But is this the real situation whenstatistical histograms are derived?

As a matter of fact, condition (2) was relaxed by Waltheret al., by admitting a small tolerance: a total deviation � ��1 � �2 (�1 and �2 are the complementary angles to theangles �1 and �2 formed between the SCA and helix 1 andhelix 2, respectively [see Fig. 1(b–d)] was accepted up to athreshold �max (�max � 1° in ref. 2 and �max � 5° in ref. 12).

At first, one might wonder why such a threshold needs tobe used since it does not effectively increase the number ofdata contributing to the histograms. However, we believethat this choice is indeed necessary because the ambiguityin the definition of the axis direction in natural helices,which is due both to experimental error in structuredetermination and to the arbitrariness of the axis recon-struction algorithm, introduces an intrinsic uncertainty inthe computation of the �, �, � angles (see Methods for adetailed explanation of how the axis is reconstructed in thetypical case of a non-ideal helix and for an estimation ofsuch uncertainty ��). It is thus possible to determinewhether � is 0 only within the uncertainty threshold ��,implying that under some circumstances the differentcases represented in Figure 1 cannot really be distin-guished from each other. A relaxation of condition (2) isthus crucial, if we want to correctly analyze data extractedfrom the database of real native protein structures.

We will show in this article, by means of partly semi-analytical geometrical arguments and eventually numeri-cal simulations, that such threshold effect does drasticallychange the random reference distribution. To demonstratethe necessity of considering this effect, we will presentnumerical evidence that the introduction of some degree ofuncertainty in an ideally constructed ensemble of helixpairs generates an interaxial angle distribution that canbe described only if the face-to-face packing condition (2) isrelaxed.

Having measured helix–helix packing angles from a setof 600 proteins representative of the Protein Data Bank(PDB) native structures, we will eventually reanalyze thepacking angles distribution after their proper rescalingwith the newfound reference distribution.

MATERIALS AND METHODSDatabank and Helix Pair Selection

We employed the same ensemble of 600 proteins consid-ered by Chang et al.,13 which consisted of sequencesvarying in length from 44 to 1017, with low sequencehomology and covering many different three-dimensionalfolds according to the Structural Classification of Proteins(SCOP) scheme.14 The structures were monomeric anddetermined using X-ray crystallography. We collected4397 helices, each with at least four consecutive residuesclassified as helical in the PDB files. The average numberof residues of these helices was 11.5. Two helices were

defined to be in close contact if at least one interhelicalcontact between C� atoms was present, with a maximalthreshold distance of 5.5 Å [analogous to condition (1) inthe text]. Only helix pairs separated in sequence by atleast 20 intervening residues were considered in order toeliminate possible correlations induced by short loops.Finally, we considered only helix pairs whose axes weresufficiently straight, according to a definition given in thefollowing subsection. The resulting data set consisted of627 closely-packed helix pairs.

Helix Axis Reconstruction

Reconstructing the helix axis from the coordinates of theC� atoms of the corresponding residues is a critical step inthe determination of packing angle preferences. Eventhough we eventually removed bent and supercoiled heli-ces from the data set, we adopted the procedure describedby Walther et al. in ref. 12, in which a local axis isassociated with every consecutive residue pair along thehelix, since local distortions could occur even for straighthelices, due to circumstances such as experimental uncer-tainties in the protein structure determination. The over-all axis is thus a broken line consisting of short segments.

A good starting approximation for the local axis ai,based on the four C� atom positions ri 1, ri, ri�1, ri�2, wasintroduced by Chothia et al.10 We employed a slightlymodified definition, in which we first defined the normal-ized bond vectors bi � (ri�1 r� i)/�ri�1 r� i�. In this way,the set of three orthonormal vectors ti, vi, ui, the naturalreference system associated with the C� atom trace, wasdefined without distortions due to the fluctuations of thebond length between consecutive C� atoms in the followingway:15 ti � (bi � bi 1)/�bi � bi 1�, vi � bi � bi 1/�bi � bi 1�,ui � ti � vi. If the C� atom trace followed a perfect idealhelix, the vector ai � ui � ui�1 would be parallel to thehelix axis. We then initially determined the local axisbetween residues i, i � 1, as the vector parallel to ai, oflength 1.45 Å, which has the geometric center of the closestfour consecutive C� atom positions, ri 1, ri, ri�1, ri�2, asits midpoint. The two local axes at the helix termini wereobtained by simply prolonging the neighboring ones.

We then applied a smoothing procedure, similar tomethods described by Walther et al.12 The ‘new’ local axesai were obtained by averaging the direction of the closestthree ‘old’ ones, ai 1, ai, ai�1, while conserving its mid-points. For the two local axes at the helix termini, weaveraged over the direction of the closest two availableones. To ensure the continuity of the overall axis, the innerhinges of the broken axial line were computed as themidpoints between extremities of consecutive local vectorsobtained from the running average. After repeating thewhole procedure twice, the standard deviation of thedistances of each C� atom from the reconstructed helixaxis was 0.16 Å.

In order to detect and remove curved helices from thedata set, we proceeded as follows. For any given local axissegment, say the i-th, we computed the average angle��i � ¥j�i�ij /(n 2) formed between the segment and allother ones forming the reconstructed axis, where n is the


helix number of residues. Then, after selecting the mini-mum value � � mini��i, we removed helices for which� � 14° from the data set. The threshold was chosen asroughly twice the average angle formed between consecu-tive local segments (see below).

The interaxial packing angle between two helices wascomputed between the local axis pair for which the mini-mum distance of closest approach was achieved. In casethe SCA intersected an inner hinge of the broken globalaxis, the local axis direction was defined as the average ofthe two corresponding local vectors. The whole discussionconcerning the perpendicularity of both local axes to theSCA applies only to cases involving a terminal local axis,since face-to-face packing is anyway ensured in case ofcontact between inner local axes. Packing angles arepositive if the background helix is rotated clockwise withrespect to the frontal helix when facing them. The angle� � 0° (� � �180°) corresponds to parallel (antiparallel)helices with respect to their sequence direction.

Imposing the further requirement that the SCA inter-sect both local axis directions at a perpendicular anglewithin a threshold �max played a critical role, as discussedbelow. For the three different �max values considered inthis work, �max � 7°, 11°, 15°, we collected data sets of 396,429, 454 closely packed helix pairs, respectively.

Finally, we report the mean value of the angle ��formed between consecutive local directions averaged overall inner hinges of all helical axes, which were recon-structed using the procedure discussed above. We foundthat �� 6.6° � 4.0°, strongly supporting the necessity ofallowing a similar threshold when imposing angular con-straints as in condition (2).

RESULTS AND DISCUSSIONGeometrical Analysis

We will now sketch the geometric consequences ofrelaxing condition (2) within a given threshold, in order tounderstand by means of a simple argument how thisaffects the random reference distribution. In principle, acomprehensive analytical treatment would be feasible butcumbersome. Numerical simulation will ultimately be thepreferential approach of extracting the corrected randomreference distribution for interaxial angles.

Within the admitted threshold �max, cases similar to theone described in Figure 1(b) may occur. The SCA betweenthe two helices, which intersects helix 1 at one of its axisends and helix 2 at some internal point, forms an angle �1,which deviates from �⁄2 by less than �max, whereas theGSCA does not intersect both helices. (In this situation,�1 � 0, �2 � 0).

If d � dc, the two helices are considered to be in contact.The condition �1 � �max is equivalent to the condition that theGSCA intersects helix 1 within a distance �lmax from the endof its axis. This limit is reached when �1 � �min � �

2 �max.

There are also cases in which �1 � 0 and �2 � 0, when theSCA intersects helix 2 at one of its axis ends and helix 1 atsome internal point, and cases in which both �1 and �2 arenot zero, and the SCA intersects each helix at one axis end[Fig. 1(c,d)]. In this last case, the geometrical condition

implied by the fact that �1 � �2 � �max, involving thedistances �l1 and �l2 from the ends of both helix axes totheir intersection with the GSCA, is more complicated(and is discussed in the legend to Fig. 1).

Using the visual representation of Figure 2(b), we cansay that allowing condition (2) to be satisfied within thethreshold � � �1 � �2 � �max, opens up the possibility forthe starting point of helix 2 to lie in the portion of space (G)formed by the four lightly shaded parallelograms (twoparallel to helix 1 with sides h1 and � � �lmax sin �, andtwo parallel to helix 2 with sides h2 and �) and by the fourremaining white corner regions (boundaries defined byhyperbolas; see Figure 2 caption for details) that surroundparallelogram A.

Computing the area of G is not trivial, because of thehyperbola-shaped regions, but the result is obviously�-dependent. For example a simple trigonometric calcula-tion shows that

� � � d sin �max

�sin2� � sin2�max

�sin �� sin �max

� �sin �� sin �max

(2)

This implies that the area of G diverges for �sin �� sin�max, which already indicates that relevant corrections arepossible even for small values of �max.

To conclude our analysis, we need to apply condition (1).By using the Pythagorean theorem, it is easy to see that forany fixed d � dc, condition (1) is satisfied for any point(thought of as the starting point of helix 2) whose distance�s(dr) � �dr

2 d2 from parallelogram A is less than � �s(dc) � �dc

2 d2. These points belong to the region G,shown in Figure 2(c), formed by four parallelograms (twowith sides h1 and , and two with sides h2 and ) and the fourcircular sectors (of radius ) surrounding A. The area of G isindependent on � (see Fig. 2c legend).

All of the points in the portion of space H � G � G,resulting from the intersection of G and G satisfy bothconditions (1) and (2).

Therefore, the probability P�max(d) of selecting a particu-lar angle � with a tolerance of �max, at a given distance d �dc, is given by the product of the spherical polar term sin�1 and a term proportional to the area of parallelogram Aplus the area of H.

When �sin �� sin �max, eq. (2) shows that � divergesand thus:

P�max�d� � sin ��h1h2sin � � 2�h1 � h2� � B� (3)

where B is the area of the corner regions in H.Since does not depend on �, the second term in the

bracket will eventually dominate the small �sin �� behaviorof P�max(d). In practice, the actual relevance of this effectcan be appreciated only after integrating P�max(d) over d,since the relative weights of the different terms in thebracket vary with d. Such computation is quite cumber-some and we have instead chosen to obtain P�max bysimulating random helices that satisfy the contact condi-tions within a threshold �max and to numerically extractthe normalized histograms.

1018 TROVATO AND SENO

Numerical Simulations

In order to compute the reference distribution P�max(�)for interaxial packing angles we generated random helixpairs by means of computer simulations and then selectedthem with the same conditions, (1) and (2), used to extracthistograms from real helices in native protein structures.Note that when computing the reference distribution wedid not take steric effects into account; in other words,random helices might overlap.

More specifically, we constructed ideal, discrete heliceswith the same geometrical properties as �-helices in realproteins, i.e. twist per residue 99.1°, rise per residue 1.45Å, radius 2.3 Å (as is the case when considering C� atoms).We chose to generate helices consisting of 11 residues, theaverage length in the data set of real helices that wecollected from the PDB. Keeping the position of the firsthelix fixed, the second helix was placed by first choosingrandomly the midpoint of its axis within a sphere of radius15 Å centered at the midpoint of the first helix axis andthen selecting, again randomly, both the direction of itsaxis and the twist of its first residue.

Boundary effects might be relevant when the radius ofthe sphere in which the second helix is generated is toosmall with respect to the helix length and the helixfiniteness is effectively reduced. We made sure that ourresults did not change when increasing the radius of thesphere in which the second helix axis was placed.

We generated 5 � 107 random helix pairs, 29,915,472 ofwhich satisfied condition (1) of having at least one pair ofresidues separated by less then 5.5 Å. Condition (2) wasthen applied, in the same way as explained in Methods forthe data set of real helices, with the whole axis beingtreated as one single segment in the ideal case. In thisway, we generated 11,467,456, 13,033,815 and 14,553,626helix pairs, respectively, for the three different �max valuesconsidered in this work, �max � 7°, 11°, 15°.

In Figure 3, we plot the corresponding random referencedistributions P�max(�), comparing them to the ideal (�max �0) case: P(�) � sin2 �. The difference, although due to avery subtle effect, is substantial, and it is clearly seen thata new regime is present for �sin �� sin �max, as expectedfrom the previous discussion.

In order to verify the main point of this article, that dueto the presence of an intrinsic uncertainty in helix axisdetermination it is necessary to relax the face-to-facepacking condition (2) for a proper statistical analysis, wecarried out the following numerical experiment.

In the same way as described above, we constructed adatabase of 106 random, overlapping, discrete helix pairs(each helix consisting of 11 residues), 598,099 of whichsatisfy the distance constraint (1) of having at least onepair of residues within 5.5 Å of each other. All residueswere first placed according to the same idealized geometri-cal rules described above, but each of them was thenrandomly displaced within a cube of side 2 Å centered in itsoriginal location.

Starting from this database, we reconstructed the helixaxes using the same algorithm employed also for realproteins (see Methods). Based on the reconstructed axes,

we selected the helix pairs satisfying the face-to-facepacking condition (2) in a strict manner, obtaining 172,685helix pairs. As shown in the upper panel of Figure 4, theresulting distribution is quite different from the expectedP0°(�) � sin2 � behavior and instead is very similar to thedistribution P7°(�). The value of the threshold �max to beconsidered in order to match the ‘experimental’ datadepends on the mean size of the random displacementused to introduce uncertainty in the database, but weverified that this effect is always present. Therefore athreshold strictly greater than zero has to be introduced.

We then used the same data set to check whether it ispossible to select a subset of unambiguous face-to-facepacking helix pairs, which would then be distributedaccording to P0°(�), even in the presence of some degree ofuncertainty. This can be done in a natural way by usingthe local segments introduced to reconstruct the helix axisand by retaining only pairs of contacting helices in whichthe GSCA is connecting ‘inner’ local segments and therelative number of such ‘inner’ segments is varied (see Fig.4 legend).

As shown in the lower panel of Figure 4, on decreasingsuch numbers, the resulting distribution gets closer toP0°(�), without reaching it, and the number of dataanalyzed is dramatically reduced to 8,361 in the extremecase, corresponding to retaining only two central innersegments. If such rules were applied to the databaseconstructed from native protein structures, only four datawould be left. Moreover, in an intermediate case, for which

Fig. 3. Ideal ‘random’ distribution of interaxial packing angles for fourdifferent values of the angle threshold �max used in applying condition (2):�max � 0° (upper left panel), �max � 7° (upper right panel), �max �11°(lower left panel), �max � 15° (lower right panel). The four distributionswere computed by means of numerical simulations described in the text.Since ideal helices do not have a preferential direction, the histogramsshown in this figure have been restricted to the 90° � � � 90° region.The 14,553,626 packing angle values collected in the �max � 15°histogram were then successively filtered out by the more and morerestrictive �max � 11°, �max � 7°, �max � 0° threshold conditions togenerate the 13,033,815, 11,467,456 and 8,646,994 data, respectively,collected in the corresponding histograms. The dashed line is the fit of the�max � 0° data, obtained by enforcing condition (2) in a strict way, to theexpected P(�) � sin2 � distribution, which is then reported for compari-son in all other histograms. All histograms were constructed with a binwidth of 1.5°. Histograms were normalized in such a way that a flatdistribution would correspond to a constant height of 1.


one would hope to retain a large enough data set, 80,650 inour simulation, the resulting distribution is much moresimilar to the one obtained without any special distinctionbetween ‘inner’ and ‘outer’ segments (and thus to the�max � 7° case) than to the ideal P0°(�).

Discussion

In the previous section, we gave numerical evidence thatthe only accurate way to analyze experimental data con-sists of rescaling them with a random reference distribu-tion computed by relaxing the face-to-face packing condi-tion (2).

To test the effect of this new reference distribution, werecomputed an experimental distribution of interaxialangles (see Methods for details) by analyzing a databank of600 proteins and using a contact threshold dc � 5.5 Å and atolerance �max � 7°, corresponding to the intrinsic uncer-tainty �� 6.6° estimated when determining local axisdirections (see Methods). Since the reference distributionwas derived with the assumption that helix axes can berepresented by straight, finite segments, curved or super-coiled helices were removed from the set of contactinghelix pairs (see Methods). The histogram is reported in the

upper panel of Figure 5: in order to obtain good statisticsfor each single bin, we gathered results every 15°. Thehistogram is consistent with previous analysis.1,2,12 In themiddle panel of Figure 5, we plot P7°(�) and, for reference,P0°(�), normalized with the same binning as in theprevious panel. Finally, in the lower panel of Figure 5 wepresent the histogram both rescaled with P7°(�) and withP0°(�). The results are clear and striking: the correctreference distribution P7°(�) removes the spurious peaksat 0° and �180°, leaving four rather clean peaks located at 157.5°, 37.5°, 22.5° and 157.5° (rough estimates for thecenters of the bins corresponding to the histogram localmaxima). Note that four of the six packing angles pre-dicted by steric models12 are indeed consistent with suchpeaks within binning uncertainty, or are at most found inneighboring bins. Only two of the predicted packing anglesare clearly not favored according to our analysis. We alsonote that a residual preference for parallel and antiparal-lel alignment of two contacting helices cannot be ruled outeven after proper rescaling. Because of the relatively smallnumber of entries in the bins around � � 0°, 180°, it isdifficult to ascertain whether this is a genuine effect or justa spurious one induced by the presence of nearby peaks.

In order to further test both the robustness of the resultsand the full procedure we have obtained statistical histo-grams from the experimental data using three differentthreshold values for �max. Since we can compare them withthe corresponding random distribution, we should expect

Fig. 4. Upper panel: ‘experimental’ reference distribution of interaxialpacking angles obtained by enforcing condition (2) with �max � 0° aftersome degree of uncertainty was introduced in a data set consisting ofrandom overlapping helix pairs (thick grey histogram), compared to thetheoretical reference distribution P0°(�) � sin2 � expected in the samecase (light grey shaded histogram) and to the theoretical referencedistribution P7°(�) expected when condition (2) is relaxed with a threshold�max � 7° (black histogram). Lower panel: ‘experimental’ referencedistribution of interaxial packing angles obtained by enforcing condition(2) with �max � 0° after some degree of uncertainty was introduced in adata set consisting of random overlapping helix pairs, with the furthercondition that a helix pair would be discarded if none of the local segments(introduced in the helix axis reconstruction procedure) connected by theGSCA were ‘inner.’ The local i-th segment, 1 � i � ns, is considered to be‘inner’ depending on the parameter nend, when nend � i � ns nend � 1,and ‘outer’ otherwise, where ns is the number of local segments used toreconstruct the helix axis (ns � 10 in our simulations). Distributionsobtained with nend � 2 and nend � 4 are shown as thick grey and blackhistograms, respectively, and are compared to the theoretical referencedistribution P0°(�) � sin2 � expected in both cases (light grey shadedhistogram). All histograms were constructed with a bin width of 15°, andnormalized in such a way that a flat distribution would correspond to aconstant height of 1. ‘Theoretical’ distributions are the same as shown inFigure 3 but binned in a different way.

Fig. 5. Upper panel: Experimental ‘bare’ un-rescaled distribution ofinteraxial packing angles between contacting helices. Data were obtainedwith a distance threshold d � 5.5 Å for interhelical C� atom pairs and anangular threshold �max � 7° for condition (2). The histogram wasnormalized in such a way that a flat distribution would correspond to aconstant height of 1. Middle panel: ideal ‘random’ distribution that must beused to rescale the ‘bare’ experimental distribution in order to reveal truepacking angle preferences. The two different distributions obtained witheither �max � 0° (thick line) or �max � 7° (filled columns) are shown.Histograms were normalized in such a way that a flat distribution wouldcorrespond to a constant height of 1. Lower panel: rescaled distribution ofinteraxial packing angles. The experimental ‘bare’ distribution in the upperpanel was divided by the ideal ‘random’ one in the middle panel, obtainedwith either �max � 0° (thick line) or �max � 7° (filled columns). Histogramheights greater than 1 correspond to preferential packing angles, whereasheights smaller than 1 correspond to disfavored packing angles. Allhistograms were constructed with a bin width of 15°. Arrows in the upperand lower panels mark the values of the six preferred packing anglespredicted by Walther et al.;12 �abc � 37.1°, 97.4°, 22.0°, eachrepresented twice with a periodicity of 180°.


three similar, rescaled histograms. This is nicely con-firmed by Figure 6, in which the three rescaled histogramsshow a very good overlap within statistical uncertainty.Generally, the greater the threshold used to relax condi-tion (2), the more blurred the peaks corresponding to thesterically preferred angles. Note that this is expected ifsuch peaks are due to steric mechanisms. We are in factforced to introduce the threshold �max � 0 in order tocorrectly analyze the data, but this implies the unavoid-able inclusion in the data set of non-face-to-face packinginstances, for which steric mechanisms definitely do notoccur. This effect is more and more evident as the thresh-old �max value is increased. Our results demonstrate thatsteric models do successfully predict the ‘true’ preferentialinteraxial packing angles, which are observed after properrescaling, at least for the very restrictive contact thresholdused in this work. The relevance of steric mechanisms indetermining the packing angle distribution should ofcourse decrease with the distance between interactinghelices.16 The fact that two of the preferential anglespredicted by steric models are not actually observed mightbe explained by means of energetic considerations.17

Finally, we would like to remark that, given the intrinsicuncertainty of the data at our disposal, some degree ofarbitrariness in their statistical treatment is unfortu-nately unavoidable. We could have chosen different valuesfor the threshold �max in our analysis, or different bound-ary conditions and/or helix length in our simulations, sincethere is no easy way of ascertaining the values mostconsistent with the specific features of the particularproteins that compose the database. Moreover, we notethat, in order to rescale data selected with some given

threshold, a reference distribution computed with a higherthreshold would in principle be necessary, again becausethe intrinsic uncertainty in the measure of �max effectivelyenlarges the statistical sampling. The important conclu-sion is that, as long as a threshold �max � 5° is used, nosuch considerations crucially change the scenario reportedhere, and the results shown in Figures 5 and 6 areconfirmed in all cases. We believe this is further confirma-tion of the correctness of our approach.

CONCLUSIONS

In this article, we have shown that, due to the approxima-tions introduced to ensure face-to-face packing betweencontacting helices, calculation of the probability distribu-tion of interaxial angles between random finite helices isnot a trivial geometric problem. Such approximations areunavoidable, due to the imperfect shape of naturallyoccurring helices, which do not have well defined axes, andto experimental uncertainties in the determination ofprotein structures. Although analytical results can befound to estimate the correct reference distribution, thesimplest way to obtain it is through numerical simula-tions. We have presented a re-analysis of the distributionof packing angles rescaled with our new reference distribu-tion, finding remarkable agreement with the packingangles predicted by steric models.10

Our results differ from those presented in previouswork,2 wherein the simpler reference distribution P0°(�) �sin2 �, corresponding to the ideal case (thus neglecting anymistake in selecting face-to-face packing instances), wasemployed. We conclude by mentioning that a differentexplanation was very recently proposed,16 which sug-gested that the strong preference for parallel and antipar-allel helix alignment found in ref. 2 is an artifact due tobinning data for histograms before application of therescaling procedure. Indeed, the order in which binningand rescaling are performed might be important, espe-cially when the amount of available experimental data islimited. However, in all of the histograms reported inFigures 5 and 6, which were obtained with both ideal,P0°(�), and corrected, P7°(�), P11°(�), P15°(�), rescalingfactors, we did not observe any significant shift in the peaklocations when first rescaling and then binning data. Evenwhen the sensitivity to single measurement error wasproperly taken into account, in the way suggested in ref.16, by using the simple reference distribution P0°(�) �sin2 � we still obtained the strong preference for paralleland antiparallel helix alignment found in previous works.1,2

To our understanding, such preference is indeed an arti-fact, but it is due to neglecting the possibility of mistakesin selecting face-to-face packing instances. It is thuscrucial to select the correct reference distribution, asshown in this work, whereas sensitivity to measurementerror by itself only provides minor corrections.

ACKNOWLEDGMENTS

We are indebted to Amos Maritan for stimulating discus-sions and advice, Giuseppe Zanotti for helpful suggestionsand Iksoo Chang for his invaluable help in providing us

Fig. 6. Rescaled distribution of interaxial packing angles betweencontacting helices for three different values of the angle threshold �max

used in enforcing condition (2): �max � 7° (light brown filled columns),�max � 11° (red line), �max � 15° (black line). Each histogram was obtainedby dividing the experimental unrescaled distribution by the correspondingideal reference one (we used the three distributions represented in Figure3 with a different binning). All histograms were constructed with a binwidth of 15°. Note that the histograms were not normalized, as they werecomputed as the ratio of two normalized histograms. The arrows mark thevalues of the six optimal packing angles predicted by Walther et al.12


with the protein database used in ref. 13. We thankHarvey Dobbs for a critical reading of the manuscript,Greg Chirikjian for sharing and discussing with us hismanuscript, and an anonymous referee for helpful sugges-tions. This work was supported by INFM, MIUR COFIN-2003 and FISR 2002.

REFERENCES

1. Bowie JU. Helix packing angle preferences. Nat Struct Biol1997;4:915–917.

2. Walther D, Springer C, Cohen FE. Helix–helix packing anglepreferences for finite helix axes. Proteins 1998;33:457–459.

3. Schonbrun J, Wedemeyer WJ, Baker D. Protein structure predic-tion in 2002. Curr Opin Struc Biol 2002;12:348–354.

4. Chothia C, Finkelstein AV. The classification and origin of proteinfolding patterns. Annu Rev Biochem 1990;59:1007–1039.

5. Chou K-C, Nemethy G, Scheraga HA. Energetic approach to thepacking of �-helices. 1. Equivalent helices. J Phys Chem 1983;87:2869–2881.

6. Chou K-C, Nemethy G, Scheraga HA. Energetic approach to thepacking of �-helices. 2. General treatment of non-equivalent andnonregular helices. J Am Chem Soc 1984;106:3161– 3170.

7. Crick FHC. The packing of �-helices: simple coiled coils. ActaCrystallog 1953;6:689–697.

8. Richmond TJ, Richards FM. Packing of �-helices: geometricconstraints and contact area. J Mol Biol 1978;119:537–555.

9. Chothia C, Levitt M, Richardson D. Structure of proteins: packingof �-helices and pleated sheets. Proc Natl Acad Sci USA 1977;74:4130–4134.

10. Chothia C, Levitt M, Richardson D. Helix to helix packing inproteins. J Mol Biol 1981;145:215–250.

11. Efimov AV. Packing of �-helices in globular proteins. Layer-structure of globin hydrophobic cores. J Mol Biol 1979;134:23–40.

12. Walther D, Eisenhaber F, Argos P. Principles of helix–helixpacking in proteins: the helical lattice superposition model. J MolBiol 1996;255:536–553.

13. Chang I, Cieplak M, Dima RI, Maritan A, Banavar JR. Proteinthreading by learning. Proc Natl Acad Sci USA 2001;98:14350–14355.

14. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP - Astructural classification of proteins database for the investigationof sequences and structures. J Mol Biol 1995;247:536–540.

15. Rey A, Skolnick J. Efficient algorithm for the reconstruction of aprotein backbone from the �-carbon coordinates. J Comput Chem1992;13:443–456.

16. Lee S, Chirikjian GS. Interhelical angle and distance preferencesin globular proteins. Biophys J 2004;86:1118–1123.

17. Chou K-C, Nemethy G, Scheraga HA. Energetics of interaction ofregular structural elements in proteins. Acc Chem Res 1990;23:134–141.