Vol. 97, No. 4, 1980
December 31, 1980
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
Pages 1582-1590
PROPERTIES OF NUCLEATION SITES IN GLOBULAR PROTEINS
P.K. PONNUSWAMY and M. PRABHAKARAN
Department of Physics, Autonomous Postgraduate Centre,
University of Madras, Tiruchirapalli 620020,
Tamilnadu, INDIA.
Received October 15, 1980
SUMMARY: Using crystal data for a set of proteins and hydro-
phobic indices for amino acids, a criterion is drawn to identify
nucleation sites in globular proteins. The properties of these
sites are described in terms of certain relevant parameters and
a suggestion is made on the growth and aggregation of nucleation
sites in protein molecules.
A nascent polypeptide chain folds very quickly to form
the active native structure of the protein, indicating the
possibility of the existence of a few preferred folding pathways
out of the innumerable probable conformational states. The
formation of nucleation sites along the sequence of the poly-
peptide due to local interactions is considered to be one of the
possible ways of selecting pathways, and appealing reports on
this model are now appearing. From the observed tertiary
patterns of a few globular protein crystals, Wetlaufer (1) first
suggested the possibility of the formation of nucleation regions
by applying the rule of continuous segments in amino acid
sequence, and analysed them in detail. Matheson and Scheraga (2)
proposed nucleation pockets from the knowledge of short- and
medium-range interactions in proteins. Kanehisa and Tsong (3)
conceived fluctuating local clusters in native structure
formation and described the self-assembly kinetics. Rose and Roy (4)
suggested that chain sites corresponding to local maxima in
hydrophobicity serve as folding primitives. Recently, from the
laboratory of the present authors, a report was made (5) on the
possibility of predicting nucleation sites by applying
the concept called the 'surrounding hydrophobicity' of a
residue in the protein matrix. The decisive role played by hydro-
phobic bonding through long-range interactions can be well
realised from the observation of very stable hydrophobic domains in
almost all the known crystal structures of globular protein mole-
cules. In this communication we report the results of our recent
study on the characteristics of nucleation sites in fourteen
proteins.

This work was supported by a grant from the Department of Science and Technology, Government of India, to P.K.P.
0006-291X/80/241582-09$01.00/0 Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.
METHOD
As hydrophobic clustering is conceived to be one of
the important features in the formation of globular structures in
proteins, we studied the concept of nucleation domains by utilising
the hydrophobic indices of amino acid residues as given by Tanford (6)
and by Jones (7). Using these experimental quantities we define
what is termed the 'surrounding hydrophobicity' for a residue in
a protein molecule in its crystalline form. The amino acid residues
are represented by their α-carbon (crystal) coordinates and each
of the residues is assigned its Tanford-Jones index. The
surrounding hydrophobicity of a residue in the protein matrix
is defined as the sum of the hydrophobic indices of the
residues appearing within an 8 Å radius of the residue in
question (see ref. (8) for details). This bulk index measures
the effective hydrophobic bonding made by the residue with other
interacting residues in the molecule. The residues having
surrounding hydrophobicity values equal to or greater than twice
the average value for all the residues in the protein are here
assumed to form 'hydrophobic domains', and the residue of the
highest surrounding hydrophobicity within a domain is taken to
represent the site of that domain. The site identified in this
way in a protein can then be taken to be the result of the atomic
interactions (short-, medium- and long-range) within the effective
range of surroundings in that protein.
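
The definition above can be illustrated with a minimal sketch. It assumes α-carbon coordinates in Å and one hydrophobic index per residue. The function names, the toy coordinates and the unit indices are hypothetical, and excluding a residue from its own sum is an assumption, since the text does not settle that point. How flagged residues are grouped into separate domains is not specified in the text and is not attempted here.

```python
import math

def surrounding_hydrophobicity(ca_coords, indices, radius=8.0):
    """For each residue, sum the hydrophobic indices of the other residues
    whose alpha-carbon lies within `radius` angstroms of its own."""
    totals = []
    for i, centre in enumerate(ca_coords):
        total = 0.0
        for j, other in enumerate(ca_coords):
            # Assumption: the residue itself is excluded from its own sum.
            if i != j and math.dist(centre, other) <= radius:
                total += indices[j]
        totals.append(total)
    return totals

def domain_candidates(totals, factor=2.0):
    """Positions whose value is at least `factor` times the protein average."""
    mean = sum(totals) / len(totals)
    return [i for i, h in enumerate(totals) if h >= factor * mean]

def site_of(domain, totals):
    """Residue of highest surrounding hydrophobicity within one domain."""
    return max(domain, key=lambda i: totals[i])

# Toy input (hypothetical coordinates and indices, not data from the paper):
# three residues packed together and five isolated ones.
ca_coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (20, 0, 0),
             (40, 0, 0), (60, 0, 0), (80, 0, 0), (100, 0, 0)]
indices = [1.0] * len(ca_coords)
totals = surrounding_hydrophobicity(ca_coords, indices)
candidates = domain_candidates(totals)
```

In this toy case only the three packed residues reach twice the average, so they are the only candidates for a hydrophobic domain.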
The probable nucleating portions of a protein molecule
arising from short- and medium-range interactions alone were
determined as follows: the hydrophobic indices of the four residues on
either side of a residue were added to that of the residue, and this was
repeated along the length of the protein chain; the residues
having the highest summed hydrophobic indices were then identified and,
if they were found to fall within any of the nucleation domains
predicted from considerations of total interactions, they were
assumed to be the nucleation sites arising from interactions
devoid of long-range nature.
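
The windowed summation described above can be sketched as follows. The function names and the `top_n` cut-off are hypothetical, since the text does not state how many of the highest sums were retained. Truncating the window at the chain ends is likewise an assumption.

```python
def windowed_hydrophobicity(indices, half_width=4):
    """Sum each residue's index with those of `half_width` residues on
    either side (nine residues in all for the default value)."""
    n = len(indices)
    sums = []
    for i in range(n):
        # Assumption: windows are simply truncated at the chain termini.
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        sums.append(sum(indices[lo:hi]))
    return sums

def short_range_sites(sums, domains, top_n=3):
    """Keep the `top_n` highest-sum residues that also fall inside one of
    the domains predicted from total interactions."""
    ranked = sorted(range(len(sums)), key=lambda i: sums[i], reverse=True)
    members = set().union(*domains)
    return sorted(i for i in ranked[:top_n] if i in members)
```

For example, `windowed_hydrophobicity([1, 2, 3, 4], half_width=1)` gives `[3, 6, 9, 7]`, and only the peaks lying inside a supplied domain are kept as short- and medium-range sites.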
For purposes of determining the spatial positions of the
nucleation sites, the protein is assumed to be an ellipsoid of
semi-axes a,b and c, whose volume is just sufficient to enclose
all the residues in it; these medium-sized ellipsoids were
determined by an iterative procedure involving rotations and
translations about the coordinate axes fixed on the centroid
of the protein crystal as described in one of our earlier
reports (9). The spatial position of a residue i with reference
to the centroid of the ellipsoid is given by
$$ d_i = \frac{x_i^2}{a^2} + \frac{y_i^2}{b^2} + \frac{z_i^2}{c^2} $$
where $x_i$, $y_i$ and $z_i$ are the Cartesian coordinates of the
residue i, the origin coinciding with the protein centroid; $d_i$ varies
from 0 to 1, the zero value corresponding to the most interior
point and the unity value corresponding to the most exterior or
surface point of the protein.
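
A short sketch of this spatial-position measure follows. It assumes the quadratic form d = x²/a² + y²/b² + z²/c², which is consistent with the stated range of 0 at the centroid to 1 on the ellipsoid surface. It also assumes the coordinates are already expressed along the ellipsoid axes. The iterative fitting of the ellipsoid described in ref. (9) is not reproduced, and the function names are hypothetical.

```python
def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def ellipsoid_position(coord, centre, semi_axes):
    """Return 0 at the centre and 1 on the surface of an ellipsoid with
    semi-axes (a, b, c) aligned with the coordinate axes."""
    dx, dy, dz = (p - q for p, q in zip(coord, centre))
    a, b, c = semi_axes
    return (dx / a) ** 2 + (dy / b) ** 2 + (dz / c) ** 2
```

With semi-axes (10, 5, 2) Å, a residue at (5, 0, 0) has a value of 0.25, while one at (0, 5, 0) lies on the surface with a value of 1.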
The accessible contact areas for a solvent molecule of
1.4 Å radius were computed for each of the residues in the
protein by the method of Lee and Richards (10).
The maintenance of the native folded structure in a
protein molecule is mainly due to Van der Waals interactions
and hence the stability contribution from these interactions
was determined for each residue from all other residues by
applying the Lennard-Jones function and parameters developed
by Scheraga's group. The segments of the protein chain which
are highly stabilised by Van der Waals interactions were then
identified and those containing any of the nucleating domains
determined from total interactions were taken to represent
Van der Waals energy dominating segments.
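
The per-residue summation can be sketched with a generic 12-6 Lennard-Jones function. This is an illustration only: the well depth `epsilon` and the minimum-energy distance `r_min` are hypothetical placeholders and not the parameters of Scheraga's group. Each residue is reduced to a single point here, which is an assumption; the text does not state the interaction centres used.

```python
import math

def lennard_jones(r, epsilon, r_min):
    """12-6 potential with its minimum of -epsilon at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)

def per_residue_energy(coords, epsilon=0.1, r_min=5.0):
    """Sum, for each residue, its pair energies with all other residues.
    Assumption: the full pair energy is credited to both partners."""
    n = len(coords)
    energies = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            e = lennard_jones(math.dist(coords[i], coords[j]), epsilon, r_min)
            energies[i] += e
            energies[j] += e
    return energies
```

Segments whose residues carry the most negative sums would be the candidates for the Van der Waals energy dominating segments described above.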
RESULTS AND DISCUSSION
The parameters defined above and determined for 14
different globular proteins are given in Table 1; the secondary
structures associated with the nucleating domains and the sizes
of the domains in terms of the number of associated residues are
also given in this Table. An analysis of this Table reveals
many characteristic properties of nucleation sites in globular
proteins.
We note at least two nucleation sites for a protein, the
bigger ones having up to five sites. Most of the site residues
are of nonpolar type, but the occurrence of residues with polar
atoms in their sidechains, viz., Ser, Thr, Tyr, Arg, Gln, Asn,
[Table 1. Parameters determined for the nucleation sites of the 14 globular proteins, with the associated secondary structures and domain sizes. The tabulated values are not recoverable from this copy.]
Fig. 1. Positions of nucleation sites in globular
proteins (panels: Thermolysin, Lysozyme, α-Chymotrypsin,
Subtilisin BPN', Concanavalin A). The distances (in Å units)
between sites are given near the connecting lines and the existing
hydrophobic channels are marked by curved dashed
lines. Amino acid sequence numbers for the site
residues are given inside the circles. Proteins
not shown in this Fig. have only two nucleation
sites each and are hence left out.
and His is also to be noted. The nucleation sites in a protein
(except in one case) are separated by at least 20 residues along
the sequence. The size of each domain varies from 13 to 20
residues, many having over 15 residues, the optimum number
suggested by Wetlaufer (1). The positions of the various nucleation
sites in each of the proteins are indicated in Fig. 1. Interestingly,
in many of the proteins, a few residues belonging to one
nucleation domain also become members of another domain, thereby
interconnecting them; this feature indicates the possibility of
the existence of a long hydrophobic core or channel across the protein
matrix; such channels noted in the presently studied proteins
are marked by dashed lines in Fig. 1.
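
The interconnection of domains through shared residues can be checked with a small sketch. It assumes each domain is available as a set of residue numbers keyed by its site residue. The function name and the residue numbers in the usage note are hypothetical and are not taken from Table 1.

```python
from itertools import combinations

def shared_residues(domains):
    """Map each pair of site residues to the residues their domains share.
    `domains` maps a site residue number to the set of member residues."""
    links = {}
    for s1, s2 in combinations(sorted(domains), 2):
        common = domains[s1] & domains[s2]
        if common:
            links[(s1, s2)] = sorted(common)
    return links
```

For instance, with domains `{10: {8, 9, 10, 11, 12}, 40: {12, 38, 39, 40}, 90: {88, 89, 90}}` the function returns `{(10, 40): [12]}`, marking the first two domains as interconnected and a candidate for a hydrophobic channel.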
Invariably, the domains are well buried and densely packed
in the protein matrix, the representative site residues having
nearly zero accessibilities. By their nature, the polar nucleating
site residues retain small contact areas for solvent interaction.
The nucleation site residues acquire surrounding hydrophobicities
(hydrophobic bonding energy, so to say) around 20 to 29 kcal/mole
and Van der Waals stabilization around 8 to 23 kcal/mole. Interestingly,
the peptide units that acquire some amount of nucleating
property from short- and medium-range interactions alone become
parts of segments that are able to receive the best stability
from Van der Waals interactions, and consequently, fall within the
nucleation domains predicted from considerations of total interactions.
The most interesting observation is that, although alpha-
helical structure is also associated with the nucleating domains
(sites), the β-strand is the dominating secondary structure in
them. This result is in accordance with our earlier prediction (5)
that the β-sheets are the most buried parts in globular proteins.
These characteristic results on nucleation domains/sites
provide support to the following important concept in the problem
of protein folding: 'the short- and medium-range interactions from
the primary sequence provide the minimum force to create nucleation
sites, and the long-range Van der Waals/hydrophobic interactions
provide the environment for their growth and aggregation, and
stability against disturbing external forces and conformational
fluctuations'.
REFERENCES
1. Wetlaufer, D.B. (1973) Proc. Natl. Acad. Sci. USA 70,
697-701.
2. Matheson, R.R. and Scheraga, H.A. (1978) Macromolecules 11,
819-829.
3. Kanehisa, M.I. and Tsong, T.Y. (1979) Biopolymers 18,
1375-78, 2913-2928.
4. Rose, G.D. and Roy, S. (1980) Proc. Natl. Acad. Sci. USA
77, 4643-4647.
5. Ponnuswamy, P.K., Prabhakaran, M. and Manavalan, P. (1980)
Biochim. Biophys. Acta 623, 301-316.
6. Tanford, C. (1963) J. Am. Chem. Soc. 84, 4240-4247.
7. Jones, D.D. (1975) J. Theor. Biol. 50, 167-183.
8. Manavalan, P. and Ponnuswamy, P.K. (1978) Nature 275,
673-674.
9. Prabhakaran, M. and Ponnuswamy, P.K. (1980) J. Theor. Biol.
(In press).
10. Lee, B. and Richards, F.M. (1972) J. Mol. Biol. 55,