Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins

Biochimica et Biophysica Acta, 623 (1980) 301--316 © Elsevier/North-Holland Biomedical Press

BBA 38444

HYDROPHOBIC PACKING AND SPATIAL ARRANGEMENT OF AMINO ACID RESIDUES IN GLOBULAR PROTEINS

P.K. PONNUSWAMY, M. PRABHAKARAN and P. MANAVALAN

Department of Physics, Autonomous Postgraduate Centre, University of Madras, Tiruchirapalli 620 020, Tamil Nadu (India)

(Received October 16th, 1979)

Key words: Hydrophobic environment; Protein folding; Globular protein; Crystallography

Summary

Amino acid residues acquire characteristic hydrophobic environments in globular proteins. Using the crystal data on 21 proteins, a new scale of hydrophobic indices for the residues is set up. This scale provides valuable information with regard to hydrophobic domains, nucleation sites, surface domains, loop sites and the spatial positions of residues in protein molecules.

Introduction

In understanding the three-dimensional structure of a globular protein molecule, information on four aspects is essential: (1) the possible disulphide linkages; (2) the ordered secondary structural regions along the polypeptide chain; (3) the relative orientation of the different secondary structural segments and (4) the preferred environments and the packing arrangements (spatial positions) of individual residues. Theoretical efforts [1-5] have so far been mainly directed to characterise and predict the secondary structures. Reports have recently appeared on the possibility of the prediction of disulphide linkages [6], packing of secondary structural segments [7,8] and on the buried/exposed character of residues from solvent accessibility studies [9,10]. However, the present knowledge on these four levels of protein structure is very limited and the theoretical interpretation of the three-dimensional structure of protein molecules still remains at an infant stage. The three-dimensional structures of globular protein molecules as observed in the crystalline state indicate that the folding of the chain occurs in such a way that some of the residues are buried in the interior and some are on the surface, irrespective of their sequential positions along the polypeptide chain. It is obvious, in this context, that the nonpolar residues are mostly to be expected in the interior and the polar residues on the surface. However, exceptions to the general interior-exterior hypothesis and the relationship between the extent of buried/exposed behaviour and the nature of the residue are not fully understood to a level of confident interpretation.

One of the best parameters that could be exploited in the studies on protein structure is the hydrophobic index of amino acids [11,12]. This index measures the hydrophobic behaviour of the amino acid in the organic surrounding medium used in the thermodynamic experiments described by Tanford [11]. Rose [13] utilized this index to predict chain turns and hydrophobic cores in globular proteins.

However, this parameter is noted to have a very poor correlation with the extent to which the residue is buried in the native folded-protein matrix [9]. This observation highlights the fact that the intrinsic hydrophobic indices of the amino acids, as such, are of limited use in the study of protein molecules. Although the hydrophobic index is a measure of the preference of a residue for a nonpolar environment, it does not reflect to the same extent the environment in protein crystals. This is due to the difference between the environment within the protein molecule and that in the nonpolar solvent used in deriving the parameter. Thus, another measure is necessary to reflect realistically the preferred nonpolar environment of the residue in protein crystals. It has been recently pointed out in our laboratory [14] that each kind of residue likes to have a characteristic environment in protein crystals by associating itself with a set of specific surrounding residues, and that the requirement of this characteristic environment for each of the residues in the protein molecule drives the linear chain to fold into the compact globular shape. We also reported [15] the results of our analysis of the characteristic environment associated with each of the residues in a number of protein crystals and have defined a parameter, called 'the surrounding hydrophobicity', for the residues. This set of modified hydrophobic indices for the residues is found to correlate with the buried/exposed behaviour of the residues in the protein matrix in a much more significant way.

In this paper we extend our studies on the surrounding hydrophobicity of residues in protein crystals to define hydrophobic domains, nucleation sites, loop sections, characteristic directionality of the chain (segments) and the depth of the residues from the surface. The phenomenon of protein folding is interpreted in terms of the enrichment of the hydrophobic environment, and an attempt is made to assign spatial positions for the residues from the centroid of the molecule, using statistical parameters.

To compute the hydrophobic environment around a given residue in the protein

The crystallographic data on 21 protein molecules form the source for our study. The proteins are myoglobin, ribonuclease S, cytochrome c, lysozyme, staphylococcal nuclease, carboxypeptidase A, subtilisin BPN', α-chymotrypsin, carp myogen, cytochrome b5, apolactate dehydrogenase, pancreatic trypsin inhibitor, thermolysin, oxidised Chromatium iron protein, ferredoxin, rubredoxin, porcine tosyl elastase, dismutase, cytochrome c2, concanavalin A and flavodoxin (refer to Levitt and Greer [16] for the original references for these proteins). These 21 proteins include representative examples of the known characteristic (α, β, α + β, α/β) structural types defined by Levitt and Chothia [17].

Each residue in the protein molecule is represented by its α-carbon atom (refer to Crampin et al. [18] for the limitations of this approximation). The center is fixed at the α-carbon atom of the first (N-terminal) residue and the distances between this atom and the rest of the α-carbon atoms in the protein molecule are computed. The composition of the surrounding residues associated with this residue is calculated for a sphere of a predetermined size, which has been shown to be the required volume of the medium within which a residue in a protein molecule is noted to exert detectable influence. The surrounding residues are assigned their respective hydrophobic indices as given by Tanford [11] or Jones [12]. The 'surrounding hydrophobicity' of a residue is then taken to be the sum of the hydrophobic indices of those residues that come within the assumed sphere. The same procedure is repeated each time by moving the center to the successive α-carbon atom along the polypeptide sequence to complete the computation of the surrounding hydrophobicities for all the residues in the protein molecule. Thus, the surrounding hydrophobicity of the jth residue in a given protein molecule, when it is in the native folded form, is given by

$$H_j^{f} = \sum_{i=1}^{20} n_{ij}\, h_i \qquad (1)$$

where $n_{ij}$ is the total number of surrounding residues of type i around the jth residue of the protein, and $h_i$ is the hydrophobic index of the ith residue. The superscript f indicates that the parameter of the residue is associated with the folded state of the protein molecule. The $h_i$ values for the 20 amino acids are given in the first column of Table I.
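The procedure leading to Eqn. 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' program: the function name, the toy coordinates and the decision to leave the central residue out of its own sum are assumptions, and the 8 Å radius is the sphere size quoted later in the paper. The h values are taken from Table I.

```python
import math

# Hydrophobic indices h (kcal) for a few residue types, copied from Table I.
H_INDEX = {"Arg": 0.85, "Pro": 2.77, "Asp": 0.66, "Phe": 2.87}


def surrounding_hydrophobicity(residues, ca_coords, radius=8.0):
    """Eqn. 1: for each residue, sum the indices of all other residues whose
    C-alpha atom lies within `radius` (angstroms) of its own C-alpha atom."""
    values = []
    for j, centre in enumerate(ca_coords):
        total = 0.0
        for k, other in enumerate(ca_coords):
            if k == j:
                continue  # assumption: the central residue is not counted
            if math.dist(centre, other) <= radius:
                total += H_INDEX[residues[k]]
        values.append(total)
    return values


# Hypothetical coordinates: four C-alpha atoms on a straight line, 3.8 A apart.
toy_residues = ["Arg", "Pro", "Asp", "Phe"]
toy_coords = [(3.8 * n, 0.0, 0.0) for n in range(4)]
toy_hf = surrounding_hydrophobicity(toy_residues, toy_coords)
print([round(v, 2) for v in toy_hf])
```

With real data, `ca_coords` would hold the crystallographic α-carbon coordinates of every residue in the chain; the double loop is quadratic in chain length, which is adequate for proteins of this size.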

To compute the gain in the surrounding hydrophobicity of a residue

The difference in the surrounding hydrophobicities of a residue when the molecule is in the native folded state and when it is in a hypothetical unfolded state represents the enrichment in the hydrophobic property of the residue due to the one-to-three-dimensional/conformational transition of the molecule. To compute this gain in the surrounding hydrophobicity of each of the residues in the protein molecule, it is assumed that the fully extended chain conformation is the unfolded reference state. The contribution to the surrounding hydrophobicity for a residue when the molecule is in its reference state is taken, as an approximation, to be due to the two near neighbour residues on either side of the residue in question along the sequence (these two residues alone can come within the diameter limit of the assumed sphere). Thus, the surrounding hydrophobicity of the jth residue in the unfolded state of the protein is given as

$$H_j^{u} = \sum_{\substack{k=j-2 \\ k \neq j}}^{j+2} h_k \qquad (2)$$
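As a worked check of Eqn. 2, the following Python sketch sums the Table I indices of the two sequence neighbours on either side of a residue. The function name and the handling of the chain ends (positions beyond the termini are simply skipped) are assumptions made for illustration; the sequence is the N-terminal stretch of pancreatic trypsin inhibitor as listed in Table II.

```python
# Hydrophobic indices h (kcal) from Table I for the residue types used below.
H_INDEX = {"Arg": 0.85, "Pro": 2.77, "Asp": 0.66, "Phe": 2.87,
           "Cys": 1.52, "Leu": 2.17}


def unfolded_hydrophobicity(sequence, j, reach=2):
    """Eqn. 2: sum h over the sequence neighbours j-reach .. j+reach,
    leaving out residue j itself (j is a 0-based position).
    Positions that fall beyond either chain end are skipped."""
    first = max(0, j - reach)
    last = min(len(sequence) - 1, j + reach)
    return sum(H_INDEX[sequence[k]] for k in range(first, last + 1) if k != j)


# First six residues of pancreatic trypsin inhibitor as listed in Table II.
bpti_start = ["Arg", "Pro", "Asp", "Phe", "Cys", "Leu"]
hu_first_four = [unfolded_hydrophobicity(bpti_start, j) for j in range(4)]

# Table II lists 3.43, 4.38, 8.01 and 7.12 for residues 1 to 4.
print([round(v, 2) for v in hu_first_four])
```

Only the first four positions are evaluated because residues 5 and 6 would need neighbours that are not included in this short stretch.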

TABLE I

AMINO ACID PARAMETERS

h and H are in kcal; the rest are dimensionless numbers. h, hydrophobic index; Hf, hydrophobicity in folded form; Hu, hydrophobicity in unfolded form; Gh, hydrophobicity gain; P, polarity; Br, extent to which the residue is buried; Ra, accessibility reduction ratio; Ns, average number of surrounding residues.

Amino acid   h      <Hf>    <Hf-Hu>   <Gh>   <Hα>    <Hβ>    <HB>    (P)     (Br)   (Ra)   <Ns>
Ala          0.87   12.28    7.62     2.63   13.65   14.60   10.67    0.00   0.38   3.70   6.05
Asp          0.66   10.97    6.18     2.29   10.98   13.78   10.21   40.70   0.15   2.60   4.95
Cys          1.52   14.93   10.93     3.36   14.49   15.90   14.15    1.48   0.50   3.03   7.86
Glu          0.67   11.19    6.38     2.31   12.55   13.59   11.71   49.91   0.18   3.30   5.10
Phe          2.87   13.43    8.99     3.02   14.08   14.18   13.27    0.35   0.50   6.60   6.62
Gly          0.10   12.01    7.31     2.55   15.36   14.18   10.95    0.00   0.36   3.13   6.16
His          0.80   12.84    7.85     2.57   11.59   15.35   12.07   51.60   0.17   3.57   5.80
Ile          3.15   14.77    9.99     3.08   14.63   14.10   12.95    0.15   0.60   7.69   7.51
Lys          1.64   10.80    5.72     2.12   11.96   13.28    9.93   49.50   0.03   1.79   4.88
Leu          2.17   14.10    9.37     2.98   14.01   16.49   13.07    0.13   0.45   5.88   7.37
Met          1.67   14.33    9.83     3.18   13.40   16.23   15.00    1.43   0.40   5.21   6.39
Asn          0.09   11.00    6.17     2.27   12.24   11.79   10.85    3.38   0.12   2.12   5.04
Pro          2.77   11.19    6.64     2.46   11.51   14.10   10.62    1.58   0.18   2.12   5.65
Gln          0.00   11.28    6.67     2.45   11.30   12.02   11.71    3.53   0.07   2.70   5.45
Arg          0.85   11.49    6.81     2.45   11.28   13.24   11.05   52.00   0.01   2.53   5.70
Ser          0.07   11.26    6.93     2.60   11.26   13.36   11.18    1.67   0.22   2.43   5.53
Thr          0.07   11.65    7.08     2.55   13.00   14.50   10.53    1.66   0.23   2.60   5.81
Val          1.87   15.07   10.38     3.21   12.88   16.30   13.86    0.13   0.54   7.14   7.62
Trp          3.77   12.95    8.41     2.85   12.06   13.90   11.41    2.10   0.27   6.25   6.98
Tyr          2.67   13.29    8.53     2.79   12.64   14.76   11.52    1.61   0.15   3.03   6.73


The gain in the surrounding hydrophobicity for the jth residue in the protein is given by

$$H_j^{g} = H_j^{f} - H_j^{u} \qquad (3)$$

and the hydrophobic gain ratio for the residue is given by

$$G_j = H_j^{f} / H_j^{u} \qquad (4)$$

where superscript u denotes unfolded. The total gain in the surrounding hydrophobicities of all the residues in a protein molecule and the corresponding gain ratio for the protein molecule as a whole, when it goes from the hypothetical extended state to the native folded state, are, respectively, given by

$$H_p^{g} = \sum_{j=1}^{N} H_j^{f} - \sum_{j=1}^{N} H_j^{u} \qquad (5)$$

and

$$G_p = \sum_{j=1}^{N} H_j^{f} \Big/ \sum_{j=1}^{N} H_j^{u} \qquad (6)$$

where N is the number of residues in the protein molecule. A set of corresponding average parameters for the 20 types of residue was also computed, taking all the 21 protein molecules in the sample. The average surrounding hydrophobicity of residue type i is given by

$$\langle H_i^{f} \rangle = \frac{1}{n} \sum_{k=1}^{n} H_{i,k}^{f} \qquad (7)$$

where n is the total number of occurrences of the ith type of residue in the 21 protein molecules and $H_{i,k}^{f}$ is the value for its kth occurrence. Similarly, we define the average gain in the surrounding hydrophobicity and the average gain ratio for the ith type of residue, respectively, as

$$\langle H_i^{g} \rangle = \langle H_i^{f} - H_i^{u} \rangle \qquad (8)$$

and

$$\langle G_i \rangle = \langle H_i^{f} \rangle / \langle H_i^{u} \rangle \qquad (9)$$
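The gain and gain-ratio definitions can be illustrated with numbers already given in the paper. The sketch below applies Eqns. 3 and 4 to the first three residues of pancreatic trypsin inhibitor (values from Table II) and then checks, for Ala, whether the tabulated average gain ratio in Table I is close to the ratio of the tabulated averages. The function names are hypothetical, and reading the tabulated ratio as a ratio of averages is an assumption.

```python
def gain(h_folded, h_unfolded):
    """Eqn. 3: hydrophobic gain on folding."""
    return h_folded - h_unfolded


def gain_ratio(h_folded, h_unfolded):
    """Eqn. 4: hydrophobic gain ratio."""
    return h_folded / h_unfolded


# (H^u, H^f) pairs for residues 1-3 of pancreatic trypsin inhibitor (Table II).
table2_rows = {"Arg 1": (3.43, 6.08), "Pro 2": (4.38, 5.99), "Asp 3": (8.01, 10.84)}

for name, (hu, hf) in table2_rows.items():
    print(name, round(gain(hf, hu), 2), round(gain_ratio(hf, hu), 2))

# Eqn. 5 restricted to these three residues (a partial sum, not the protein total).
partial_total_gain = (sum(hf for _, hf in table2_rows.values())
                      - sum(hu for hu, _ in table2_rows.values()))

# Table I, Ala: <H^f> = 12.28 and <H^f - H^u> = 7.62, hence <H^u> = 4.66.
# The ratio of these averages is close to the tabulated gain ratio of 2.63.
ala_ratio = gain_ratio(12.28, 12.28 - 7.62)
print(round(partial_total_gain, 2), round(ala_ratio, 3))
```

The same check can be repeated for the other rows of Table I; small differences in the last digit are to be expected because the tabulated values are rounded.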

The concept of surrounding hydrophobicity of amino acid residues

When the protein molecule is in the unfolded state devoid of any long-range interactions between the amino acid residues, each residue is surrounded by aqueous medium and it experiences only the influence of the near neighbour members along the sequence. When a transition takes place from this unfolded state to the native folded state, the solvent molecules surrounding each residue are replaced by the distant residues, thereby increasing the hydrophobic experience of the residue. In this process, each residue in the protein molecule acquires a characteristic hydrophobic environment. Recently [14] we demonstrated that a sphere of 8 Å radius has the required volume, for any one residue of a protein molecule, for a study of the influence of this 'protein environment'. We defined 'the surrounding hydrophobicity' of a residue as the sum of the hydrophobic indices assigned to the various residues that appear within an 8 Å radius volume in the protein crystal [15].

In order to assess how much the surrounding hydrophobicity is enriched for each of the residues due to folding, the hydrophobicity values in the unfolded state ($H^{u}$) and in the folded state ($H^{f}$) were computed by the use of Eqns. 1 and 2 for each of the 21 protein molecules, and the values for the protein pancreatic trypsin inhibitor are given in Table II. It is noted from this Table that the gain in the surrounding hydrophobicity differs in measure from residue to residue, which indicates that the parameter 'surrounding hydrophobicity' could be fruitfully used to describe the internal packing arrangements of residues in globular proteins.

Using the $H^{f}$ and $H^{u}$ values for the residues in the 21 protein molecules, the total gain in the surrounding hydrophobicity and the gain ratio due to folding in each of these proteins were computed by the use of Eqns. 5 and 6. It was noted that the gain ratio for the individual proteins varies from 2.11 to 3.02 and the average gain ratio for all the proteins is 2.62. In general, the heme proteins have lower gain ratios and the α/β-type proteins higher gain ratios than the total average value. The more-or-less similar hydrophobicity gain due to folding in the different structural types of proteins indicates that this bulk property is an intrinsic factor of globular proteins.

TABLE II

HYDROPHOBICITIES OF SURROUNDINGS OF AMINO ACID RESIDUES IN PANCREATIC TRYPSIN INHIBITOR

Hu, hydrophobicity in unfolded form; Hf, hydrophobicity in folded form.

Residue    (Hu)    (Hf)      Residue    (Hu)    (Hf)      Residue    (Hu)    (Hf)
 1 Arg     3.43    6.08      21 Tyr     9.54   21.14      41 Lys     2.66    8.18
 2 Pro     4.38    5.99      22 Phe     6.28   23.23      42 Arg     2.69    9.94
 3 Asp     8.01   10.84      23 Tyr     6.50   17.35      43 Asn     5.43   27.94
 4 Phe     7.12    8.72      24 Asn     8.05   19.16      44 Asn     5.45   21.30
 5 Cys     6.27   17.52      25 Ala     5.47   11.32      45 Phe     1.89   11.60
 6 Leu     7.83   12.20      26 Lys     1.93    1.93      46 Lys     3.90   12.17
 7 Glu     9.23   12.84      27 Ala     4.78    6.38      47 Ser     6.05   11.17
 8 Pro     8.28   11.32      28 Gly     6.32   10.79      48 Ala     3.04   15.58
 9 Pro     5.58   13.73      29 Leu     2.61    7.98      49 Glu     3.12    4.28
10 Tyr     5.71   10.97      30 Cys     2.34   15.56      50 Asn     4.73    9.44
11 Thr     8.31   18.30      31 Gln     6.75   15.79      51 Cys     2.83   18.05
12 Gly     7.03   14.77      32 Thr     6.38   15.79      52 Met     3.10    9.77
13 Pro     3.33    5.79      33 Phe     4.61   16.98      53 Ser     4.90    6.33
14 Cys     5.38    7.16      34 Val     5.71   13.77      54 Thr     4.26    9.96
15 Lys     6.01    6.30      35 Tyr     4.94   19.03      55 Cys     1.12   18.24
16 Ala     7.16   10.00      36 Gly     6.16   17.12      56 Gly     2.68   15.17
17 Arg     8.71   13.54      37 Gly     5.14   14.20      57 Gly     2.61    7.24
18 Ile     5.72   10.45      38 Cys     1.92    8.90      58 Ala     0.20    1.14
19 Ile     7.52   16.63      39 Arg     4.13    7.00
20 Arg    11.84   23.98      40 Ala     4.88   10.55

Distribution of amino acid residues in different ranges of surrounding hydrophobicity

A study of the distribution of amino acid residues in various ranges of surrounding hydrophobicity throws more light on the packing arrangement of residues in protein crystals. The distribution is represented in terms of histograms displayed in Fig. 1. These histograms were constructed by the use of fractional frequencies obtained by dividing the frequencies of each type of residue lying in a specific range of surrounding hydrophobicity by their respective total occurrence. The histograms indicate clearly the degree of preference by each of the residues in the different hydrophobic ranges. The 0-3 kcal range contains only an insignificant number of residues. The residues Asp, Gly, Lys, Asn, Gln, Tyr and Thr have higher occupation in the lower (3-12 kcal) hydrophobicity range. The residues Cys, Phe, Ile, Leu, Met, Val and Trp have higher occupation in the higher (12-21 kcal) hydrophobicity range. Ala, His, Pro, Asn and Arg have higher occupation in the middle (9-15 kcal) range. The residue Tyr has higher occurrence in the ranges 9-12 and 15-18 kcal with a drop in the 12-15 kcal range. It is very interesting to note that the average value of the surrounding hydrophobicities for all the residues in all the proteins lies around 12 kcal, which could be taken as the demarcation point between the higher and lower surrounding hydrophobicities, for purposes of discussion.

Fig. 1. Histograms showing the distribution of the 20 amino acid residues in 21 proteins in various ranges of surrounding hydrophobicity. (Abscissa: surrounding hydrophobicity range, kcal; one panel per residue type.)
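The fractional frequencies behind such histograms can be computed as sketched below. This is an illustration under stated assumptions: the function name is hypothetical, each range is taken as closed at its lower edge and open at its upper edge, and the input is limited to the six Cys residues of one protein (Table II), whereas the paper pools each residue type over all 21 proteins.

```python
from collections import Counter


def fractional_frequencies(values, bin_width=3.0):
    """Fraction of the values falling in each range [lo, lo + bin_width)."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {(b * bin_width, (b + 1) * bin_width): c / total
            for b, c in sorted(counts.items())}


# H^f values (kcal) of the six Cys residues of pancreatic trypsin inhibitor,
# read from Table II (residues 5, 14, 30, 38, 51 and 55).
cys_hf = [17.52, 7.16, 15.56, 8.90, 18.05, 18.24]
cys_hist = fractional_frequencies(cys_hf)

for (lo, hi), fraction in cys_hist.items():
    print(f"{lo:.0f}-{hi:.0f} kcal: {fraction:.2f}")
```

Ranges that contain no residue are simply absent from the returned dictionary; a plotting routine would need to fill them with zero.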

Surrounding hydrophobicities in α-helix, β-sheet and β-turn segments

In order to study the hydrophobic packing arrangement of residues in the different secondary structural segments in protein molecules, a set of average surrounding hydrophobicity values for the 20 kinds of residue was computed by considering the α-helical, β-sheet and β-turn segments in the 21 protein molecules. The average $H^{f}$ values for the residues occurring in these respective structural segments are given in Table I as the $\langle H^{\alpha} \rangle$, $\langle H^{\beta} \rangle$ and $\langle H^{B} \rangle$ parameters. An examination of these parameters shows that, except for the residues Gly and Asn, all other residues acquire consistently higher values of surrounding hydrophobicities in β-sheet segments than in α-helix and β-turn portions. This fact indicates that the β-sheets are more buried than α-helix and β-turn segments in globular proteins [19]. Another notable observation is that the nonpolar residues Met and Val acquire high surrounding hydrophobicities even in β-turns, indicating that bends associated with these residues could often have been buried inside the protein matrix.

'Average surrounding hydrophobicity' or the 'bulk hydrophobic character' of amino acid residues

The native state hydrophobic environment represented by the parameter $H^{f}$, and the gain in the surrounding hydrophobicity or the hydrophobic enrichment due to folding, for each of the residues in all the 21 proteins were used to obtain a set of general average parameters $\langle H^{f} \rangle$ and $\langle H^{f} - H^{u} \rangle$ for the 20 kinds of residue, and these values are also listed in Table I. The $\langle H^{f} \rangle$ values differ slightly from the corresponding $\langle H \rangle$ values reported earlier [15] from an analysis of 14 protein molecules, but when transformed to a common origin, the relative values within each set remain unaltered. The two complementary average parameters, namely, the average surrounding hydrophobicity and the average gain in the surrounding hydrophobicity, measure the preference of nonpolar or hydrophobic environment in protein crystals and reflect the intrinsic packing arrangement of residues around each of the 20 kinds of amino acid residue. We have referred to the average surrounding hydrophobicity of a residue, given by $\langle H^{f} \rangle$, as its 'bulk hydrophobic character'. This bulk parameter quantitatively measures the effective hydrophobic bonding the residue can make in the protein environment.

A careful study of the intrinsic (Tanford) [11] hydrophobicity indices, the bulk hydrophobic indices and the gains in the surrounding hydrophobicity or the hydrophobic gain ratios of the 20 types of amino acid residue provides valuable information as to the characteristic packing behaviour of each of these residues in globular proteins. It is noted that the bulk hydrophobicity is not proportional to the corresponding hydrophobic (Tanford) index of the residue: the low values of $\langle H^{f} \rangle$ for the residues Pro, Trp and Tyr clearly indicate that the hydrophobic bonding behaviour of these residues in protein molecules is not adequately reflected by their hydrophobic (Tanford) indices; the residue Val, with a hydrophobic index almost half that of Trp, has the highest $\langle H^{f} \rangle$ value and the charged residues Asp and Lys are associated with the lowest $\langle H^{f} \rangle$ values. These observations suggest that apart from the nonpolarity indicated by the hydrophobic indices of amino acid side-chains, both their structure and the presence of some polar atoms strongly influence their bulk hydrophobic character and hence their spatial position in protein molecules.

In general, the number of residues surrounding a residue within the effective distance of influence is the important factor for the change in the surrounding hydrophobicity indices of the residues. The average number of surrounding residues ($\langle N_s \rangle$) for each of the 20 amino acid residues, as noted in the entire protein sample, is given in the last column of Table I.

Solvent accessibility vs. hydrophobicity

An interesting aspect of the present study on surrounding hydrophobicity is that the results could be compared with the results of solvent accessibility studies made on protein molecules by Lee and Richards [20] and by Chothia [9]. These authors calculated the decrease in solvent accessibility for an amino acid residue when the protein molecule moves from a hypothetically extended state to the native folded state. We have computed here the gain in the surrounding hydrophobicity for an amino acid residue when the protein molecule is transferred from a similar extended state to the native folded state, and hence the reduction in the solvent accessibility and the gain in the surrounding hydrophobicity have an inverse relationship. In Table I the solvent accessibility reduction ratios ($R_a$) and the surrounding hydrophobicity gain ratios ($\langle G_h \rangle$) for the 20 amino acid residues are given for comparison. Chothia [9], by analysing the solvent accessibilities of the 20 types of residue in 12 proteins, has also given the probable extents to which the residues are buried in globular protein molecules. This parameter $B_r$ is also given in Table I. The relationship between the $\langle H^{f} \rangle$ values and the corresponding $B_r$ values is noted to have a correlation coefficient of 0.82, which is a far better value than the correlation coefficient (0.4) obtained between the parameter $B_r$ and the individual (Tanford) hydrophobic index. The least-squares line connecting $B_r$ and the average surrounding hydrophobicity is given by

$$B_r = 0.09773\, \langle H^{f} \rangle - 0.94524 \qquad (10)$$

with a standard error of 0.103. Fig. 2 is a plot of the $B_r$ of each type of residue against its average surrounding hydrophobicity and the central line is the plot of the above theoretical least-squares line equation. It is seen from this figure that 11 out of the 20 residues are very close to the theoretical line. The residues Gly, Ala, Phe and Ile are noted to be buried more in the interior and the residues Lys, Gln, Arg, His and Tyr are noted to be more on the surface than predicted from the values of surrounding hydrophobicity.
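The correlation and the least-squares line can be recomputed from the two relevant columns of Table I. The Python sketch below does this with plain arithmetic; the function name is hypothetical, and because the table entries are rounded and were transcribed from print, the result should only be expected to come out near, not exactly at, the reported correlation of 0.82 and the coefficients of Eqn. 10.

```python
import math

# (<H^f>, B_r) pairs for the 20 residue types, copied from Table I.
TABLE1 = {
    "Ala": (12.28, 0.38), "Asp": (10.97, 0.15), "Cys": (14.93, 0.50),
    "Glu": (11.19, 0.18), "Phe": (13.43, 0.50), "Gly": (12.01, 0.36),
    "His": (12.84, 0.17), "Ile": (14.77, 0.60), "Lys": (10.80, 0.03),
    "Leu": (14.10, 0.45), "Met": (14.33, 0.40), "Asn": (11.00, 0.12),
    "Pro": (11.19, 0.18), "Gln": (11.28, 0.07), "Arg": (11.49, 0.01),
    "Ser": (11.26, 0.22), "Thr": (11.65, 0.23), "Val": (15.07, 0.54),
    "Trp": (12.95, 0.27), "Tyr": (13.29, 0.15),
}


def pearson_and_line(pairs):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, mean_y - slope * mean_x


r, slope, intercept = pearson_and_line(list(TABLE1.values()))
print(f"r = {r:.2f}; B_r = {slope:.4f} <H^f> {intercept:+.4f}")
```

Residues far from the fitted line can then be listed by comparing each tabulated $B_r$ with `slope * hf + intercept`.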

The $B_r$ value of a residue could be better explained when we couple the molecular weight ($M_w$) and polarity (P) of the residue with its surrounding hydrophobicity. Adopting the polarity index values for the residues from Zimmerman et al. [21], a polynomial equation was fitted and the multiple correlation computed. The polynomial equation is

$$B_r = 0.09681\, \langle H^{f} \rangle - 0.00061\, P - 0.00197\, M_w - 0.6588 \qquad (11)$$

and the computed multiple correlation coefficient is 0.90 with an estimated S.E. of 0.08. All the bulkier residues now show only slight deviations from the estimated values of the polynomial equation. It should be borne in mind that the protein data used to compute the surrounding hydrophobicity form an extended set compared to the limited set used in determining the buried character of the residues.

Fig. 2. Plot connecting the average surrounding hydrophobicity and the extent to which the amino acid residues are buried ($B_r$) in globular proteins. The line is the least-squares fit for the two variables. (Abscissa: average surrounding hydrophobicity $\langle H^{f} \rangle$.)
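Eqns. 10 and 11 can be compared directly for individual residues. The sketch below evaluates both for Gly and Val with the Table I values. The function names are hypothetical, and the molecular weights used are those of the free amino acids (glycine 75.07, valine 117.15 g/mol), which is an assumption since the text does not state which molecular weight was entered into the fit.

```python
def buried_from_hf(hf):
    """Eqn. 10: B_r predicted from the average surrounding hydrophobicity."""
    return 0.09773 * hf - 0.94524


def buried_from_hf_p_mw(hf, polarity, mol_weight):
    """Eqn. 11: B_r predicted from <H^f>, polarity P and molecular weight Mw."""
    return 0.09681 * hf - 0.00061 * polarity - 0.00197 * mol_weight - 0.6588


# <H^f>, P and observed B_r from Table I. The molecular weights (g/mol) are
# those of the free amino acids; this choice is an assumption.
examples = {
    "Gly": {"hf": 12.01, "p": 0.00, "mw": 75.07, "br": 0.36},
    "Val": {"hf": 15.07, "p": 0.13, "mw": 117.15, "br": 0.54},
}

for name, d in examples.items():
    eq10 = buried_from_hf(d["hf"])
    eq11 = buried_from_hf_p_mw(d["hf"], d["p"], d["mw"])
    print(f"{name}: observed {d['br']:.2f}, Eqn. 10 {eq10:.3f}, Eqn. 11 {eq11:.3f}")
```

For Gly, Eqn. 10 gives a value well below the observed 0.36, in line with the remark that Gly is more buried than the single-parameter line predicts, while the three-parameter form comes much closer under the stated molecular-weight assumption.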

Surrounding hydrophobicity and the spatial position of amino acid residues

The relation between the hydrophobic environment of the residues and their expected spatial positions in the protein matrix can be seen in Fig. 3, where we have marked the observed depths of the amino acid residues from the surface of the protein molecule, pancreatic trypsin inhibitor, along with the respective surrounding hydrophobicity values. In calculating the observed depth of a residue from the surface of the protein, the protein molecule was assumed to be an ellipsoidal figure in which the depth of the residue from the surface is given by

$$D = 1 - \left( \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \right) \qquad (12)$$

where x, y and z are the Cartesian coordinates of the residue with respect to the centroid, and a, b and c are the semi-axes of the ellipsoidal volume of the protein. According to this representation, a residue at the surface will have a value of D = 0, and the one at the centroid of the protein a value of D = 1. It is seen from the depth/surrounding hydrophobicity plot that there is a high correlation between the surrounding hydrophobicity of the residues and their respective positions in the protein globule. However, disagreements at a few places are also noticed: the residues 3, 18 and 56 do not correlate. This may be due to the inherent fault of our assumption that the protein has an ellipsoidal shape and that residue depth is related solely to surrounding hydrophobicity.

Fig. 3. Surrounding hydrophobicity (○-○) and depth (●···●) profiles of amino acid residues in pancreatic trypsin inhibitor. The arrow indicates the average value of the surrounding hydrophobicities of all the residues in the protein. (Abscissa: amino acid sequence number.)
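Eqn. 12 is simple to evaluate once the centroid and the semi-axes are known. The following Python sketch assumes that the coordinates have already been shifted to the centroid and rotated onto the principal axes of the ellipsoid; the function name and the semi-axis lengths are hypothetical, since the values used for pancreatic trypsin inhibitor are not listed in the text.

```python
def ellipsoidal_depth(position, semi_axes):
    """Eqn. 12: D = 1 - (x^2/a^2 + y^2/b^2 + z^2/c^2), with the position
    given relative to the centroid and along the ellipsoid axes."""
    return 1.0 - sum((coord / axis) ** 2 for coord, axis in zip(position, semi_axes))


# Hypothetical semi-axes a, b, c in angstroms.
axes = (15.0, 10.0, 8.0)

depth_centroid = ellipsoidal_depth((0.0, 0.0, 0.0), axes)   # at the centroid
depth_surface = ellipsoidal_depth((15.0, 0.0, 0.0), axes)   # tip of the longest axis
depth_inside = ellipsoidal_depth((7.5, 5.0, 0.0), axes)     # an interior point

print(depth_centroid, depth_surface, depth_inside)
```

A residue lying outside the assumed ellipsoid gives a negative D, which is one way the ellipsoidal approximation can show its limits for an irregularly shaped molecule.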

The interesting feature noted in Fig. 3 is that the nonpolar residues (other than proline), Phe 4, Tyr 10, Cys 14, Ala 16, Ile 18, Ala 27, Leu 29, Cys 38, Ala 40, Phe 45, Met 52, and Ala 58 have lower surrounding hydrophobicity values, and the polar residues Glu 7, Thr 11, Arg 17, Arg 20, Asn 24, Gln 31, Thr 32, Asn 43 and Asn 44 are associated with higher surrounding hydrophobicity values than the protein average. These specific observations clearly indicate that a number of nonpolar residues are nearer the surface and a number of polar residues are in the interior of the protein molecule, contrary to the general belief that polar residues will be on the surface and nonpolar residues in the interior.

The observed spatial positions of amino acid residues in the protein crystals are often classified into three categories as: (1) external (E) when the side-chain of the residue is exposed to the surrounding solvent medium; (2) surface (S) when the side-chain is partially buried and (3) internal (I) when the side-chain is completely buried. Such classifications were done for the molecules lysozyme, cytochrome c, α-chymotrypsin and carp myogen by the authors of the crystallographic studies. The surrounding hydrophobicity of the residues within a protein was divided into six ranges and the external/surface/internal kinds of residue were counted in each range for an analysis. The distribution pattern of these three groups of residues in different surrounding hydrophobicity ranges in the above-mentioned four proteins is shown in Fig. 4. For constructing this plot, the fractional frequency of the residue in each of these

Fig. 4. The relative preference of amino acid residues of external (E), surface (S) and internal (I) nature in various ranges of surrounding hydrophobicity. Abscissa: surrounding hydrophobicity range (kcal), in the intervals 3--6, 6--9, 9--12, 12--15, 15--18 and 18--21.

three groups in each of the energy ranges was calculated and it was divided by the fractional frequency of occurrence in the particular group. The figure shows that in the lower (0--9 kcal) range external residues are highly populated; in the 9--12 kcal range all three kinds of residue (E, S and I) are equally populated; the surface residues have highest population in the 12--15 kcal range and their frequency of occurrence falls rapidly with the move towards still higher hydrophobicity values; the external residues also steadily decrease as the hydrophobicity increases; on the other hand, the internal residue population increases almost vertically as the surrounding hydrophobicity increases. From the knowledge of the average surrounding hydrophobicity values, the residues were characterised into external, surface and internal nature: the residues Ala, Asp, Glu, Gly, His, Lys, Asn, Pro, Gln, Arg, Ser and Thr having their

average surrounding hydrophobicity values below 12.5 were assumed to belong to the external group, the residues Trp and Tyr, having average values between 12.5 and 13.4, to the surface group, and the rest of the residues, Cys, Phe, Ile, Leu, Met and Val, having average values of at least 13.4, to the internal group. Janin et al. [10] classified the residues into three groups, viz. buried, intermediate and exposed, and our classification coincides with theirs, except for the residues Trp and Tyr: we assign these two residues to the surface group whereas Janin et al. assign Trp to the buried group and Tyr to the exposed group.
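The threshold rule stated above can be written as a short sketch. The cut-offs 12.5 and 13.4 and the residue assignments are those given in the text; the function name, the constant names and the handling of a value exactly at 12.5 (placed in the surface group) are assumptions.

```python
# Cut-offs on the average surrounding hydrophobicity, as quoted in the text.
EXTERNAL_BELOW = 12.5
INTERNAL_FROM = 13.4

# Residue assignments as listed in the text.
GROUPS = {
    "E": ["Ala", "Asp", "Glu", "Gly", "His", "Lys",
          "Asn", "Pro", "Gln", "Arg", "Ser", "Thr"],
    "S": ["Trp", "Tyr"],
    "I": ["Cys", "Phe", "Ile", "Leu", "Met", "Val"],
}


def classify_by_average_hydrophobicity(h_average):
    """Return 'E' (external), 'S' (surface) or 'I' (internal).

    ASSUMPTION: a value exactly equal to 12.5 is treated as surface.
    """
    if h_average < EXTERNAL_BELOW:
        return "E"
    if h_average < INTERNAL_FROM:
        return "S"
    return "I"
```

With a table of average surrounding hydrophobicities per residue type, applying `classify_by_average_hydrophobicity` to each entry should reproduce the `GROUPS` assignment.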


Surface and hydrophobic domains and directionality of chain segments from surrounding hydrophobicity

From the hydrophobicity plots another useful piece of information, namely, the directionality of the segments of the protein chain, can be visualised. When the trend of the protein chain is to go from lower to higher surrounding hydrophobicity, it may be stated that the chain is proceeding from the surface to the inside of the molecule. A surface loop (not necessarily the conventional β-bend) seems to occur about residues with very low surrounding hydrophobicity values. For objective purposes, the residues with surrounding hydrophobicity values lower than half of the average value of the surrounding hydrophobicities of all the residues in the protein are assumed to belong to surface domains: the residue having the lowest surrounding hydrophobicity value in each domain is taken to represent that domain. The residues having surrounding hydrophobicity values nearly equal to or greater than twice the average value of the surrounding hydrophobicities of the residues in the protein are taken to form the hydrophobic domains, and the residue of the highest surrounding hydrophobicity within a domain is taken to represent that domain. It was noted that many hydrophobic domains are often interlinked, thus forming hydrophobic channels within the protein molecule. In Table III the predicted surface and hydrophobic domains in a few representative proteins are given. The interconnected domains are shown in this table by a group of residues, each drawn from one such domain on the basis of the highest surrounding hydrophobicity within that domain. As hydrophobic clustering is conceived to be important in globular protein structure formation, we suggest that the residues noted in Table III under the hydrophobic domain column should also represent the specific nucleation sites in the respective protein molecules. Matheson and Scheraga [22], from the knowledge of amino acid sequence, proposed nucleation sites in a few proteins. Myoglobin, lysozyme and carboxypeptidase A included in Table III were also considered by these authors. The nucleation sites around Ala 110 in myoglobin and Cys 64 in lysozyme identified by us are also predicted by Matheson and Scheraga [22]. A scrutiny of the structure of carboxypeptidase A shows that the nucleation site around Tyr 204 is in the vicinity of the nucleation site 279--289 predicted by these authors. The main difference between our method and that of Matheson and Scheraga is that while we identify all the existing nucleation sites in a protein crystal, Matheson and Scheraga predict only one site that could persist or vanish after the formation of the native structure.

TABLE III

HYDROPHOBIC DOMAINS AND SURFACE DOMAINS IN PROTEIN MOLECULES DERIVED FROM SURROUNDING HYDROPHOBICITY INFORMATION

The structural type is noted in brackets near the name of the protein. Each residue represents a domain; interlinked domains are shown by a group (two or more) of residues. Residues marked with an asterisk (underlined in the original) correspond to observed β-turns.

Protein                    Centres of hydrophobic domains/   Centres of surface domains/
                           nucleation sites                  loop sites

Myoglobin (α)              (1) Leu 32--Ala 110               [?], 53, 57, 81*, 96*, 120*, 125, 153

α-Chymotrypsin (β)         (1) Leu 99                        13, 29*, 53, 68, 72, 86, 90, 107, 120, 140,
                           (2) Ser 112--Val 114--            157, 178, 195*, 209, 228, 234
                               Thr 199--Val 200

Lysozyme (α + β)           (1) Ala 9                         14, 22*, 44, 48*, 67*, 7[?]*, 116, 129
                           (2) Val 29
                           (3) Thr 40
                           (4) Cys 64

Carboxypeptidase A (α/β)   (1) Val 49                        2, 7, 31, 43*, 57*, 91*, 122, 135, 154*, 169,
                           (2) Ile 62--Leu 107--             214*, 218, 237, 276, 307
                               Leu 193--Tyr 204--
                               Ser 166
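The domain criteria described in this section can be sketched as follows. Several points are assumptions, since the text does not fix them: a domain is taken to be a run of consecutive residues meeting the criterion, the "nearly equal to" tolerance on the upper threshold is ignored, and the function names are hypothetical.

```python
def _run_representatives(values, predicate, pick):
    """Group consecutive residues satisfying `predicate` into runs and return
    the residue number (1-based) chosen by `pick` (min or max) in each run."""
    representatives = []
    run = []  # (residue_number, value) pairs of the current run
    for number, value in enumerate(values, start=1):
        if predicate(value):
            run.append((number, value))
        elif run:
            representatives.append(pick(run, key=lambda item: item[1])[0])
            run = []
    if run:
        representatives.append(pick(run, key=lambda item: item[1])[0])
    return representatives


def find_domains(h, low_factor=0.5, high_factor=2.0):
    """Return (surface_centres, hydrophobic_centres) as residue numbers.

    h: surrounding hydrophobicity per residue, in sequence order.
    Surface domains: values below low_factor * average (lowest value represents).
    Hydrophobic domains: values at or above high_factor * average
    (highest value represents).
    """
    if not h:
        return [], []
    average = sum(h) / len(h)
    surface = _run_representatives(h, lambda v: v < low_factor * average, min)
    hydrophobic = _run_representatives(h, lambda v: v >= high_factor * average, max)
    return surface, hydrophobic


def segment_direction(h, start, end):
    """Direction of the chain between residues `start` and `end` (1-based)."""
    change = h[end - 1] - h[start - 1]
    if change > 0:
        return "toward interior"
    if change < 0:
        return "toward surface"
    return "no net trend"
```

Comparing only the end points in `segment_direction` is a deliberate simplification; a fitted slope over the segment would be less sensitive to a single outlying residue.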

The probable distances of the residues from the centroid, the directionality of the chain segments, the hydrophobic domains, and the loop sections, all derivable from the surrounding hydrophobicity of the residues, could be more easily visualised from a circular plot of surrounding hydrophobicity of the residues in a protein molecule. The calculated surrounding hydrophobicities of the residues in pancreatic trypsin inhibitor are shown in such a circular plot in Fig. 5. For constructing this circular plot, the circumference of the circle and its radius have been chosen, respectively, as abscissa and ordinate. The difference between the radii of the outermost and the innermost circles corresponds to the length of the ordinate, which is made equal to the maximum value of the surrounding hydrophobicity for a residue noted in the protein. The minimum value (0 kcal) starts from the outermost circle and six concentric rings in between those two extremes are drawn at equal intervals. The sequence numbers and the names of the residues are marked along the circumference and the surrounding hydrophobicity corresponding to each residue is plotted along the ordinate. Thus, the residues which have low surrounding hydrophobicity will be located away from the centre of the circle. This circular hydrophobicity profile demonstrates the variation of the hydrophobic environment as one proceeds from the N-terminal to the C-terminal end of the chain, bringing out conceptually the spatial locations of the residues relative to the centroid, the interior residues which form hydrophobic domains and thereby the nucleation sites, the directionality of the segments, and the surface domains along the polypeptide chain.

Fig. 5. Circular plot showing the distribution of surrounding hydrophobicity of amino acid residues in pancreatic trypsin inhibitor. The radius ordinate represents the surrounding hydrophobicity scale (0 kcal at the outermost circle and 28.0 kcal at the innermost circle). The scale is equally divided into six parts and the thick-thin line circle represents the average hydrophobicity for the protein.
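The geometry of such a circular plot can be sketched without any plotting library. The mapping of hydrophobicity to radius follows the description above (0 kcal on the outermost circle, the maximum value on the innermost). The radii, the starting angle, the counter-clockwise ordering and the function names are assumptions.

```python
import math


def circular_profile(h, outer_radius=1.0, inner_radius=0.2, h_max=None):
    """Return one (x, y) point per residue for a circular hydrophobicity plot.

    Residues are spaced evenly around the circumference in sequence order;
    a value of 0 lies on the outer circle and h_max on the inner circle.
    ASSUMPTION: the first residue is placed at angle 0, going counter-clockwise.
    """
    if not h:
        return []
    if h_max is None:
        h_max = max(h)
    if h_max <= 0:
        raise ValueError("h_max must be positive")
    points = []
    for index, value in enumerate(h):
        angle = 2.0 * math.pi * index / len(h)
        radius = outer_radius - (value / h_max) * (outer_radius - inner_radius)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


def ring_radii(outer_radius=1.0, inner_radius=0.2, parts=6):
    """Radii of the guide circles when the scale is divided into equal parts."""
    step = (outer_radius - inner_radius) / parts
    return [outer_radius - i * step for i in range(parts + 1)]
```

The returned points can be passed to any plotting tool and joined in sequence order. The protein average can be drawn as an additional circle at the radius corresponding to the mean of `h`.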

Conclusion

This article has dealt with a detailed study of the surrounding hydrophobicity of amino acid residues in protein crystals, which provided many interesting results on the distribution of the surrounding environment around each of the residues in globular proteins. It is possible to identify to a fair extent the nucleation sites, the directionality of the segments and also the surface domain/loop regions in protein molecules from the knowledge of the surrounding hydrophobic environment. The distribution of surrounding hydrophobicities for different kinds of residue shows that the residues could be classified into characteristic groups on the basis of their preferred hydrophobic environments. The bulk hydrophobic character of a residue as obtained in protein crystals apparently deviates from its intrinsic hydrophobic character defined by the Tanford index [11]. The results suggest that, apart from the nonpolarity indicated by the hydrophobic indices of the side-chains of the amino acid residues, their bulk and the presence of some polar atoms highly influence the preferred hydrophobic environment and hence the spatial position in the protein molecule. The folding of the protein chain is thus not equivalent to the transfer of the constituent residues from an aqueous environment to an organic environment. Each residue has a characteristic environment. The most interesting point that demands our attention in the problem of protein folding is the occurrence of a significant number of nonpolar residues on the surface of the protein matrix. Our present investigations aim at this aspect.

A table of predicted hydrophobic/surface domains and a set of circular hydrophobicity plots for other proteins mentioned in this article are available from the authors on request.

Acknowledgment

The work reported in this article was supported by a research grant to P.K.P. from the Department of Science and Technology, Government of India.


References

1 Chou, P.Y. and Fasman, G.D. (1974) Biochemistry 13, 224--245
2 Burgess, A.W., Ponnuswamy, P.K. and Scheraga, H.A. (1974) Isr. J. Chem. 12, 239--286
3 Nagano, K. (1977) J. Mol. Biol. 109, 251--254
4 Garnier, J., Osguthorpe, D.J. and Robson, B. (1978) J. Mol. Biol. 120, 97--120
5 Sternberg, M.J.E. and Thornton, J.M. (1978) Nature 271, 15--20
6 Seshagiri, N. (1978) in Symposium on Biomolecular Structure, Conformation, Function and Evolution, University of Madras, India
7 Cohen, F.E., Richmond, T.J. and Richards, F.M. (1979) J. Mol. Biol. 132, 275--288
8 Lim, V.I. (1978) FEBS Lett. 89, 10--14
9 Chothia, C. (1976) J. Mol. Biol. 105, 1--14
10 Janin, J., Wodak, S., Levitt, M. and Maigret, B. (1978) J. Mol. Biol. 125, 357--386
11 Tanford, C. (1962) J. Am. Chem. Soc. 84, 4240--4247
12 Jones, D.D. (1975) J. Theor. Biol. 50, 167--183
13 Rose, G.D. (1978) Nature 272, 586--590
14 Manavalan, P. and Ponnuswamy, P.K. (1977) Arch. Biochem. Biophys. 184, 476--487
15 Manavalan, P. and Ponnuswamy, P.K. (1978) Nature 275, 673--674
16 Levitt, M. and Greer, J. (1977) J. Mol. Biol. 114, 181--239
17 Levitt, M. and Chothia, C. (1976) Nature 261, 552--558
18 Crampin, J., Nicholson, B.H. and Robson, B. (1978) Nature 272, 558--560
19 Prabhakaran, M. and Ponnuswamy, P.K. (1979) J. Theor. Biol. 80, 485--504
20 Lee, B. and Richards, F.M. (1971) J. Mol. Biol. 55, 379--400
21 Zimmerman, J.M., Eliezer, N. and Simha, R. (1968) J. Theor. Biol. 21, 170--201
22 Matheson, R.R. and Scheraga, H.A. (1978) Macromolecules 11, 819--829