
Int. J. Peptide Protein Res. 42, 1993, 326-341
Printed in Belgium - all rights reserved

Copyright © Munksgaard 1993

INTERNATIONAL JOURNAL OF PEPTIDE & PROTEIN RESEARCH

ISSN 0367-8377

Prediction of transmembrane helices from hydrophobic characteristics of proteins

P.K. PONNUSWAMY and M. MICHAEL GROMIHA

Department of Physics, Bharathidasan University, Tiruchirapalli, Tamil Nadu, India

Received 14 December 1992, accepted 17 March 1993

Membrane proteins, which must be embedded in lipid bilayers, have evolved amino acid sequences that fold with a hydrophobic surface in contact with the alkane chains of the lipids and a polar surface in contact with the aqueous phases on both sides of the membrane and with the polar head groups of the lipids. It is generally assumed that the aqueous parts of membrane proteins resemble normal globular proteins and that the embedded parts are highly hydrophobic. In our earlier work, we introduced the concept of 'surrounding hydrophobicity', developed a hydrophobicity scale for the 20 amino acid residues, and applied it successfully to the study of globular proteins. In this work we use the concept of surrounding hydrophobicity to show quantitatively how the aqueous parts of membrane proteins compare with normal globular proteins, and how rich the embedded parts are in hydrophobic activity. We then develop a surrounding hydrophobicity scale applicable to membrane proteins by judiciously mixing the surrounding hydrophobicities observed in the crystal structures of two membrane proteins, the photosynthetic reaction center (PRC) from the bacterium Rhodopseudomonas viridis and porin from Rhodobacter capsulatus, with those observed in a set of 64 globular proteins. A predictive scheme based on this scale predicts transmembrane segments from amino acid sequence in PRC and in 26 randomly selected membrane proteins with 80% accuracy, a much higher predictive power than that of the existing popular methods. A new procedure to measure the amphipathicity of sequence segments is proposed and used to characterize the transmembrane parts of the sample membrane proteins. © Munksgaard 1993.

Key words: hydrophobicity; membrane protein; structure prediction; transmembrane segments

Several hydrophobicity scales have been proposed for the study of globular proteins since the classical one by Tanford (1). A critical comparative study of the performance of these scales (2, 3) shows that the 'surrounding hydrophobicity scale' developed by us (4, 5) is one of the best for dealing with soluble folded proteins. The major feature of this scale is that it adequately measures the preference of the 20 residues in a folded protein for a nonpolar environment (which is significantly different from the matrix of ordinary organic solvents), a preference that scales derived for free amino acids from thermodynamic transfer experiments with organic solvents cannot adequately reflect. In a folded protein, each amino acid residue is surrounded by a characteristic cluster of other, spatially separated residues that contribute to the hydrophobic activity of that residue (6). An important distinction between the folding environments of soluble proteins and membrane proteins is the presence, in the latter, of relatively long stretches of hydrophobic/amphipathic helical or strand segments that traverse the membrane matrix, around which interaction with water is almost absent. This gives hydrophobic interactions a markedly different role in stabilizing the structure of membrane proteins, and hence a comparative analysis of the structures of proteins in these two classes is central to understanding the role the solvent plays in protein folding. As the lipid environment presents no hydrogen-bond donors or acceptors, and its dielectric constant is lower than that of a protein interior (7), hydrophobicity scales developed from the examination of soluble globular protein structures would be inappropriate for investigating the amino acid residues exposed to the lipid environment in membrane proteins. However, there have been many attempts to investigate and predict the structural features of membrane proteins using scales developed solely for globular proteins (8, 9). At the same time, hydrophobicity scales specifically designed for membrane proteins have also been proposed (10-13). Recently, Esposti et al. (14) carried out a statistical analysis with several scales to verify their relative accuracy in predicting the transmembrane segments of polytopic proteins. Until very recently only one membrane protein crystal structure was known, so these specialized scales had to depend on many speculative propositions (such as the amino acid composition of the transmembrane parts inferred from the amino acid sequence alone), and hence their application to structure prediction could yield only highly doubtful results (8, 13, 15).

As our concept of 'surrounding hydrophobicity' enabled us to develop a hydrophobicity scale for globular proteins that worked distinctly better than many existing scales in various respects, we wanted to develop an analogous scale applicable to membrane proteins. This scale has been developed by judiciously mixing the surrounding hydrophobicities calculated for the amino acid residues in the crystal structures of the membrane proteins photosynthetic reaction center and porin with those calculated for the residues in the crystal structures of 64 globular proteins. As this compound hydrophobicity scale incorporates ingredients from the crystal structures, as well as the sequences, of both globular and membrane proteins, it should be highly successful in indicating quantitatively how the aqueous parts of membrane proteins compare with normal globular proteins, and how rich the embedded parts are in hydrophobic activity. Based on this scale, we propose a surrounding hydrophobicity profile method to predict, from sequence, the transmembrane helices and strands and their amphipathic character. The predictive power of this method reaches 81% accuracy for transmembrane helices, far above that of five existing well-known methods. The predicted transmembrane strands for the N-terminal segment 1-177 of the OmpA protein, porin (from R. capsulatus and Escherichia coli) and maltoporin indicate folds for these proteins that are consistent with results from Raman spectroscopy.

In this article we describe the development of the new scale, its application to the prediction of transmembrane helical segments and their amphipathicities in a set of 26 membrane proteins comprising transport proteins, membrane anchor proteins and receptor proteins, the salient results, and a quantitative comparison with the results of similar studies made by other workers on the same proteins. In a future article we will deal with outer membrane proteins that have β-strands as their transmembrane elements.

MATERIAL AND METHODS

Computation of surrounding hydrophobicity. The surrounding hydrophobicity of a residue was computed by the procedure we proposed earlier (4, 5). The crystal structures of protein molecules form the basis for this computation. The residues in a protein molecule are represented by their α-carbon atoms. Using the α-carbon coordinates, a spherical volume of radius 8 Å is fixed around each residue, and the residues occurring in this volume are identified. (The influence of each residue on the surrounding medium extends effectively only up to 8 Å; see refs. 4 and 6.) These surrounding residues are assigned their respective hydrophobic indices obtained from thermodynamic experiments, and the 'surrounding hydrophobicity' of the central residue, $H_j$, is taken to be the sum of these hydrophobic indices:

$$H_j = \sum_{i=1}^{20} n_{ij}\, h_i \qquad (1)$$

where $n_{ij}$ is the total number of surrounding residues of type i around the jth residue of the protein, and $h_i$ is the hydrophobic index (16, 17) of residue type i. The computation is repeated with each residue in turn as the central one.
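A minimal Python sketch of eqn. (1) follows. It assumes the α-carbon coordinates (in Å) and one-letter residue codes are already available as parallel lists, and it uses the hi column of Table 1 as the $h_i$ values. The function name `surrounding_hydrophobicity` is hypothetical, and two details the text leaves open are assumptions here: the central residue is not counted among its own neighbours, and a residue exactly 8 Å away is included.

```python
import math

# Amino acid scale h_i (kcal/mol), copied from the "hi" column of Table 1.
H_INDEX = {
    "A": 0.77, "D": 0.56, "C": 1.42, "E": 0.57, "F": 2.77,
    "G": 0.00, "H": 0.70, "I": 3.05, "K": 1.54, "L": 2.07,
    "M": 1.57, "N": -0.01, "P": 2.67, "Q": -0.10, "R": 0.75,
    "S": -0.03, "T": -0.03, "V": 1.77, "W": 3.67, "Y": 2.57,
}


def surrounding_hydrophobicity(sequence, ca_coords, cutoff=8.0, scale=H_INDEX):
    """Return H_j for every residue j, following eqn. (1).

    sequence  -- one-letter codes, one per residue
    ca_coords -- (x, y, z) alpha-carbon coordinates in angstroms, same order
    Assumptions: the central residue is excluded from its own neighbours,
    and neighbours at exactly `cutoff` are included.
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have equal length")
    result = []
    for j, centre in enumerate(ca_coords):
        h_sum = 0.0
        for k, other in enumerate(ca_coords):
            if k != j and math.dist(centre, other) <= cutoff:
                # Adding h for each neighbour equals sum_i n_ij * h_i.
                h_sum += scale[sequence[k]]
        result.append(h_sum)
    return result


# Toy example: four residues on a straight line, 3.8 A apart.
values = surrounding_hydrophobicity(
    "AGLK", [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
)
```

For a real protein, `ca_coords` would hold the Cα positions read from the crystal structure; the double loop is O(N²), which is adequate for proteins of a few thousand residues.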

Identification of amphipathic helices. The amphipathic character of α-helices was determined by the procedure we proposed earlier (18). In this procedure, the residues of an α-helical segment are considered on four adjacent edges running along the direction of the helical axis, as shown in Fig. 1: starting with the N-terminal residue as residue i, the residues i, i + 4, i + 8, ... form edge 1; i + 1, i + 5, i + 9, ... form edge 2; i + 2, i + 6, i + 10, ... form edge 3; and i + 3, i + 7, i + 11, ... form edge 4. These four edges form a twisted parallelepiped, each edge constituting approximately one quarter of the helical surface. Each residue is assigned its surrounding hydrophobicity index (its value in the membrane surrounding hydrophobicity scale described below), and the average of the surrounding hydrophobicities of the residues constituting edge i is taken as the hydrophobicity index of that edge:

$$\alpha_i = \frac{1}{n} \sum_{j} H_{i+j} \qquad (2)$$

where n is the total number of residues in the edge, and j increases in steps of 4 from 0 to m - 3 if i = 1, from 0 to m - 2 if i = 2, from 0 to m - 1 if i = 3, and from 0 to m if i = 4, m being the number of residues in the helix. The $\alpha_i$ are thus the hydrophobicity indices of the four edges of the helix, and a plot of $\alpha_i$ against i displays the variation of amphipathic character around the four surface edges of the helix. The power or intensity of amphipathicity, $A_\alpha$, of a helix is taken to be the difference between the hydrophobicity indices of two opposing faces, each made of two appropriately selected adjacent edges, i.e.

$$A_\alpha = \left|(\alpha_1 + \alpha_2) - (\alpha_3 + \alpha_4)\right| \quad \text{or} \quad A_\alpha = \left|(\alpha_1 + \alpha_4) - (\alpha_2 + \alpha_3)\right| \qquad (3)$$

Note that only these two combinations of edges (i.e. 1-2 versus 3-4, or 1-4 versus 2-3) can constitute opposing faces of a helix.
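Eqns. (2) and (3) can be sketched in Python as below. This is an illustration, not the authors' program: the helper names are hypothetical, each edge simply collects every fourth residue as described for Fig. 1 (the summation limits quoted for eqn. (2) are not reproduced exactly), and, since the text does not say which face pairing is 'appropriately selected', the sketch assumes the pairing with the larger contrast.

```python
def edge_indices(values):
    """Return [alpha_1, alpha_2, alpha_3, alpha_4] of eqn. (2).

    values -- per-residue surrounding hydrophobicity indices along the helix,
              N-terminus first. Edge e (1..4) holds the 0-based positions
              e-1, e+3, e+7, ..., i.e. every fourth residue.
    """
    if len(values) < 4:
        raise ValueError("a helix needs at least four residues for four edges")
    return [sum(values[start::4]) / len(values[start::4]) for start in range(4)]


def amphipathicity(values):
    """Return A_alpha of eqn. (3) for one helix."""
    a1, a2, a3, a4 = edge_indices(values)
    faces_12_vs_34 = abs((a1 + a2) - (a3 + a4))
    faces_14_vs_23 = abs((a1 + a4) - (a2 + a3))
    # Assumption: use whichever valid pairing gives the larger contrast.
    return max(faces_12_vs_34, faces_14_vs_23)


helix = [2, 2, 0, 0, 2, 2, 0, 0]
print(edge_indices(helix), amphipathicity(helix))
```

For this toy helix the edges are 2, 2, 0, 0, and the 1-2 versus 3-4 pairing gives $A_\alpha$ = 4; a helix with identical indices on every residue gives 0.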


FIGURE 1

Longitudinal view of a typical α-helix. The four rows of residues falling on the four lines (edges) marked on the surface of the helix form a twisted parallelepiped with a square cross-section. Starting from the N-terminus, residues i, i + 4, i + 8, ... form edge 1; i + 1, i + 5, i + 9, ... form edge 2; i + 2, i + 6, i + 10, ... form edge 3; and i + 3, i + 7, i + 11, ... form edge 4.

Prediction of transmembrane α-helices. Transmembrane helices are predicted from the amino acid sequence using a 'surrounding hydrophobicity profile' of the protein. This profile is simply a plot of the surrounding hydrophobicity indices of the residues against their sequence numbers. In this plot the hydrophobic and hydrophilic parts are separated by a horizontal line at the average hydrophobicity value, obtained from the surrounding hydrophobicity values of all the amino acid residues in the sample set of proteins.

The surrounding hydrophobicity profile constructed in this way shows transmembrane helices as a series of peaks and valleys above the average line (or with a few valleys dipping below it), and the other parts as peaks and valleys that frequently cross the middle line or fall below it. To determine the length of a predicted transmembrane helix, we search for a continuous run of 20-24 points above the average line with at most two nonadjacent exceptions. With this condition, we faced no difficulty or ambiguity in identifying transmembrane helices in the families of membrane proteins considered in this work.
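The segment search just described might be implemented as follows. This Python sketch is an interpretation, not the authors' code: `find_tm_segments` and `extend_run` are hypothetical names, each candidate run starts at a point above the line, it tolerates at most two below-line points provided no two are adjacent, and runs of at least 20 points are reported. How runs longer than 24 points were treated is not stated, so the sketch returns them unchanged.

```python
def extend_run(profile, threshold, start, max_exceptions=2):
    """Greedily extend a run that begins at `start` (a point above the line)
    and return the index of its last above-line point."""
    exceptions = 0
    end = start
    for pos in range(start + 1, len(profile)):
        if profile[pos] > threshold:
            end = pos
        elif profile[pos - 1] <= threshold or exceptions == max_exceptions:
            break  # two adjacent exceptions, or exception budget used up
        else:
            exceptions += 1  # isolated dip below the line is tolerated
    return end


def find_tm_segments(profile, threshold, min_len=20, max_exceptions=2):
    """Return (start, end) pairs, 0-based and inclusive, of candidate
    transmembrane helices in a surrounding hydrophobicity profile."""
    segments = []
    start = 0
    while start < len(profile):
        if profile[start] > threshold:
            end = extend_run(profile, threshold, start, max_exceptions)
            if end - start + 1 >= min_len:
                segments.append((start, end))
                start = end + 1
                continue
        start += 1
    return segments


toy_profile = [0] * 5 + [1] * 22 + [0] * 5
print(find_tm_segments(toy_profile, threshold=0.5))
```

Here `profile[k]` would be the Hgm index of residue k and `threshold` the average line. Restarting the search one position later after a rejected run, rather than after its end, keeps the sketch from missing a valid run that begins inside a shorter, rejected one.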


Accuracy of prediction. The accuracy of a theoretical prediction is assessed by a quantitative comparison between theoretical prediction and experimental observation. The percentage accuracy of prediction is computed as

$$\text{Accuracy}\ (\%) = \frac{N_t - (N_o + N_u)}{N_t} \times 100 \qquad (4)$$

where $N_t$, $N_o$ and $N_u$ are, respectively, the total number of residues experimentally observed, the number of residues over-predicted, and the number of residues under-predicted (missed) in the transmembrane part in question.
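Eqn. (4) for a single transmembrane part can be sketched in Python as below, assuming the observed and predicted parts are given as collections of residue numbers so that $N_o$ and $N_u$ are counted by set difference. That tallying rule is our assumption, and the function name is hypothetical.

```python
def prediction_accuracy(observed, predicted):
    """Percentage accuracy of eqn. (4) for one transmembrane part.

    observed, predicted -- residue numbers assigned to the transmembrane
    part by experiment and by prediction, respectively.
    """
    observed, predicted = set(observed), set(predicted)
    n_t = len(observed)              # residues experimentally observed
    n_o = len(predicted - observed)  # over-predicted residues
    n_u = len(observed - predicted)  # under-predicted (missed) residues
    if n_t == 0:
        raise ValueError("no experimentally observed residues")
    return (n_t - (n_o + n_u)) / n_t * 100


# Observed helix 10-31 (N_t = 22), predicted helix 12-33.
print(prediction_accuracy(range(10, 32), range(12, 34)))
```

Here $N_o$ = $N_u$ = 2, so the accuracy is (22 - 4)/22 × 100 ≈ 81.8%. Note that eqn. (4) becomes negative when $N_o + N_u$ exceeds $N_t$.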

RESULTS AND DISCUSSION

By computing the surrounding hydrophobicities of residues [$H_j$ in eqn. (1)] in a set of selected protein crystal structures, an average value can be obtained for each of the 20 amino acid residues. We called this set of 20 indices the 'surrounding hydrophobicity scale' (4, 5). From a comparative study of 38 hydrophobicity scales, Cornette et al. (2) showed that the surrounding hydrophobicity scale is among the strongest scales in the literature. They found that this scale predicts the dominant frequency of amphipathic helices as 97.5°, a value expected from considerations of observed α-helices in proteins. There have also been many favourable reports in the literature over the last decade about the performance of this scale (19-22). Recently we updated this scale using 64 globular protein crystal structures (3). The indices of this scale (given as Hgp in Table 1 and as PON in Table 2) differ slightly from the values we reported earlier (4, 5), which were obtained from smaller numbers of protein crystal structures.

Because of their differing architecture, membrane proteins place many of their residues in environments that are strikingly different from those seen in soluble protein molecules. Accordingly, a hydrophobicity scale derived solely from soluble protein data cannot serve as a proper tool for membrane proteins (see below). Hence, here we derive a scale suitable for membrane proteins. At present we have complete crystal data for only two membrane proteins, viz. the photosynthetic reaction centre (PRC) from the bacterium R. viridis (23) and porin from Rhodobacter capsulatus (24). (Although crystal data are also available for bacteriorhodopsin, the refinement of its loop regions is not yet complete, so it could not be included in the present computations.) Using the α-carbon coordinates of PRC and porin, we computed separately the average hydrophobicity indices of the 20 types of residues as they occur (i) in the aqueous parts, (ii) in the transmembrane parts and (iii) in the whole protein complexes. These values were computed as follows. Using eqn. (1), the surrounding hydrophobicities are computed for all the residues in PRC and porin. The average values for the 20 kinds of residues obtained from these results give the indices of the membrane protein scale, Hmp, listed in Table 1; the average values computed by considering only the residues constituting the aqueous parts of the two proteins give the indices of the membrane aqueous part scale, Hma, also listed in Table 1; similarly, the average values obtained using only the residues of the transmembrane parts give the indices of the Hmt scale (not listed). Note that although only the residues constituting the respective parts were taken as central residues when computing the average values for the aqueous and transmembrane parts, all the residues of the respective membrane protein occurring within 8 Å of each central residue were counted as surrounding members. Let us now see how these average surrounding hydrophobicities of the 20 residues in membrane protein parts compare with those developed for globular proteins, as well as with the hydrophobicity indices of a few other popular hydrophobicity scales.

TABLE 1
Globular and membrane protein surrounding hydrophobicity scales

Residue    hi      Hgp     Hma     Hmp     Hgm
Ala (A)    0.77   13.05   13.19   14.65   13.85
Asp (D)    0.56   11.10   11.93   12.12   11.61
Cys (C)    1.42   14.30   14.64   16.43   15.37
Glu (E)    0.57   11.41   10.62   11.35   11.38
Phe (F)    2.77   13.89   14.02   13.96   13.93
Gly (G)    0      12.20   13.33   14.47   13.34
His (H)    0.70   12.42   13.88   15.21   13.82
Ile (I)    3.05   15.34   14.60   15.22   15.28
Lys (K)    1.54   11.01   11.35   12.15   11.58
Leu (L)    2.07   14.19   13.26   14.06   14.13
Met (M)    1.57   13.62   12.58   14.09   13.86
Asn (N)   -0.01   11.72   13.53   14.31   13.02
Pro (P)    2.67   11.06   13.37   13.63   12.35
Gln (Q)   -0.10   11.78   12.27   13.43   12.61
Arg (R)    0.75   12.40   13.30   13.80   13.10
Ser (S)   -0.03   11.68   14.53   15.10   13.39
Thr (T)   -0.03   12.12   13.16   13.27   12.70
Val (V)    1.77   14.73   14.02   14.40   14.56
Trp (W)    3.67   13.96   17.34   17.00   15.48
Tyr (Y)    2.57   13.57   14.68   14.19   13.88

hi, amino acid scale; Hgp, globular protein scale; Hma, membrane aqueous scale; Hmp, membrane protein scale; Hgm, combined membrane scale. All values in kcal/mol.

TABLE 2
Correlation between certain popular hydrophobicity scales and PRC average hydrophobicity indices

Popular                    Reaction center scales
hydrophobicity scale*      Hma      Hmt      Hmp
NTJ                        0.53    -0.23     0.35
FP                         0.69     0.01     0.65
KD                         0.38    -0.02     0.45
VH                         0.48    -0.07    -0.55
GES                        0.56    -0.08    -0.60
PON                        0.59     0.01     0.62
AMH                        0.22    -0.08     0.30
EWT                        0.49    -0.06     0.48
KL                         0.21    -0.22     0.20
AMP07                      0.54     0.13     0.67

* See text for abbreviations of the names of the scales.

Globular versus membrane proteins. It is generally assumed that the characteristics of the aqueous parts of membrane proteins are similar to those of soluble globular proteins. It is this assumption that prompted many workers to apply hydrophobicity scales meant for globular proteins to membrane proteins as well. However, a comparison of the average hydrophobic behavior of the 20 amino acid residues in globular proteins and in the membrane proteins PRC and porin indicates significant differences. We bring out this fact by computing correlation coefficients between the sets of Hmp, Hma and Hmt values and the indices of various hydrophobicity scales. For this purpose we selected three popular globular protein hydrophobicity scales, namely the Nozaki-Tanford-Jones (NTJ) scale (17), the Fauchere-Pliska (FP) scale (25) and the Kyte-Doolittle (KD) scale (8), and six membrane protein scales, namely the von Heijne (VH) scale (11), the Engelman-Steitz-Goldman (GES) scale (12), the Eisenberg-Weiss-Terwilliger (EWT) scale (26), the Argos-Mohana Rao-Hargrave (AMH) scale (10), the Kuhn-Leigh (KL) scale (13) and the Esposti-Crimi-Venturoli (AMP07) scale (14). Also included in this comparison is our surrounding hydrophobicity scale for globular proteins (PON) (3). The computed correlation coefficients are reported in Table 2. The entries of this table indicate that the hydrophobic activities of amino acid residues differ markedly between globular and membrane proteins: none of the selected scales correlates with the Hmp scale at R above 0.67. As expected, there is no correlation between the transmembrane part scales of PRC and porin themselves. For that matter, the popular membrane protein scales are no better than the globular protein scales at reflecting the characteristic hydrophobic behavior of residues in PRC and porin.
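The text does not state which correlation coefficient underlies Table 2. Assuming it is the ordinary Pearson product-moment coefficient taken over the 20 residue types, it can be computed as in this minimal Python sketch (the function name is hypothetical).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists,
    e.g. two hydrophobicity scales listed in the same residue order."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined for a constant scale")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly correlated toy data
```

Passing two columns of Table 1 in the same residue order gives the corresponding coefficient; the sketch rejects constant inputs, for which R is undefined.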

To compare the average behavior of the aqueous, transmembrane and total parts of PRC and porin effectively with that of soluble globular proteins, the average surrounding hydrophobicity values of the 20 residues in these parts and the Hgp scale (as the best example of a globular protein scale) are presented as a bar diagram in Fig. 2a, with the glycyl residue values aligned as the reference state. This diagram brings to light many facts about the hydrophobic behavior of the 20 kinds of residues in normal proteins and in membrane protein parts. (1) Ala, Glu, Phe, Ile, Leu, Met and Val are considerably less hydrophobic, and Asn, Ser and Trp more hydrophobic, in membrane aqueous parts than in normal globular proteins. (2) The hydrophobic losses in the aqueous parts for Ala and Met are large enough to reduce their hydrophobic activity below the glycyl activity level in normal globular proteins. (3) The two polar neutral residues Asn and Gln have substantially increased hydrophobic activity, whereas Phe, Ile, Leu, Met, Arg, Thr, Val, Trp and Tyr have lost hydrophobic activity in transmembrane parts compared with normal globular proteins or with membrane surface (aqueous) parts. (4) The behavior of Asn and Gln in transmembrane parts is very surprising: they are associated with more hydrophobic neighbors than even Phe, Ile and Leu. On the other hand, Tyr exhibits the poorest hydrophobic activity in transmembrane parts, its neighbors being mostly polar or weakly hydrophobic residues. (5) The overall activity of glycyl residues is also noteworthy; their average hydrophobicity indices differ between normal and membrane protein parts: 12.20 kcal/mol in globular proteins, 13.33 kcal/mol in aqueous membrane parts, 17.21 kcal/mol in transmembrane parts and 14.47 kcal/mol in the whole complex of the membrane proteins.
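The alignment to glycine used in Fig. 2a can be reproduced from Table 1. This Python sketch assumes that 'aligning the glycyl residue values as reference states' means subtracting each scale's Gly value from every entry; only the residues mentioned in points (1) and (2) are included for brevity, and the helper name is hypothetical.

```python
# Subsets of the Hgp and Hma columns of Table 1 (kcal/mol).
HGP = {"G": 12.20, "A": 13.05, "M": 13.62, "E": 11.41, "N": 11.72,
       "S": 11.68, "W": 13.96}
HMA = {"G": 13.33, "A": 13.19, "M": 12.58, "E": 10.62, "N": 13.53,
       "S": 14.53, "W": 17.34}


def relative_to_gly(scale):
    """Shift a scale so that Gly sits at zero (assumed Fig. 2 reference)."""
    gly = scale["G"]
    return {res: round(value - gly, 2) for res, value in scale.items()}


rel_gp = relative_to_gly(HGP)
rel_ma = relative_to_gly(HMA)
for res in "AMENSW":
    print(res, rel_gp[res], rel_ma[res])
```

With these values, Ala and Met fall below zero on the Gly-referenced Hma scale (-0.14 and -0.75 kcal/mol), in line with point (2), while Asn, Ser and Trp sit higher on it than on the Gly-referenced Hgp scale, as point (1) states.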

In Fig. 2b, the H,, scale is compared with the scale most widely applied in membrane protein studies, namely the KD scale (8). Here again we note significant differences between the two scales in the behavior of all the nonpolar residues and of the polar residues His and Ser.

The globular protein scale is compared with the scale derived for PRC alone in Fig. 2c and with the scale derived for porin alone in Fig. 2d. From these diagrams we observe the following. (1) In PRC, Ala, Cys, Glu, Phe, Ile, Lys, Leu, Met, Gln, Arg, Val and Tyr are considerably less hydrophobic, and Ser and Trp more hydrophobic, in membrane aqueous parts, whereas in porin, Phe, His, Ile, Lys, Leu and Val are considerably less hydrophobic, and Ala, Asp, Asn, Arg, Ser, Thr and Tyr more hydrophobic, than in normal globular proteins. (2) The hydrophobic losses in the aqueous parts for Ala, Leu, Met and Arg in PRC, and for Phe, His, Ile and Val in porin, are large enough to reduce their hydrophobic activity even below the glycyl activity level in normal globular proteins. (3) In PRC, the two polar neutral residues Asn and Gln have increased their hydrophobic activity, whereas Phe, Ile, Leu, Met, Thr, Val, Trp and Tyr have lost hydrophobic activity; in porin, Gln has increased and Phe, Ile, Leu, Met, Arg, Thr, Val, Trp and Tyr have lost hydrophobic activity in the transmembrane parts, compared with normal globular proteins or with membrane surface (aqueous) parts. (4) In PRC Tyr, and in porin Phe, Arg and Trp, exhibit the poorest hydrophobic activity in the membrane part, indicating that the aromatic residues lose considerable hydrophobic activity in membrane parts. (5) The overall activity of glycyl residues is also noteworthy: in PRC it is 14.34 kcal/mol in the aqueous parts, 18.59 kcal/mol in the membrane and 15.33 kcal/mol in the total complex, whereas in porin it is 9.04 kcal/mol in the aqueous parts, 15.23 kcal/mol in the membrane and 11.97 kcal/mol in the total complex.

The selection of the Nozaki-Tanford-Jones (NTJ) hydrophobicity indices (17) as the input $h_i$ values in eqn. (1) is somewhat arbitrary. Hence we computed an alternative set of average surrounding hydrophobicity values using as input the hydrophobicity indices of another popular scale, that of Fauchere & Pliska (25). There were no significant differences between this set and the values computed with the NTJ indices, so we decided to use only the NTJ-based surrounding hydrophobicity indices in our further work.

Surrounding hydrophobicity scale for membrane proteins If we want to develop a reliable surrounding hydropho- bicity scale for the whole family of membrane proteins in analogy with the case of globular proteins, it is es- sential to include a sufficient number of different mem- brane protein crystals in the computation. At present, however, we do not have any additional crystal data than those of PRC and porin, and hence we must use the H,,,, scale derived from these two proteins in com- bination with other relevant data for the purposes of studying the membrane proteins in general. Luckily, the

Page 6: Prediction of transmembrane helices from hydrophobic characteristics of proteins

Prediction of transmembrane helices

4- n

2 -

C A v w

I S 1 1 I

U u ' 1 F G

T Y 1 'I K

-2

E

Residue

4t ( c ) W

2

0

-2

-4

U

Y

-4-

-6 -

Residue

FIGURE 2 Relative hydrophobic activity levels of the 20 types of amino acid residues in globular and membrane proteins: (a) dotted column, H g p scale; empty column, H,, scale; slant column, H,, scalc; filled column, Hmp scale; (b) filled column, H,, scale; empty column, KD scalc; (c) dotted column, globular proteins; empty column, PRC aqueous parts; slant column, PRC transmembrane parts; filled column, PRC total complex; (d) dotted column, globular proteins; empty column, porin aqueous parts; slant column, porin transmembrane parts; filled column, porin total complex. The one-letter code (Table 1) of amino acids is used.

331

Page 7: Prediction of transmembrane helices from hydrophobic characteristics of proteins

P.K. Ponnuswamy and M.M. Gromiha

structures of PRC and porin are quite large and com- plex, the former having 11 transmembrane helices, and the latter having 16 transmembrane strands, thus pro- viding a good amount of information regarding the in- volvement of helices and strands inside the membrane matrix. As the aqueous parts of membrane proteins resemble very much the body of globular proteins, as a good approximation one could use the information from globular proteins to study the characteristics of the aqueous parts of membrane proteins. In globular pro- teins, the p-strands are comparatively more buried than %-helices, and the amphipathic nature of r-helices and b-strands are highly differentiating. Hence, a judicious combination of the information from the globular pro- tein structures with those of the two membrane proteins PRC and porin will be an appropriate approach to develop a general surrounding hydrophobicity scale for the membrane proteins, in the light of paucity of data on these proteins. Accordingly, we choose to mix the surrounding hydrophobicity scales H,, (ofglobular pro- teins) and Hmp (of PRC and porin) to obtain a common scale, H,. As a bcst approximation, we represented this new scale as simply the averages of the indices for each residue in the Hgp and HlllP scales. We mixed the globular protein scale (derived from 64 proteins) and the membrane protein scale (derived from two proteins) at the 1 : 1 levcl so that the resultant scale acquires ad- equate power to deal with the aqueous parts (for which membrane protein data alone are insufficient), and transmembrane parts (on which the globular protein data will have a negligible effect owing to the lack of lengthy hydrophobic stretches) in the predictive scheme. Interestingly, in a variety of membrane proteins, both of trans-helical and trans-strand kinds, the Hgm scale predicted transmembrane part conformational states more precisely than other methods available in litera- ture. Hgm is included in Table l . 
This scale correlates with both the Hgp and Hmp scales, with a correlation coefficient R of 0.90 in each case. It is surprising that although these two scales correlate so highly with the mixed scale, they fail to perform properly in predictive attempts, while the mixed scale succeeds.
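The 1 : 1 mixing step amounts to a residue-wise arithmetic mean of the two scales, and the comparison quoted above is a Pearson correlation between scales. The sketch below shows both operations under stated assumptions: the index values are hypothetical placeholders for a handful of residues, not the published Table 1 values, and the function names are illustrative only.

```python
from statistics import mean


def mix_scales(h_gp, h_mp):
    """Residue-wise 1:1 average of two hydrophobicity scales (dicts keyed by residue)."""
    if h_gp.keys() != h_mp.keys():
        raise ValueError("both scales must cover the same residues")
    return {aa: (h_gp[aa] + h_mp[aa]) / 2 for aa in h_gp}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical indices for five residues (NOT the values of Table 1).
H_GP = {"A": 12.9, "L": 14.9, "I": 15.7, "D": 11.0, "K": 11.4}
H_MP = {"A": 13.6, "L": 14.5, "I": 15.2, "D": 12.1, "K": 12.2}

H_GM = mix_scales(H_GP, H_MP)
order = sorted(H_GM)  # fixed residue order for the correlation
r_gp = pearson_r([H_GP[a] for a in order], [H_GM[a] for a in order])
```

With the real 20-residue scales, `pearson_r` applied to Hgp against Hgm (and Hmp against Hgm) would give the R values discussed in the text.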

Prediction of transmembrane helices

We attempted four types of predictions based on hydrophobicity profiles constructed as detailed in Material and methods. First, we took 20 randomly selected soluble globular proteins to check whether the present method predicts any transmembrane segments in them. Second, we took 26 selected membrane proteins (representing the transport, membrane anchor and receptor families) suggested to have helices as transmembrane parts, to check whether the present scheme correctly predicts the number of transmembrane helices in these proteins. Third, we predicted the transmembrane helices in PRC and compared the results with the X-ray observation, as well as with theoretical predictions already made by other workers. Finally, we predicted the transmembrane helical segments themselves in the same 26 membrane proteins. These steps enabled us to assess our predictive scheme more objectively in the light of the numerous other transmembrane helix predictive schemes available in the literature.

The surrounding hydrophobicity profiles (plots of Hgm index vs. sequence number) for two typical globular proteins, cytochrome c and lysozyme T4, are shown in Fig. 3. It is obvious from these plots that no segment has peaks above the control line continuously for more than 10 residues. This means that the method predicts no transmembrane helices for these proteins. The profiles of all the other globular proteins considered showed similar results. This confirms that the method discriminates correctly: it does not assign transmembrane-type helices to globular proteins.

The authors who sequenced the 26 membrane proteins have suggested the probable number of transmembrane helices in each of them; the total number of such suggested helices in the 26 proteins is 53. The surrounding hydrophobicity profiles of these 26 membrane proteins indicated the presence of 50 segments, each having peaks above the control line for 18-25 successive residues (within the permitted two violations). Of the 50 segments, 37 show good amphipathicity. This result reflects the power of the present method to detect transmembrane helices from the amino acid sequence, where such segments exist. It missed only one segment in each of bovine rhodopsin, bacteria rhodopsin and mitochondrial cytochrome b, at the C-termini.
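The segment-detection rule described here (a run of residues above the control line, at least 18 long, tolerating at most two residues below the line) can be sketched as a simple scan. This is an illustration under assumptions, not the authors' program: the profile is taken to be a list of per-residue Hgm values, "above the control line" is read as strictly greater than a single threshold supplied by the caller (its actual definition belongs to Material and methods), and `find_segments` with its defaults is a hypothetical name.

```python
def find_segments(profile, control, min_len=18, max_violations=2):
    """Return 1-based inclusive (start, end) runs above `control`.

    A run starts and ends on residues above the control line and may
    contain at most `max_violations` residues at or below it.
    """
    n = len(profile)
    segments = []
    i = 0
    while i < n:
        if profile[i] <= control:  # a run cannot start below the line
            i += 1
            continue
        violations = 0
        last_above = i
        j = i + 1
        while j < n:
            if profile[j] > control:
                last_above = j
            else:
                violations += 1
                if violations > max_violations:
                    break
            j += 1
        length = last_above - i + 1
        if length >= min_len:
            segments.append((i + 1, last_above + 1))
            i = last_above + 1  # continue scanning after this segment
        else:
            i += 1
    return segments
```

A 10-residue stretch above the line, like those seen in the globular profiles of Fig. 3, is rejected by the length test, while a 20-residue stretch interrupted by one dip is reported as a single segment.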

Prediction of transmembrane helices in PRC

The surrounding hydrophobicity profiles for the H, M and L chains of PRC, constructed with the Hgm indices, are presented in Fig. 4. The segments identified as transmembrane helices from these profiles, and those from the X-ray work, are given in Table 3. In this table we include the transmembrane helices predicted by six other groups, viz. Kyte & Doolittle (KD) (8), Sieved Kyte & Doolittle (SKD) (15, 27), Klein, Kanehisa & DeLisi (KKD) (28), von Heijne (VH) (29), Engelman, Steitz & Goldman (GES) (12) and Esposti, Crimi & Venturoli (AMP07) (14). Fasman & Gilbert (15) qualitatively compared the predictions for PRC made by nine algorithms, five developed for globular proteins [Chou & Fasman (30), Biou et al. (31), Kyte & Doolittle (8), Rose et al. (9) and Sieved Kyte & Doolittle (27)] and four developed specifically for membrane proteins [Klein, Kanehisa & DeLisi (28), Eisenberg, Weiss & Terwilliger (26), Finer-Moore & Stroud (32) and Vogel, Wright & Jähnig (33)], and indicated that the SKD and KKD algorithms were the best predictors among the nine. For this reason, we used the results from these two algorithms for comparison with the present work. The KD method has been widely used since its publication, and a generalized version of this approach is claimed to be sufficient to predict membrane-spanning helices and strands (34); hence the results from the application of this scale were also considered. We found that the algorithms of von Heijne (29), Engelman, Steitz & Goldman (12) and Esposti, Crimi & Venturoli (14) were more powerful than those considered by Fasman & Gilbert (15), and hence we included the results of these groups as well. Thus, the six methods included in Table 3 are the most challenging ones against which to compare the performance of our method.

FIGURE 3 Surrounding hydrophobicity profiles (plotted against residue number) for two typical globular proteins, cytochrome c and lysozyme T4.

FIGURE 4 Surrounding hydrophobicity profiles (plotted against residue number) for the L chain, M chain and H chain of PRC. The predicted helical segments are shown as thick lines joining the end residues marked by thick filled circles.

From Table 3, we find that the KD method has predicted 153 residues wrongly (114 over-predictions and 39 under-predictions) in the 11 transmembrane segments, the accuracy of prediction being ca. 45%. The SKD method predicted 233 residues to be of the transmembrane kind, with an over-prediction of 39 and an under-prediction of 82 residues, giving an accuracy of 56%. Similarly, the accuracies of the KKD, VH, GES and AMP07 methods are 68, 72, 73 and 73%, respectively. Our method, on the other hand, performs at an accuracy of 77%.
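The accuracies quoted above appear consistent with accuracy = (N − over − under)/N × 100, where N = 276 is the total number of X-ray transmembrane residues obtained by summing the segment lengths in Table 3; for KD, (276 − 114 − 39)/276 gives 44.6%. The sketch below assumes that reading and counts over- and under-prediction for one observed/predicted pair as residue set differences. The function names are hypothetical.

```python
def segment_length(seg):
    """Number of residues in an inclusive (start, end) segment."""
    start, end = seg
    return end - start + 1


def over_under(observed, predicted):
    """Residues predicted but not observed, and observed but not predicted."""
    obs = set(range(observed[0], observed[1] + 1))
    pred = set(range(predicted[0], predicted[1] + 1))
    return len(pred - obs), len(obs - pred)


def accuracy(n_observed, over, under):
    """Percentage accuracy: (N - over - under) / N * 100."""
    return 100.0 * (n_observed - over - under) / n_observed


# X-ray transmembrane segments of PRC (Table 3).
XRAY = [(33, 53), (84, 111), (116, 139), (171, 198), (226, 249),  # L chain
        (52, 76), (111, 137), (143, 166), (198, 223), (260, 284),  # M chain
        (12, 35)]                                                  # H chain
N_OBS = sum(segment_length(s) for s in XRAY)  # total observed residues
```

Applied pair by pair, `over_under` reproduces most Table 3 entries, but a few differ by one residue (83-102 against 84-111 gives 9 under-predicted residues where the table lists 8), so it should not be taken as the authors' exact counting rule.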

In our predictions we made an interesting observation: the over-predictions were mostly on the N-terminal side, and the under-predictions mostly on the C-terminal side. This prompted us to make end corrections. We deleted residues one by one from the N-terminal side if there was an over-prediction, or added residues one by one at the C-terminal end if there was an under-prediction, until the predicted length reached an optimal value of 24 residues. This process substantially improved the predictive power, to the level of 81% (bottom of Table 3). However, because this corrective procedure relies on an optimal segment length, the improvement is partly subjective, and a proper length must be selected to obtain reliable results.
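Read as a length rule, the end correction trims from the N-terminal side any predicted segment longer than 24 residues and extends at the C-terminal side any segment shorter than 24. A minimal sketch of that reading follows; it reproduces every "with end correction" segment of Table 3 from the uncorrected predictions, but the function itself is our illustration, not the authors' code.

```python
OPTIMAL_LENGTH = 24  # optimal transmembrane helix length used in the text


def end_correct(seg, target=OPTIMAL_LENGTH):
    """Adjust an inclusive (start, end) segment to `target` residues."""
    start, end = seg
    length = end - start + 1
    if length > target:
        start = end - target + 1  # over-long: delete residues from the N-terminal side
    elif length < target:
        end = start + target - 1  # too short: add residues at the C-terminal end
    return start, end


# Present-work predictions for PRC, L, M and H chains (Table 3).
PRESENT = [(29, 54), (83, 102), (111, 131), (171, 197), (220, 243),
           (49, 75), (110, 133), (143, 162), (199, 225), (262, 284),
           (14, 31)]
# End-corrected segments as listed in Table 3.
CORRECTED = [(31, 54), (83, 106), (111, 134), (174, 197), (220, 243),
             (52, 75), (110, 133), (143, 166), (202, 225), (262, 285),
             (14, 37)]
```

For example, `end_correct((29, 54))` returns `(31, 54)` and `end_correct((14, 31))` returns `(14, 37)`.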

From the predicted segments of PRC (after end correction), we observed that 14 of the 22 segment termini are predicted correctly within a 2-residue error. Within the same error, the AMP07 method predicts 12, the GES method 10 and the VH method only 8 termini.
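The terminus count can be checked by pairing each observed segment with its predicted counterpart and testing each end separately. The sketch assumes the segments are listed in the same order in both lists and that "within a 2-residue error" means an absolute difference of at most two; with the Table 3 data this gives 14 for the end-corrected predictions and 8 for von Heijne's, matching the text. The function name is hypothetical.

```python
def termini_within(observed, predicted, tol=2):
    """Count segment ends (start and end separately) predicted within `tol` residues."""
    count = 0
    for (obs_start, obs_end), (pred_start, pred_end) in zip(observed, predicted):
        count += int(abs(obs_start - pred_start) <= tol)
        count += int(abs(obs_end - pred_end) <= tol)
    return count


# X-ray segments of PRC in chain order L, M, H (Table 3).
XRAY = [(33, 53), (84, 111), (116, 139), (171, 198), (226, 249),
        (52, 76), (111, 137), (143, 166), (198, 223), (260, 284),
        (12, 35)]
# Present work with end correction (Table 3).
END_CORRECTED = [(31, 54), (83, 106), (111, 134), (174, 197), (220, 243),
                 (52, 75), (110, 133), (143, 166), (202, 225), (262, 285),
                 (14, 37)]
# von Heijne (VH) predictions (Table 3).
VON_HEIJNE = [(30, 48), (84, 102), (113, 131), (177, 195), (232, 250),
              (53, 71), (111, 129), (148, 166), (206, 224), (266, 284),
              (12, 30)]
```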

Amphipathic character of helices in PRC

The amphipathic character of the 20 helical segments of PRC, 11 in the transmembrane part (five in each of the L and M chains, and one in the H chain) and nine in the aqueous part (three in the L chain, five in the M chain and one in the H chain), is displayed in the plots of Fig. 5. The helices are labeled in this figure with symbols such as AL1, TM2 and TH, where the letters A, T, L, M and H denote, respectively, the aqueous part, the transmembrane part, the L chain, the M chain and the H chain, and the numerals 1, 2, etc. give the serial numbers; thus AL1 refers to the 'aqueous part, L chain, number 1 helix', and TM2 to the 'transmembrane part, M chain, number 2 helix'.

Considering the three aqueous part helices of the L chain (Fig. 5), we note the following features. In helix AL1, edges 1 and 2 have higher hydrophobicities than edges 3 and 4, so they constitute two faces of the helix that differ in hydrophobic intensity; this helix is therefore amphipathic. In helix AL3, edges 2 and 3 have lower, and edges 1 and 4 higher, hydrophobicities, forming the two faces that bring out its amphipathic character. Considering the five helices in the transmembrane part of the L chain, we note the following: in helix TL3, edges 2 and 3 (higher) and edges 1 and 4 (lower hydrophobicity) form the two faces that reflect its amphipathicity; in helix TL4, edges 3 and 4 having higher, and edges 2 and 3 having lower, hydrophobicities form the two faces; in helix TL5, edges 1 and 4, and edges 2 and 3, in combination form faces of differing hydrophobicity. The five helices in the aqueous part of the M chain show the following features: in helix AM5, the face comprising edges 1 and 4 has higher, and the face comprising edges 2 and 3 lower, hydrophobicity, whereas in AM4 the face comprising edges 2 and 3 has higher, and the face made of edges 1 and 4 lower, hydrophobicity; in AM3, the face comprising edges 1 and 2 has higher, and the face comprising edges 3 and 4 lower, hydrophobicity. In the transmembrane part of the M chain, on the other hand, only helices TM1, TM2 and TM3 have some amphipathic character, whereas TM4 and TM5 have none. Of the two helices of the H chain, the helix in the membrane part is amphipathic, and the aqueous part helix shows no amphipathicity.

TABLE 3 Prediction of transmembrane helices in PRC

Experimental (X-ray) results (23)

| Chain | Segments (length) |
|---|---|
| L | 33-53 (21); 84-111 (28); 116-139 (24); 171-198 (28); 226-249 (24) |
| M | 52-76 (25); 111-137 (27); 143-166 (24); 198-223 (26); 260-284 (25) |
| H | 12-35 (24) |

Theoretical predictions (each entry: predicted segment (length) over-prediction/under-prediction)

| Method | Chain | Predicted segments |
|---|---|---|
| Kyte-Doolittle (KD) (8, 15), 11-residue window | L | 15-53 (39) 18/0; 74-104 (31) 10/0; 83-118 (36) 0/4; 110-156 (47) 17/0; 176-199 (24) 1/5; 220-253 (34) 10/0 |
| | M | 41-80 (40) 15/0; 106-117 (12) 5/20; 136-162 (27) 7/4; 194-222 (29) 4/1; 239-256 (18) 18/0; 260-286 (27) 2/0 |
| | H | 5-30 (26) 7/5 |
| Sieved Kyte-Doolittle (SKD) (27, 15), 9-residue window | L | 28-44 (17) 5/9; 81-89 (9) 3/22; 116-139 (24) 0/0; 177-198 (22) 0/6; 228-247 (20) 0/4 |
| | M | 47-89 (43) 18/0; 109-119 (11) 2/18; 141-156 (16) 2/10; 199-225 (27) 2/1; 262-279 (18) 0/7 |
| | H | 5-30 (26) 7/5 |
| Klein-Kanehisa-DeLisi (KKD) (28, 15) | L | 24-55 (32) 11/0; 86-102 (17) 0/11; 111-141 (31) 7/0; 179-197 (19) 0/9; 234-250 (17) 7/0 |
| | M | 55-80 (26) 4/3; 115-131 (17) 0/10; 144-170 (27) 4/1; 208-224 (17) 1/10; 266-289 (24) 5/6 |
| | H | none listed |
| von Heijne (VH) (29) | L | 30-48 (19) 3/5; 84-102 (19) 0/9; 113-131 (19) 0/5; 177-195 (19) 0/9; 232-250 (19) 1/6 |
| | M | 53-71 (19) 0/6; 111-129 (19) 0/8; 148-166 (19) 0/5; 206-224 (19) 1/8; 266-284 (19) 0/6 |
| | H | 12-30 (19) 0/5 |
| Engelman-Steitz-Goldman (GES) (12) | L | 32-51 (20) 1/2; 84-103 (20) 0/8; 116-135 (20) 0/4; 175-194 (20) 0/8; 233-252 (20) 3/7 |
| | M | 52-71 (20) 0/5; 111-130 (20) 0/7; 148-167 (20) 1/5; 206-225 (20) 2/8; 267-286 (20) 2/7 |
| | H | 12-31 (20) 0/4 |
| Esposti-Crimi-Venturoli (AMP07) (14) | L | 27-51 (25) 6/2; 86-102 (17) 0/11; 116-132 (17) 0/7; 174-195 (22) 0/6; 228-248 (21) 0/3 |
| | M | 52-74 (23) 0/2; 113-127 (15) 0/12; 143-163 (21) 0/3; 200-225 (26) 2/2; 267-284 (18) 0/7 |
| | H | 20-32 (13) 0/11 |
| Present work | L | 29-54 (26) 5/0; 83-102 (20) 1/8; 111-131 (21) 4/8; 171-197 (27) 0/1; 220-243 (24) 6/6 |
| | M | 49-75 (27) 3/1; 110-133 (24) 1/4; 143-162 (20) 0/4; 199-225 (27) 2/1; 262-284 (23) 0/2 |
| | H | 14-31 (18) 0/6 |
| Present work with end correction | L | 31-54 3/0; 83-106 1/4; 111-134 4/5; 174-197 0/4; 220-243 6/6 |
| | M | 52-75 0/1; 110-133 1/4; 143-166 0/0; 202-225 2/4; 262-285 1/2 |
| | H | 14-37 2/2 |

| Method | Total over-prediction | Total under-prediction | Accuracy |
|---|---|---|---|
| KD | 114 | 39 | 44.6% |
| SKD | 39 | 82 | 56.2% |
| KKD | 39 | 50 | 67.8% |
| VH | 5 | 72 | 72.1% |
| GES | 9 | 65 | 73.2% |
| AMP07 | 8 | 66 | 73.2% |
| Present work | 22 | 41 | 77.2% |
| Present work with end correction | 20 | 32 | 81.1% |

Helical lengths are given in parentheses. The length of each helix in the present work with end correction is 24.

FIGURE 5 Amphipathic character of helical segments in PRC, showing the hydrophobic activity levels of the four edges (described in Fig. 1) of the helices. Panels: L chain, aqueous part; L chain, membrane part; M chain, membrane part; H chain. Horizontal axis: helical edges 1-4.

It is of interest to note that the six helices AL1, AL3, AM5 (which are on the periplasmic side), AM3, TM3 and TH exhibit appreciable amphipathic character. All the other helices either lack amphipathic character or have only one edge with slightly higher hydrophobicity.
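The four-edge construction of Fig. 1 is not reproduced in this section, so the sketch below rests on an explicit assumption: a plain helical-wheel reading in which residue i sits at i × 100° around the helix axis, the wheel is cut into four 90° sectors taken as edges 1-4, and a face is a pair of adjacent edges (1+2 versus 3+4, or 1+4 versus 2+3, the pairings discussed above). The edge averages and the face-contrast measure are hypothetical illustrations, not the authors' amphipathicity index.

```python
def edge_means(values, degrees_per_residue=100.0):
    """Mean hydrophobicity of the residues falling in each of four 90-degree sectors."""
    sums = [0.0] * 4
    counts = [0] * 4
    for i, h in enumerate(values):
        angle = (i * degrees_per_residue) % 360.0
        edge = int(angle // 90.0)  # sector 0..3 stands for edges 1..4
        sums[edge] += h
        counts[edge] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def face_contrast(means):
    """Largest difference between two opposite faces built from adjacent edges."""
    e1, e2, e3, e4 = means
    split_12_34 = abs((e1 + e2) / 2 - (e3 + e4) / 2)
    split_14_23 = abs((e1 + e4) / 2 - (e2 + e3) / 2)
    return max(split_12_34, split_14_23)
```

A helix whose residues on edges 1 and 2 are hydrophobic and those on edges 3 and 4 are polar (the AL1 pattern) gives a large contrast, while a uniformly hydrophobic helix gives a contrast of zero.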

Prediction of transmembrane helices in other membrane proteins

The predicted transmembrane helical segments in the 26 randomly selected membrane proteins (whose 3D structures are unknown) are listed in Table 4. The hydrophobicity profiles of three typical proteins are shown in Fig. 6. Table 4 also includes the transmembrane helices suggested by other authors (either by those who determined the sequences or by others who attempted earlier predictions: refs. 35-58). As these previous suggestions are not based on experimental studies, it is not strictly proper to compute a percentage accuracy of prediction against them or to draw conclusive remarks. However, we make such an attempt simply to assess how good the previous suggestions are. Surprisingly, we get an overall accuracy of 83% when all 26 proteins are considered. The major failure is missing the last helical segments 285-309, 78-101 and 146-169 of bovine rhodopsin, bacteria rhodopsin and mitochondrial cytochrome b, respectively (if the residues of these three segments are excluded, the predictive accuracy increases to 89%). A perusal of the sequences of these three missed segments shows that they contain too many polar residues: ten in the bovine rhodopsin segment, eight in the bacteria rhodopsin segment and seven in the mitochondrial cytochrome b segment; these residues interrupt the formation of a helical segment of the required length.

We selected the 26 membrane proteins as random representatives of various types of proteins, namely transport, membrane anchor and receptor proteins. In all these cases the predictions are reasonably good. Our method also correctly predicts the membrane-spanning segments of many multispanning proteins, such as cation ATPase, subunits of Torpedo californica, cardiac Ca2+-Mg2+ ATPase and lactose permease.

Kuhn & Leigh (13) developed a statistical predictive technique involving a sequence of membrane property values derived from the frequency of occurrence of amino acids in a number of putative transmembrane segments. In this method, a running average over the numeric sequence yields a membrane property profile from which transmembrane segments are chosen. These authors applied this technique to a pool of ten putative membrane proteins and compared the results with those previously suggested by other authors. They averaged


TABLE 4 Prediction of transmembrane helical segments in 26 membrane proteins

Protein; presently predicted segments (length); amphipathy; over-prediction; under-prediction; previously suggested segments (length); ref.

Transport proteins: Bovine rhodopsin

Bacteria rhodopsin

Mitochondrial cytochrome b

Membrane anchor proteins:

Glycophorin A, Influenza virus agglutinin, Histocompatibility antigen, Semliki E1, Semliki E2, Neuraminidase, Sindbis E1, Sindbis E2, Transplantation antigen, Rabies virus glycoprotein, Vesicular stomatitis, Coat protein

Receptor proteins:

Interleukin-2 receptor, Tyrosine kinase receptor, Aspartate receptor

Epidermal growth factor receptor, Asialoglycoprotein receptor, Serine chemoreceptor, Human LDL receptor, T-cell receptor, Transferrin receptor, Kainate binding protein

Glutamate receptor

37-57 (21) 75-92 (18)

114-133 (20) 153-170 (18) 203-229 (27) 253-276 (24)

10-29 (20) 48-66 (19)

108-127 (20) 130-156 (27) 177-193 (18) 206-224 (19)

32-56 (25) 77-98 (22)

116-137 (22)

180-208 (29) 230-247 (18) 290-308 (19) 324-344 (21) 348-367 (20)

75-95 (21)

185-210 (26)

413-436 (24) 16-38 (23)

364-391 (28) 10-38 (29)

368-390 (23)

459-480 (22)

22-40 (19)

406-436 (31)

289-309 (21)

74-93 (20)

241-259 (19) 653-675 (23)

7-33 (27)

649-668 (20) 192-212 (21)

53-76 (24) 187-214 (28) 789-810 (22) 151-172 (22)

164-186 (23) 202-217 (16) 229-253 (25)

150-168 (19) 186-202 (17) 213-233 (21) 396-418 (23)

62-89 (28)

414-434 (21)

0.84 1.03

0.73 1.10

0.85 0.43

1.16 0.46 0.59 0.95

0.46

0.69 1.20

1.62 0.58

2.21

1.28 0.63

1.24 0.40 0.53

0.88 1.09 0.30 0.93 0.56 0.73 1.53

0.20 0.95 2.37 1.26 1.11

0.83 1.20 0.94 1.91

0 0 0 0 0 0

0 2

0 3 2 0 5 4 2

7 0 0 2 0

0

0 0 0 0 3 0 0 2 0 0 0

0 1 3 0 0 6 0 0 0 0 0 0 1 2 0 0 0 0

1

1 3 0 4 0

25 4 1

24 4 1 9 5 0 0 5

24 1 5 1 0 3

2 0 1 0 0 3 2 5 4 0 0 0

0 0 0 3 3 2 2 0 0 0 3 3 0 1 2 3 4 1

37-61 (25) 74-98 (25)

114-140 (27) 153-176 (24) 203-230 (28) 253-276 (24) 285-309 (25)

8-31 (24) 41-64 (24)

78-101 (24) 107-130 (24) 133-157 (24)

199-224 (25) 168-191 (24)

33-52 (20) 80-97 (18)

146-169 (24) 179-201 (23)

289-308 (20)

111-135 (25)

230-252 (23)

324-342 (19) 348-370 (23)

73-95 (23)

185-210 (26) 15-38 (24)

413-436 (24) 364-391 (28)

7-35 (29) 405-437 (33) 363-390 (28) 285-307 (23) 459-480 (22)

22-40 (19) 74-93 (20)

241-259 (19) 654-675 (22)

7-30 (24) 189-212 (24)

59-78 (20) 185-214 (30) 789-810 (22) 151-172 (22)

62-89 (28) 164-189 (26) 200-218 (19) 230-253 (24) 416-435 (20) 148-168 (21) 184-203 (20) 213-237 (25) 395-418 (24)

646-668 (23)

Total number of transmembrane segments previously suggested:
Number of segments presently predicted:
Number of residues in membrane parts:
Wrong (over/under) predictions:
Accuracy of prediction:


FIGURE 6 Surrounding hydrophobicity profiles (plotted against residue number) for three membrane proteins: mitochondrial cytochrome b, glycophorin A and kainate binding protein. The predicted helical segments are shown as thick lines.

the end points of the previously suggested transmembrane segments (in cases where there was more than one suggestion for the same protein) and obtained consensus segments with which their predictions were compared. These authors claim 93.6% agreement, on a residue-by-residue basis, between their predicted results and the previously suggested consensus results. We computed the percentage accuracy as per eqn. (4) between the predictions of Kuhn & Leigh and the consensus segments from previous suggestions, and between our predictions for these 10 proteins and the same consensus segments. These results are presented in Table 5. Interestingly, our method performed better (74% accuracy) than the method of Kuhn & Leigh (67% accuracy).
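The consensus construction used by Kuhn & Leigh, averaging the end points of several suggested segments for the same helix, can be sketched as follows. Rounding the averages to the nearest residue is our assumption (the text does not say how fractional averages were handled), and Python's `round` sends exact .5 ties to the even integer; `consensus_segment` is a hypothetical name.

```python
from statistics import mean


def consensus_segment(suggestions):
    """Consensus (start, end) from several suggested (start, end) segments."""
    starts = [start for start, _ in suggestions]
    ends = [end for _, end in suggestions]
    return round(mean(starts)), round(mean(ends))
```

For example, two suggestions 7-30 and 9-32 for the same helix give the consensus segment 8-31.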

As we have stated earlier, the effective influence of each residue over the surrounding medium extends only up to a radius of 8 Å (6), and the surrounding hydrophobicity scale derived on this basis for globular proteins (3-5) works extremely well in bringing out many of the characteristics of folded proteins. In attempting to develop a similar scale for membrane proteins, one faces two difficulties: first, there are not enough crystal data, and second, in the transmembrane parts the amino acid residues are also surrounded by some non-amino-acid atomic groups, namely the carbohydrate moiety. We have tried to compensate for these factors by carefully mixing crystal data sets from globular proteins and membrane proteins. The globular protein crystal data, in combination with the membrane protein crystal data, are rich enough to treat the aqueous parts, which more or less resemble globular protein parts. On the other hand, even though we have only one crystal example each of trans-helix and trans-strand membrane proteins, these are reasonably rich in trans-helices and trans-strands. Also, although a residue in a trans-helix or trans-strand part interacts with carbohydrate groups, it lies in a long stretch of hydrophobically rich residues in an ordered conformational state, so the additional interactions do not drive it away from the helical or strand state. A correction term could perhaps be added, but the selected control length is able to overcome this deficiency adequately. As a net result, the newly derived general hydrophobicity scale works exceedingly well in the predictive scheme.
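The surrounding hydrophobicity of a residue, as summarized in the Conclusions, is the sum of the hydrophobicities of the residues clustered within 8 Å of it, and a residue-type index is then an average over all occurrences of that residue type. The sketch below assumes one representative point per residue (for example a Cα atom), excludes the central residue from its own sum, and takes the per-residue `scale` values as given inputs; all of these choices and names are hypothetical and may differ from the authors' definitions in refs. 4-6.

```python
import math


def surrounding_hydrophobicity(coords, residues, scale, radius=8.0):
    """Sum of `scale` values of all other residues within `radius` of each residue."""
    result = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= radius:
                total += scale[residues[j]]
        result.append(total)
    return result


def average_by_type(values, residues):
    """Average the per-residue values over each residue type."""
    sums, counts = {}, {}
    for value, aa in zip(values, residues):
        sums[aa] = sums.get(aa, 0.0) + value
        counts[aa] = counts.get(aa, 0) + 1
    return {aa: sums[aa] / counts[aa] for aa in sums}
```

Applied to the residue coordinates of PRC and porin, `average_by_type` would yield a membrane protein scale analogous to Hmp; `math.dist` requires Python 3.8 or later.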

While formulating the procedure to predict trans- membrane helices, we found that there is no necessity


TABLE 5 Relative predictive performances of the Kuhn-Leigh method and the present method

Protein; previous consensus segments; Kuhn-Leigh prediction (segments, over-prediction, under-prediction); present prediction (segments, over-prediction, under-prediction)

Aspartate receptor 7-30 189-212

Glycophorin A, Hemagglutinin, Neuraminidase, Phage M13 coat, Rhodopsin

Bacteria rhodopsin 8-33 41-65

77-100 107-129 134-157 171-195 200-224

69-93 185-211

7-35 21-39 36-63 73-99

110-138 151-177 201-230 252-278 285-311

Semliki E1 413-436 Semliki E2 364-391 Glycoprotein (stomatitis virus) 74-93

5-32 193-211 441-459

11-30 42-72

106-128 134-156

204-224 73-94

185-207 6-36

20-42 38-62

68-100 112-133 152-174 202-220 251-276

413-437 359-383

72-91

4 5

19 0 7 0 1 0 0 0 4 0 2 4 1 6 0 0 0 1 0 1 5 2

0 0 0 6 1

24 1 1

25 4 1 4 0 0 2 0 7 4

11 2

21 0 8 2

7-33 192-212

10-29 48-66

108-127 130-156 176-193 206-224

75-95 185-210

10-38 22-40 37-57 75-92

114-133 153-170 203-229 253-276

413-436 364-391

74-93

3 0

0 1 0 0 4 0 0 6 0 3 1 0 0 0 0 0 0 0 0 0 0

0 3

6 7

24 3 1 7 6 2 1 3 1 7 9 9 9 3 3

27 0 0 0

Accuracy 66.7% 74.2%

to use any window length. This is because transmembrane helical segments are continuous stretches of hydrophobic residues (Ala, Leu, Ile and Val being the dominant members), so the single-residue hydrophobicity representation is quite adequate to identify a control length above the global average value. For trans-strands, however, we found it necessary to use a window length, as the strand segments incorporate quite a few polar residues intermittently with the nonpolar residues.
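For trans-strands, where polar and nonpolar residues alternate, a window is needed. A centered running mean is the usual form of such window averaging; the sketch below shows it under assumptions, since the window length actually used for strands is not given in this section and the default of 9 is hypothetical. Only positions where the full window fits are returned.

```python
def window_average(profile, window=9):
    """Centered running mean over an odd-length window (full windows only)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [sum(profile[i - half:i + half + 1]) / window
            for i in range(half, len(profile) - half)]
```

On an alternating strand-like pattern such as `[15.0, 11.0] * 10`, the averaged profile oscillates only slightly around 13, so a long stretch can clear a threshold that the raw single-residue profile would cross repeatedly; helices, being continuously hydrophobic, do not need this smoothing.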

CONCLUSIONS

In the folded state of a protein (globular or membrane), each residue is surrounded by a characteristic cluster of other spatially associated residues, and this cluster constitutes a microenvironment around it. By characterizing this environment in terms of the properties of the enclosed residues, one can study the behavior of the central residue. In this work we characterized the average microenvironments of the 20 kinds of residues in the crystal structures of two membrane proteins, the photosynthetic reaction center and porin, by introducing the concept of 'surrounding hydrophobicity', computed as the sum of the hydrophobicities of the clustering members. The surrounding hydrophobicity indices so obtained constituted a hydrophobicity scale for the 20 amino acid residues, which differed substantially from the corresponding scale developed with globular protein crystals, and also from scales developed exclusively for membrane proteins by others. A proper mix of the surrounding hydrophobicity scale of the membrane proteins and that of the globular proteins yielded a new scale applicable to membrane proteins. A predictive algorithm based on this scale was used to identify the transmembrane segments of the photosynthetic reaction center and of 26 other membrane proteins whose three-dimensional structures are unknown. This algorithm predicted the transmembrane helical segments in the photosynthetic reaction center with an accuracy of 81% (whereas the existing algorithms reached only up to 73%), and identified 50 of the 53 transmembrane helical segments in the 26 membrane proteins. The scale was also used to measure the amphipathicity of the helices in the sample proteins.

The main feature of the present predictive technique is that it requires only a simple plot of hydrophobicity versus sequence number, without any averaging or curve fitting over a window length, an essential feature of many other existing techniques. Since the concept of surrounding hydrophobicity takes care of residue-residue influences within a volume of 8 Å radius around a residue, the use of the hydrophobicity indices developed under this concept eliminates the need to average over any window length. Averaging a property over a window length has some inherent limitations. For example, Lipman et al. (59) demonstrate that the power of predictive procedures using the average hydrophobicity scores of overlapping subsequences will be low if the hydrophobicity threshold is set high enough to maintain acceptable reliability. Also, the present procedure to measure the amphipathicity of α-helices incorporates the geometric features of the helix more explicitly than the usual methods followed in the field. The major limitation of this new hydrophobicity scale is that only two membrane protein crystal structures, those of the photosynthetic reaction center and porin, were used in its development, in combination with several structures of globular proteins. However, the crystal structure of PRC is unique in having 20 α-helices, and that of porin has 16 β-strands, so proposing a statistical methodology from their information is not as risky as it might seem. The predictions made with this scale are markedly superior to those of the most challenging algorithms in the field. As more crystal data on membrane proteins become available, this scale can be improved.

ACKNOWLEDGEMENTS

We thank Prof. G.E. Schulz for providing the atomic coordinates of porin. This work was supported by a DST grant to P.K.P. and a CSIR SRF to M.M.G.

REFERENCES

1. Tanford, C. (1962) J. Am. Chem. Soc. 84, 4240-4247
2. Cornette, J.L., Cease, K.B., Margalit, H., Spouge, J.L., Berzofsky, J.A. & DeLisi, C. (1987) J. Mol. Biol. 195, 659-685
3. Ponnuswamy, P.K. (1993) Progr. Biophys. Mol. Biol. 59, 57-103
4. Manavalan, P. & Ponnuswamy, P.K. (1978) Nature (London) 275, 673-674
5. Ponnuswamy, P.K., Prabhakaran, M. & Manavalan, P. (1980) Biochim. Biophys. Acta 623, 301-316
6. Manavalan, P. & Ponnuswamy, P.K. (1977) Arch. Biochem. Biophys. 184, 476-487
7. Matthew, J.B. (1985) Annu. Rev. Biophys. Biophys. Chem. 14, 387-411
8. Kyte, J. & Doolittle, R.F. (1982) J. Mol. Biol. 157, 105-132
9. Rose, G.D., Geselowitz, A.R., Lesser, G.J., Lee, R.H. & Zehfus, M.H. (1985) Science 229, 834-838
10. Argos, P., Rao, J.K.M. & Hargrave, P.A. (1982) Eur. J. Biochem. 128, 565-575
11. Von Heijne, G. (1981) Eur. J. Biochem. 116, 419-422
12. Engelman, D.M., Steitz, T.A. & Goldman, A. (1986) Annu. Rev. Biophys. Biophys. Chem. 15, 321-353
13. Kuhn, L.A. & Leigh, J.S., Jr. (1985) Biochim. Biophys. Acta 828, 351-361
14. Esposti, M.D., Crimi, M. & Venturoli, G. (1990) Eur. J. Biochem. 190, 207-219
15. Fasman, G.D. & Gilbert, W.A. (1990) Trends Biochem. Sci. 15, 89-92
16. Nozaki, Y. & Tanford, C. (1971) J. Biol. Chem. 246, 2211-2217
17. Jones, D.D. (1975) J. Theor. Biol. 50, 167-183
18. Muthusamy, R. & Ponnuswamy, P.K. (1990) Int. J. Peptide Protein Res. 35, 378-395
19. Cid, H., Bunster, M., Arriagada, E. & Campos, M. (1982) FEBS Lett. 150, 247-254
20. Meirovitch, H., Rackovsky, S. & Scheraga, H.A. (1985) Macromolecules 13, 1398-1405
21. Argos, P. (1987) J. Mol. Biol. 193, 385-396
22. Fasman, G.D. (1989) in Prediction of Protein Structure and the Principles of Protein Conformation (Fasman, G.D., ed.), pp. 193-316, Plenum Press, New York
23. Deisenhofer, J. & Michel, H. (1989) Science 245, 1463-1473 (coordinates taken from the Protein Data Bank, Brookhaven National Laboratory)
24. Weiss, M.S., Abele, U., Weckesser, J., Welte, W. & Schulz, G.E. (1991) Science 254, 1627-1630
25. Fauchère, J.L. & Pliska, V. (1983) Eur. J. Med. Chem. Chim. Ther. 18, 369-375
26. Eisenberg, D., Weiss, R.M. & Terwilliger, T.C. (1984) Proc. Natl. Acad. Sci. USA 81, 140-144
27. Bangham, J.A. (1988) Anal. Biochem. 174, 142-145
28. Klein, P., Kanehisa, M. & DeLisi, C. (1985) Biochim. Biophys. Acta 815, 468-476
29. Von Heijne, G. (1986) EMBO J. 5, 3021-3027
30. Chou, P.Y. & Fasman, G.D. (1974) Biochemistry 13, 222-225
31. Biou, V., Gibrat, J.F., Levin, J.M., Robson, B. & Garnier, J. (1988) Protein Eng. 2, 185-191
32. Finer-Moore, J. & Stroud, R.M. (1984) Proc. Natl. Acad. Sci. USA 81, 155-159
33. Vogel, H., Wright, J.K. & Jähnig, F. (1985) EMBO J. 4, 3625-3631
34. Jähnig, F. (1990) Trends Biochem. Sci. 15, 93-95
35. Ovchinnikov, Yu.A. (1982) FEBS Lett. 148, 179-191
36. Henderson, R. & Unwin, P.N.T. (1975) Nature (London) 257, 28-42
37. Widger, W.R., Cramer, W.A., Herrmann, R.G. & Trebst, A. (1984) Proc. Natl. Acad. Sci. USA 81, 674-678
38. Tomita, M. & Marchesi, V.T. (1975) Proc. Natl. Acad. Sci. USA 72, 2964-2968
39. Porter, A.G., Barber, C., Carey, N.H., Hallewell, R.A., Threlfall, G. & Emtage, J.S. (1979) Nature (London) 282, 471-477
40. Kvist, S., Bregegere, F., Rask, L., Cami, B., Garoff, H., Daniel, F., Wiman, K., Larhammar, D., Abastado, J.P., Gachelin, G., Peterson, P.A., Dobberstein, B. & Kourilsky, P. (1981) Proc. Natl. Acad. Sci. USA 78, 2772-2776
41. Garoff, H., Frischauf, A.M., Simons, K., Lehrach, H. & Delius, H. (1980) Nature (London) 288, 236-241
42. Fields, S., Winter, G. & Brownlee, G.G. (1981) Nature (London) 290, 213-217
43. Rice, C.M. & Strauss, J.H. (1981) Proc. Natl. Acad. Sci. USA 78, 2062-2066
44. Coligan, J.E., Kindt, T.J., Uehara, H., Martinko, J. & Nathenson, S.G. (1981) Nature (London) 291, 35-39
45. Anilionis, A., Wunner, W.H. & Curtis, P.J. (1981) Nature (London) 294, 275-278
46. Rose, J.K., Welch, W.J., Sefton, B.M., Esch, F.S. & Ling, N.C. (1980) Proc. Natl. Acad. Sci. USA 77, 3884-3888
47. Nakashima, Y., Frangione, B., Wiseman, R.L. & Konigsberg, W.H. (1981) J. Biol. Chem. 256, 5792-5797
48. Leonard, W.J., Depper, J.M., Kanehisa, M., Krönke, M., Peffer, N.J., Svetlik, P.B., Sullivan, M. & Greene, W.C. (1985) Science 230, 633-639
49. Coussens, L., Yang-Feng, T.L., Liao, Y.C., Chen, E., Gray, A., McGrath, J., Seeburg, P.H., Libermann, T.A., Schlessinger, J., Francke, U., Levinson, A. & Ullrich, A. (1985) Science 230, 1132-1139
50. Russo, A.F. & Koshland, D.E., Jr. (1983) Science 220, 1016-1020
51. Ullrich, A., Coussens, L., Hayflick, J.S., Dull, T.J., Gray, A., Tam, A.W., Lee, J., Yarden, Y., Libermann, T.A., Schlessinger, J., Downward, J., Mayes, E.L.V., Whittle, N., Waterfield, M.D. & Seeburg, P.H. (1984) Nature (London) 309, 418-425
52. Spiess, M. & Lodish, H.F. (1985) Proc. Natl. Acad. Sci. USA 82, 6465-6469
53. Boyd, A., Kendall, K. & Simon, M.I. (1983) Nature (London) 301, 623-626
54. Yamamoto, T., Davis, C.G., Brown, M.S., Schneider, W.J., Casey, M.L., Goldstein, J.L. & Russell, D.W. (1984) Cell 39, 28-37
55. Hedrick, S.M., Nielsen, E.A., Kavaler, J., Cohen, D.I. & Davis, M.M. (1984) Nature (London) 308, 153-158
56. McClelland, A., Kühn, L.C. & Ruddle, F.H. (1984) Cell 39, 267-274
57. Wada, K., Dechesne, C.J., Shimasaki, S., King, R.G., Kusano, K., Buonanno, A., Hampson, D.R., Banner, C., Wenthold, R.J. & Nakatani, Y. (1989) Nature (London) 342, 684-689
58. Gregor, P., Mano, I., Maoz, I., McKeown, M. & Teichberg, V.I. (1989) Nature (London) 342, 689-692
59. Lipman, D.J., Pastor, R.W. & Lee, B. (1987) Biopolymers 26, 17-26

Address:

Dr. P.K. Ponnuswamy, Professor and Head, Department of Physics, Bharathidasan University, Tiruchirapalli 620 024, Tamil Nadu, India
