Proc. Natl. Acad. Sci. USA
Vol. 93, pp. 13-20, January 1996
Review

Principles of protein-protein interactions

Susan Jones and Janet M. Thornton
Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT, England

ABSTRACT This review examines protein complexes in the Brookhaven Protein Databank to gain a better understanding of the principles governing the interactions involved in protein-protein recognition. The factors that influence the formation of protein-protein complexes are explored in four different types of protein-protein complexes: homodimeric proteins, heterodimeric proteins, enzyme-inhibitor complexes, and antibody-protein complexes. The comparison between the complexes highlights differences that reflect their biological roles.

1. Introduction

Many biological functions involve the formation of protein-protein complexes. In this review, only complexes composed of two components are considered. Within these complexes, two different types can be distinguished, homocomplexes and heterocomplexes. Homocomplexes are usually permanent and optimized [e.g., the homodimer cytochrome c′ (1) (Fig. 1a)]. Heterocomplexes can also have such properties, or they can be nonobligatory, being made and broken according to the environment or external factors and involving proteins that must also exist independently [e.g., the enzyme-inhibitor complex trypsin with the inhibitor from bitter gourd (2) (Fig. 1b) and the antibody-protein complex HYHEL-5 with lysozyme (3) (Fig. 1c)]. It is important to distinguish between the different types of complexes when analyzing the intermolecular interfaces that occur within them.

The division of proteins in the July 1993 Brookhaven Protein Databank (PDB) (4) into multimeric states is illustrated in Fig. 2. This distribution is biased as it reflects only those proteins whose structures have been solved and, therefore, probably overrepresents the small monomers. However, it is clear that trimers are relatively rare compared with tetramers and that the numbers of structures in the higher multimeric states fall markedly, with the obvious exception of the viral coat proteins, which contain high numbers (e.g., 60, 180, and 240) of subunits (see ref. 5).

FIG. 1. Corey-Pauling-Koltun models of protein-protein complexes. The complex components have been differentiated by color, and it should be noted that the scales are not comparable between the different structures. (a) Homodimer: cytochrome c′ (PDB code 2ccy) (1). Subunit A is in yellow and subunit B is in red. (b) Enzyme-inhibitor complex: trypsin and inhibitor from bitter gourd (PDB code 1tab) (2). The enzyme is in yellow and the much smaller inhibitor is in red. (c) Antibody-protein complex: HYHEL-5-lysozyme (PDB code 2hfl) (3). The light and heavy chains of the Fab are colored yellow and blue and the lysozyme is in red.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Previous work has centered on two aspects of protein-protein recognition: the development of algorithms to dock two proteins together (6-8), and the structural characterization of protein-protein interfaces. Janin et al. (9), Miller (10), Argos




(11), and Jones and Thornton (12) have all compared structural properties (including hydrophobicity, accessible surface area, shape, and residue preferences) between interior, surface, and interface components in oligomeric proteins. The comparison of different types of complexes (enzyme-inhibitor and antibody-antigen) in terms of interface size and hydrophobicity has also been addressed (13, 14). More recent work has centered on the prediction of interface sites using residue hydrophobicity. Korn and Burnett (15) used hydropathy analysis to predict the position of the interface in a dimeric protein using a nonautomated method. Young et al. (16) have taken this approach further and produced an automated predictive algorithm based on the analysis of the hydrophobicity of clusters of residues in proteins.

FIG. 2. Multimeric states of proteins in the July 1993 PDB (4). 1 = monomer, 2 = dimer, etc. [Bar chart: number of proteins vs. multimeric state.]

In this review, we study 59 different complexes found in the PDB (4), which can be divided into four different types (Table 1).

(i) Thirty-two nonhomologous homodimers: proteins with two identical subunits.
(ii) Ten enzyme-inhibitor complexes.
(iii) Six antibody-protein complexes.
(iv) Eleven "other" heterocomplexes including 4 permanent complexes and 7 interfaces between independent monomers.

Abbreviations: PDB, Brookhaven Protein Databank; ASA, accessible surface area; ΔASA, change in the accessible surface area; HIV, human immunodeficiency virus.

These different types of complexes have different biological roles. Most homodimers are only observed in the multimeric state, and it is often impossible to separate them without denaturing the individual monomer structures. Many homodimers also have twofold symmetry, which places additional constraints on their intersubunit relationship. Many enzyme-inhibitor complexes are also strongly associated, with binding constants ranging from 10⁻⁷ mol⁻¹ to 10⁻¹³ mol⁻¹ (17), yet these molecules also exist independently as stable entities in solution. Similarly, the antibody-protein complexes and six of the other heterocomplexes are composed of molecules that have an independent existence.

From the evolutionary perspective, the homodimers, enzyme-inhibitors, and the heterocomplexes have presumably all evolved over time to optimize the interface to suit their biological function. In some examples, the function may require the evolution of strong binding, while other circumstances may dictate weaker binding. In contrast, the antibody-protein interactions are relatively "happenstance" and are selected principally by the strength of the binding constant, without being subject to evolutionary optimization over many years. Thus in the following review, we attempt to characterize the interactions observed between proteins in the light of their biological function.

For the current work, protein-protein interfaces have been defined based on the change in their solvent accessible surface area (ΔASA) when going from a monomeric to a dimeric state. The ASAs of the complexes were calculated using an implementation of the Lee and Richards (20) algorithm developed by Hubbard (21). The interface residues (atoms) were defined as those having ASAs that decreased by >1 Å² (0.01 Å²) on complexation.

2. Characterization of Protein-Protein Interfaces

There are several fundamental properties that characterize a protein-protein interface, which can be calculated from the coordinates of the complex.

2.1. Size and Shape. The size and shape of protein interfaces can be measured simply in absolute dimensions (Å) or, more accurately, in terms of the ΔASA on complexation. The ΔASA method was used, as it is known that there is a correlation between the hydrophobic free energy of transfer from a polar to a hydrophobic environment and the solvent ASA (22). Thus, calculating ΔASA may provide a measure of the binding strength. The shape of the interfaces is also analyzed, as this is relevant to designing molecular mimics.

The mean ΔASA on complexation (going from a monomeric state to a dimeric state) was calculated as half the sum of the total ΔASA for both molecules for each type of complex (Table 2). To give a guide of how much of a protein subunit's surface is buried on complexation, the ΔASA values for individual complexes were compared with the molecular weights of the constituent subunits (Fig. 3). For the heterocomplexes, the molecular weights will be different for each component and hence the smaller component was used, as this will limit the maximum size of the interface.

In the homodimers, the ΔASA varies widely from 368 Å² to 4746 Å², and there is a clear, though scattered, relationship with the molecular weight of the subunit [correlation coefficient (r) is 0.69], with the larger molecules in general having larger interfaces. The range of ΔASA in the heterocomplexes is smaller (639 Å² to 3228 Å²). This constancy presumably reflects three factors: the limited nature of the PDB, the average size of protein domains, and the biological constraints. In addition, it should be noted that all the enzyme-inhibitor complexes involve proteinases, and, with the exception of papain and subtilisin, all are related to trypsin, although the corresponding inhibitors are nonhomologous.

In Fig. 3b it can be seen that three heterocomplexes [cathepsin D (PDB code 1lya), reverse transcriptase (PDB code 3hvt), and human chorionic gonadotropin (PDB code 1hrp)] have relatively large interfaces for their molecular weights. These three are all permanent complexes, and the size of the interfaces in these structures is more comparable with the distribution observed in the homodimers (Fig. 3a) than the heterocomplexes.

Two protein subunits may interact and form a protein-protein interface with two relatively flat surfaces or form a twisted interface. To assess how flat or how twisted the protein-protein interfaces were, a measure of how far the interface residues deviated from a plane (termed planarity) was calculated. The planarity of the surfaces between two components of a complex was analyzed by calculating the rms deviation of all the interface atoms from the least-squares plane through the atoms. Fig. 4 shows that the heterocomplexes have interfaces that are more planar than the homodimers. The higher mean rms deviation of the homodimers


Table 1. Data sets of protein-protein complexes

PDB code    Protein                                                  Resolution, Å

Nonhomologous homodimers*
1cdt        Cardiotoxin                                              2.5
1fc1        Fc fragment (immunoglobulin)                             2.9
1il8        Interleukin                                              NMR
1msb        Mannose binding protein                                  2.3
1phh        p-Hydroxybenzoate hydrolase                              2.3
1pp2        Phospholipase                                            2.5
1pyp        Inorganic pyrophosphatase                                3.0
1sdh        Hemoglobin (clam)                                        2.4
1utg        Uteroglobin                                              1.35
1vsg        Variant surface glycoprotein                             2.9
1ypi        Triose phosphate isomerase                               1.9
2ccy        Cytochrome c′                                            1.67
2cts        Citrate synthase c                                       2.0
2gn5        Gene 5 DNA-binding protein                               2.3
2or1        434 repressor                                            2.5
2rhe        Bence-Jones protein                                      1.6
2rus        Rubisco                                                  2.3
2rve        EcoRV endonuclease                                       3.0
2sod        Superoxide dismutase                                     2.0
2ssi        Subtilisin inhibitor                                     2.6
2ts1        Tyrosyl transferase RNA synthase                         2.3
2tsc        Thymidylate synthase                                     1.97
2wrp        Trp repressor                                            1.65
3aat        Aspartate aminotransferase                               2.8
3enl        Enolase                                                  2.25
3gap        Catabolite gene activator protein                        2.5
3grs        Glutathione reductase                                    1.54
3icd        Isocitrate dehydrogenase                                 2.5
3sdp        Iron superoxidase                                        2.1
4mdh        Cytoplasmic malate dehydrogenase                         2.5
5adh        Alcohol dehydrogenase                                    2.9
5hvp        HIV protease                                             2.0

Enzyme-inhibitor complexes†
1acb        α-Chymotrypsin-eglin C                                   2.0
1cho        α-Chymotrypsin-ovomucoid third domain                    1.8
1cse        Subtilisin Carlsberg-eglin C                             1.2
1mct        Trypsin-inhibitor from bitter gourd                      1.6
1mee        Peptidyl peptide hydrolase-eglin C                       2.0
1stf        Papain-inhibitor stefin B mutant                         2.37
1tab        Trypsin-Bowman-Birk inhibitor                            2.3
1tgs        Trypsinogen-pancreatic secretory trypsin inhibitor       1.8
2ptc        β-Trypsin-pancreatic trypsin inhibitor                   1.9
2sic        Subtilisin-streptomyces subtilisin inhibitor             1.8

Antibody-antigen complexes‡
1fdl        D1.3 Fab-hen egg white lysozyme                          2.5
1jel        Fab JE142-histidine containing protein                   2.8
1jhl        D11.15 Fv-pheasant egg lysozyme                          2.4
1nca        NC41 Fab-influenza virus N9 neuraminidase                2.5
2hfl        HYHEL-5 Fab-chicken lysozyme                             2.54
3hfm        HYHEL-10 Fab-chicken lysozyme                            3.0

Other heterodimeric complexes§
1atn        Deoxyribonuclease I-actin                                2.8
1gla        Glycerol kinase-glucose-specific factor III              2.6
1hrp¶       Human chorionic gonadotropin                             3.0
1lpa        Lipase-colipase                                          3.04
1lya¶       Cathepsin D                                              2.5
2btf        β-Actin-profilin                                         2.55
2pcb        Yeast cytochrome c peroxidase-horse cytochrome c         2.8
3hhr‖       Human growth hormone-human growth hormone receptor       2.8
3hvt¶       Reverse transcriptase                                    2.9
6rlx¶,**    Relaxin                                                  1.5

*Data set of 32 nonhomologous homodimers. Protein dimers were selected for inclusion on the basis that they had a sequence identity of <35% and were structurally different. The structural similarity of the proteins was measured using a method of direct structural alignment, SSAP (18). Proteins were selected for the data set if they had a SSAP score of ≤80. In the process of selection, only dimers with identical subunits were considered. This selection resulted in a nonhomologous data set of 32 protein dimers, each belonging to a different homologous protein family.
†Data set of 10 enzyme-inhibitor complexes. These heterocomplexes were selected from the PDB such that, although the enzyme components could be homologous, the corresponding inhibitors were nonhomologous (for definition of nonhomologous see footnote *) or vice versa.
‡Data set of six antibody-protein complexes. Although homologous pairs are included (e.g., four antibody-lysozyme complexes), the sites of recognition on the lysozyme are different.
§Data set of other heterocomplexes. This data set contains those complexes selected from the PDB which did not fit into either of the other heterocomplex categories. This data set includes four structures that occur only in the complexed form and six structures that occur as both complexes and as monomers.
¶Occur only as heterodimers.
‖Contributes two hormone-receptor complexes.
**Derived from a single chain precursor.

results from five proteins that had comparatively high rms deviation values (>6 Å). These are dimers in which the two subunits were twisted together across the interface [e.g., isocitrate dehydrogenase (26)] or proteins that had subunits with "arms" apparently clasping the two halves of the structure together [e.g., aspartate aminotransferase (28)] (Fig. 5). When the other heterocomplex data set is divided into structures that occur only as heterodimers and those that occur as both heterocomplexes and monomers (Table 2), it becomes apparent that the former resemble the homodimers in that they are less planar compared to their nonpermanent counterparts, which occur as both monomers and as dimer complexes.

To provide a rough guide to the shape of the interface, the "circularity" of the interfaces was calculated as the ratio of the lengths of the principal axes of the least-squares plane through the atoms in the interface. A ratio of 1.0 indicates that an interface is approximately circular. The shape of the interface region (Table 2) varies little between the homodimers, the antigens, and the enzyme component of the enzyme-inhibitor complex; each type is relatively circular with an average ratio of between 0.71 and 0.75. In comparison, the inhibitors of the enzyme-inhibitor complexes have less circular interfaces, with an average ratio of 0.55. The homodimers show the largest variation, with the elongated interface of variant surface glycoprotein of Trypanosoma brucei (29) at one extreme (ratio 0.25). The interface of this structure, which forms a coat on the surface of the parasite, reflects the elongated nature of the protein as a whole.
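The planarity and circularity measures described above can be illustrated with a minimal sketch. It is not the program used in the review (which is attributed to Laskowski); it assumes the interface atoms are supplied as an N x 3 coordinate array, fits the least-squares plane by singular value decomposition, and takes the "lengths of the principal axes" to be proportional to the spread of the atoms along the two in-plane principal directions. That last choice, and the function name `plane_descriptors`, are assumptions made for illustration only.

```python
import numpy as np


def plane_descriptors(coords):
    """Return (planarity, circularity) for an (N, 3) array of atom coordinates.

    planarity   : rms deviation (same units as the input) of the atoms from
                  their least-squares plane.
    circularity : ratio minor/major of the in-plane principal axis lengths,
                  so 1.0 means roughly circular (assumed definition).
    """
    xyz = np.asarray(coords, dtype=float)
    if xyz.ndim != 2 or xyz.shape[1] != 3 or xyz.shape[0] < 3:
        raise ValueError("need at least three 3-D points")

    centred = xyz - xyz.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing spread;
    # the last one is the normal of the least-squares plane.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    normal = vt[2]

    distances = centred @ normal              # signed distance of each atom from the plane
    planarity = float(np.sqrt(np.mean(distances ** 2)))

    major, minor = singular_values[0], singular_values[1]
    circularity = float(minor / major) if major > 0 else 0.0
    return planarity, circularity


# Hypothetical toy "interface": a 4 x 2 rectangle of points, half of them
# 0.5 above and half 0.5 below the plane z = 0.
toy = [(x, y, z) for x in (-2, 2) for y in (-1, 1) for z in (-0.5, 0.5)]
toy_planarity, toy_circularity = plane_descriptors(toy)
```

For the toy rectangle the rms deviation is 0.5 and the axis ratio is 0.5; a real interface would of course be passed as the coordinates of the atoms that lose accessible surface on complexation.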

2.2. Complementarity Between Surfaces. Many authors have commented on the electrostatic and the shape complementarity observed between associating molecules (5, 30-33). The electrostatic complementarity between interfaces has been used as an additional filter for many protein-protein docking methods (see, for example, ref. 34) and new methods of evaluating shape complementarity have been developed (see, for example, ref. 33).

In this review, the complementarity of the interacting surfaces in the protein-protein complexes has been evaluated by defining a gap index:

$$\text{Gap index (Å)} = \frac{\text{gap volume between molecules (Å}^3\text{)}}{\text{interface ASA (Å}^2\text{) (per complex)}} \qquad [1]$$
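As a small worked example of Eq. 1, the sketch below divides a gap volume by an interface ASA. The function name and the numbers are hypothetical; the gap volume itself would come from a separate program (the review attributes its procedure to Laskowski), which is not reproduced here.

```python
def gap_index(gap_volume_A3, interface_asa_A2):
    """Gap index in Å: gap volume (Å^3) divided by interface ASA (Å^2, per complex)."""
    if interface_asa_A2 <= 0:
        raise ValueError("interface ASA must be positive")
    return gap_volume_A3 / interface_asa_A2


# Hypothetical values, not taken from Table 2: a 3400 Å^3 gap over a
# 1700 Å^2 interface gives a gap index of 2.0 Å.
example_gap_index = gap_index(3400.0, 1700.0)
```

A smaller value means less empty volume per unit of buried surface, that is, a closer fit between the two surfaces.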

The gap volume was calculated using a procedure developed by Laskowski (23), which estimates the volume enclosed between any two molecules, delimiting the boundary by defining a maximum allowed distance from both interfaces. A mean gap index was calculated for each type of complex (Table 2). The results indicate that the interacting surfaces in the homodimers, the enzyme-inhibitor complexes, and the permanent heterocomplexes (a subset of the other heterocomplex data set) are the most complementary, whereas the antibody-antigen complexes and the nonobligatory other heterocomplexes are the least complementary (although all four distributions do overlap considerably). These data agree with the conclusions drawn by Lawrence and Colman (33), using their shape complementarity statistic.

Table 2. Results of structural analysis on protein-protein complexes

Characteristic              Homodimer   All hetero-   Enzyme-     Antibody-   Other hetero-   Other hetero-
                                        complexes     inhibitor   protein     complexes*      complexes†

No. of examples             32          27            10          6           7               4

ΔASA,‡ Å²
  Mean                      1685.03     983.06        785.10      777.42      848.67          2021.59
  σ                         1101.09     582.03        74.52       135.33      243.63          1036.64

Planarity,§ Å
  Mean                      3.46        2.80          2.70        2.21        2.54            3.94
  σ                         1.72        0.87          0.45        0.39        0.61            1.35

Hydrogen bonds per 100 Å² ΔASA**
  Mean                      0.70        1.13          1.37        1.06        0.85            1.10
  σ                         0.46        0.47          0.37        0.51        0.37            0.61
  Minimum                   0           0.29          0.60        0.47        0.39            0.29
  Maximum                   1.7         1.88          1.87        1.88        1.47            1.65

Gap index,†† Å
  Mean                      2.20        2.48          2.20        3.02        3.02            1.47
  σ                         0.87        1.02          0.47        0.80        1.13            1.34
  Minimum                   0.57        0.35          1.38        2.04        2.03            0.35
  Maximum                   4.43        5.17          2.86        3.96        5.17            3.42

Hydrophobicity‡‡
  Mean interior             +0.26       +0.26         +0.25       +0.28       +0.29           +0.19
  Mean interface            +0.12       -0.14         -0.03       -0.22       -0.26           -0.05
  Mean exterior             -0.27       -0.23         -0.21       -0.24       -0.27           -0.19

For circularity and segmentation, the enzyme-inhibitor complexes are split into enzyme (E) and inhibitor (I), the antibody-protein value refers to the antigen (Ag), and the other heterocomplexes refer to both subunits (BS); no value is given for all heterocomplexes.

Characteristic              Homodimer   E           I           Ag          BS*         BS†

Circularity¶
  Mean                      0.71        0.73        0.55        0.75        0.61        0.64
  σ                         0.17        0.05        0.10        0.12        0.16        0.23
  Minimum                   0.25        0.62        0.43        0.62        0.41        0.30
  Maximum                   1.00        0.78        0.71        0.92        0.91        0.93

Segmentation‖
  Mean                      5.22        7.8         2.7         3.83        4.64        5.63
  σ                         2.55        1.03        0.95        1.83        1.28        3.85
  Minimum                   2           6           2           2           3           1
  Maximum                   11          9           5           7           7           11

E, enzyme; I, inhibitor; Ag, antigen; BS, both subunits.
*Occur as monomers and as heterodimer complexes.
†Occur only as heterodimers.
‡ΔASA for one subunit on complexation. In the heterodimer data sets, enzyme-inhibitor and other heterocomplexes, the ASA shown is the mean ASA buried by each subunit. In the antibody-protein data set, the ASA is the ASA on the antigen (protein) surface buried on complexation.
§A program [implemented by Laskowski (23)] was used to calculate the atomic rms deviation of all the interface atoms from the least-squares plane through all these atoms.
¶The circularity of the interfaces was calculated as the ratio of the lengths of the principal axes of the least-squares plane through the atoms in the interface. The standard deviation (σ) and range of the distribution are also shown.
‖It was defined that interface residues separated by more than five residues were allocated to different segments. The standard deviation (σ) and the range of the distribution are also shown.
**The number of intermolecular hydrogen bonds per 100 Å² ΔASA was calculated using HBPLUS (24), in which hydrogen bonds are defined according to standard geometric criteria. The standard deviation (σ) and range of the distribution are also shown.
††The gap volumes between the two components of the complexes were calculated using SURFNET (18).
‡‡Mean hydrophobicity values [derived using the scale of Janin et al. (9) based on statistical analysis of protein structures] for three subsets of residues within each type of protein-protein complex. The interface residues were defined as explained in section 2. The definitions of exterior and interior residues were based on the relative ASA of each residue, which ranges from 0% for residues with no atom contact with the solvent to 100% for fully accessible residues. On this basis, the exterior residues were defined as having relative accessibilities >5%, and interior residues were defined as those with relative accessibilities ≤5%. This 5% cut-off was devised and optimized by Miller et al. (47). The subset of interface residues was excluded from the subsets of interiors and exteriors, resulting in three discrete sets of residues for each of the complexes.
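The three residue subsets behind the hydrophobicity rows of Table 2 (interface, interior, exterior) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: residues are taken as interface if their ASA drops by more than 1 Å² on complexation, the 5% relative-accessibility cut-off is applied to the remaining residues, relative accessibility is assumed to be measured in the isolated subunit, and `TOY_SCALE` is a made-up stand-in for the Janin et al. scale, whose values are not given in this text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Residue:
    name: str            # three-letter residue type
    asa_monomer: float   # ASA in the isolated subunit, Å^2
    asa_complex: float   # ASA of the same residue in the complex, Å^2
    rel_asa: float       # relative accessibility in the subunit, % (assumption)


def classify(res, interface_cutoff=1.0, interior_cutoff=5.0):
    """Assign a residue to 'interface', 'interior' or 'exterior'."""
    # Interface residues are removed first, so the three sets are disjoint.
    if res.asa_monomer - res.asa_complex > interface_cutoff:
        return "interface"
    return "interior" if res.rel_asa <= interior_cutoff else "exterior"


def mean_hydrophobicity(residues, scale):
    """Mean scale value per subset; None for an empty subset."""
    groups = {"interface": [], "interior": [], "exterior": []}
    for res in residues:
        groups[classify(res)].append(scale[res.name])
    return {key: (mean(vals) if vals else None) for key, vals in groups.items()}


# Hypothetical scale and residues, for illustration only.
TOY_SCALE = {"LEU": 0.5, "LYS": -1.0, "SER": -0.1}
toy_residues = [
    Residue("LEU", 40.0, 5.0, 30.0),      # buried on complexation -> interface
    Residue("SER", 30.0, 10.0, 40.0),     # buried on complexation -> interface
    Residue("LEU", 2.0, 2.0, 1.5),        # rel. ASA <= 5%         -> interior
    Residue("LYS", 120.0, 119.8, 70.0),   # ASA change <= 1 Å^2    -> exterior
]
toy_means = mean_hydrophobicity(toy_residues, TOY_SCALE)
```

With these toy inputs the interface mean (0.2) falls between the interior (0.5) and exterior (-1.0) means, which mirrors the qualitative pattern reported in Table 2.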

2.3. Residue Interface Propensities. The relative importance of different amino acid residues in the interfaces of complexes can give a general indication of the hydrophobicity. Such information can only be interpreted if the distribution of residues occurring in the interface is compared with the distribution of residues occurring on the protein surface as a whole. Residue interface propensities were calculated for each amino acid (AAj) as the fraction of ASA that AAj contributed to the interface compared with the fraction of ASA that AAj contributed to the whole surface (exterior residues plus interface residues), i.e.,

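$$\text{Interface residue propensity } AA_j = \frac{\sum_{i=1}^{N_i} ASA_{AA_j}(i) \Big/ \sum_{i=1}^{N_i} ASA(i)}{\sum_{i=1}^{N_s} ASA_{AA_j}(s) \Big/ \sum_{i=1}^{N_s} ASA(s)} \qquad [2]$$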

where $\sum ASA_{AA_j}(i)$ = sum of the ASA (in the monomer) of amino acid residues of type j in the interface, $\sum ASA(i)$ = sum of the ASA (in the monomer) of all amino acid residues of all types in the interface, $\sum ASA_{AA_j}(s)$ = sum of the ASA (in the monomer) of amino acid residues of type j on the surface (exterior plus interface residues), and $\sum ASA(s)$ = sum of the ASA (in the monomer) of all amino acid residues of all types on the surface.

A propensity of >1 denotes that a residue occurs more frequently in the interface than on the protein surface. The propensities (Fig. 6) show that, with the exception of methionine, the hydrophobic residues show a greater preference for the interfaces of homodimers than for those of heterocomplexes. The lower propensity for hydrophobic residues in the heterocomplex interfaces is balanced by an increased propensity for the polar residues.
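A minimal sketch of the propensity calculation in Eq. 2 is given below. The input format (one tuple per surface residue, carrying its type, its ASA in the monomer, and a flag for interface membership) and the function name are assumptions chosen for illustration; the numbers are hypothetical.

```python
from collections import defaultdict


def interface_propensities(surface_residues):
    """Residue interface propensities following Eq. 2.

    surface_residues: iterable of (aa_type, asa_monomer, in_interface) for every
    surface residue (exterior plus interface) of the monomer.
    """
    interface_asa = defaultdict(float)   # ASA per residue type within the interface
    surface_asa = defaultdict(float)     # ASA per residue type on the whole surface
    for aa_type, asa, in_interface in surface_residues:
        surface_asa[aa_type] += asa
        if in_interface:
            interface_asa[aa_type] += asa

    total_interface = sum(interface_asa.values())
    total_surface = sum(surface_asa.values())
    if total_interface == 0 or total_surface == 0:
        raise ValueError("interface and surface ASA must both be non-zero")

    return {
        aa: (interface_asa[aa] / total_interface) / (surface_asa[aa] / total_surface)
        for aa in surface_asa
        if surface_asa[aa] > 0
    }


# Hypothetical surface: leucine contributes half of the surface ASA but two
# thirds of the interface ASA, so its propensity is above 1.
toy_surface = [
    ("LEU", 50.0, True), ("LEU", 50.0, False),
    ("LYS", 25.0, True), ("LYS", 75.0, False),
]
toy_propensities = interface_propensities(toy_surface)
```

Here leucine comes out at about 1.33 and lysine at about 0.67. Weighting by ASA rather than by residue counts follows the definition in the text, so a large, well-exposed residue counts for more than a small, partly buried one.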

2.4. Hydrophobicity Including Hydrogen Bonding. It has often been assumed that proteins will associate through hydrophobic patches on their surfaces. However, polar interactions between subunits are also common and, in terms of the driving force for complexation, it is important to explore the relative contributions of these effects, including reference to the subunits' ability to exist independently.

A mean hydrophobicity value [based on the scale derived by Janin et al. (9)] was calculated for all residues defined in the interface of each complex. A mean value was calculated for each type of complex and for all heterocomplexes (Table 2). In all of the complexes, the interface has an intermediate hydrophobicity between those of the interior (hydrophobic) and the exterior (hydrophilic). When the hydrophobicity values of the interface are compared between the homodimers and the heterocomplexes, it is seen that, as previously concluded from the residue propensities, the


500T a

4000+

U

-m,i3000+.

,,

2000 +

0 10000

3000±

2000+

1000+

.

0

.U

* *

20000

*

.

a

30000 40000 50000

° 3hvtO lhrp

-.0~~~A

0 10000 20000 30000Molecular weight

40000 50000

FIG. 3. (a) Interface ASA vs. molecular weight for homodimers. The ASA (measured inis that buried by one subunit on dimerization, and the molecular weight is that of the monorThe dashed line is the straight line regression (r = 0.69). (b) Interface ASA vs. molecular wefor heterocomplexes. The AASA (measured in A2) and the molecular weight are both fromsmallest subunit. The dashed line is the straight line regression for all heterocomplexes (r = 0.

interfaces of the heterocomplexes are less This difference in hydrophobicity, wIhydrophobic than those of homodimers. has previously been observed (9, 13), car

Homodimers

I A A~~~~~~~~~~~~~~~~~~~~~~IlIl

EnMeinW-bitor ,Anboy Other heterocomplexesproteiD,

Ill--E

ii ppprrrrfppp r rr r rrrrrri rp iiiiiiii.i,Ai*i*i*i*i* Ia~ ~ ~ ~ P P l. I.

Protein interfaces

FIG. 4. Planarity of protein-protein interfaces. The rms deviation of atoms fromleast-squares plane through these atoms is shown for one subunit of the homodimers, for tsubunits of the enzyme-inhibitor complexes and other heterocomplexes, and for the antisubunit of the antibody-antigen complexes. Within each group of complexes, the proteins Ebeen placed in ascending order of rms deviation. The mean of each data set is indicated by a shorizontal line. Each bar represents one interface for a single protein.

explained by the roles of the two types ofcomplex. The homodimers rarely occur orfunction as monomers, and hence their hy-drophobic surfaces are permanently buriedwithin a protein-protein complex. Of the 27heterocomplexes analyzed in this review, 23(10 enzyme-inhibitor complexes, 6 anti-body-protein complexes, and 7 other het-erocomplexes) do occur as monomers insolution and have biological functions in thisstate. Hence these interfaces cannot be ashydrophobic as those of the homodimers,because a large exposed hydrophobic patchon the protein would be energetically unfa-vorable.To identify the major polar interactions

between the components in the complexes,the mean number of hydrogen bonds per100 A2 of AASA was calculated for eachtype of complex (Table 2). The 23 hetero-complexes that occur as both monomers andcomplexes have relatively more intermolec-ular hydrogen bonds per AASA. The fourheterocomplexes that occur only as het-erodimers show a similar numbers of hydro-gen bonds per 100 A2 of AASA as thehomodimers. This distribution was expectedfrom the residue propensities, which showedthat the transient complexes (those withcomponents that occur as both monomersand complexes) contained more hydrophilicresidues in their interfaces than the perma-nent complexes.

2.5. Segmentation and Secondary Structure. The number of discontinuous segments of the polypeptide chain involved in the interface is important since the ability of peptides or small molecules to mimic one-half of the interaction may depend upon it. For example, in an interface that is dominated by one segment, a single peptide will probably be a good mimic. However, the design of molecules to mimic multisegmented interfaces will almost certainly be more difficult.

To analyze the discontinuous nature of the interfaces, in terms of the amino acid sequence, the mean number of segments in the interfaces was calculated for each type of complex (Table 2). Interface residues separated by more than five residues were allocated to different segments. In the 59 complexes studied, the number of segments varies from 1 to 11. In fact, only one complex [relaxin (35)] had one segment at the interface, as it is a very small protein derived from a single chain precursor, with only 24 residues in the α-chain and 29 in the β-chain. The enzyme-inhibitor complexes are unusual in having only two to five segments interacting. This class of inhibitors has evolved to mimic an elongated segment of polypeptide chain, in the conformation required for cleavage by the enzyme, and therefore all present a protruding loop, the canonical loop structure (36, 37), which dominates the interaction. In contrast the other interfaces are highly segmented, especially the long binding site cleft in the proteinases, which on average contains seven segments.
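The segment rule described above can be illustrated with a minimal Python sketch. The function name `count_segments` and the reading of "separated by more than 5 residues" as "more than five intervening residues along the sequence" are assumptions made for illustration; the original analysis may have counted the separation slightly differently.

```python
def count_segments(interface_positions, max_gap=5):
    """Count interface segments along one polypeptide chain.

    interface_positions: sequence numbers of the interface residues.
    Two consecutive interface residues with more than `max_gap`
    residues lying between them are placed in different segments
    (assumed interpretation of the rule in the text).
    """
    positions = sorted(set(interface_positions))
    if not positions:
        return 0
    segments = 1
    for prev, cur in zip(positions, positions[1:]):
        # Residues lying strictly between the two interface residues.
        if cur - prev - 1 > max_gap:
            segments += 1
    return segments


# Hypothetical interface: three stretches far apart in sequence.
example = [10, 11, 12, 30, 31, 60]
print(count_segments(example))
```

For a complex, the count would be made per chain and the results combined; how the two sides of the interface were pooled in Table 2 is not stated here, so the sketch leaves that step out.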

[Figure residue: plot with legend entries for antibody-protein, enzyme-inhibitor, and other heterocomplexes; axis values not recoverable.]


FIG. 5. Corey-Pauling-Koltun models of planar and nonplanar interfaces in protein complexes. (Upper) Two subunits are shown: one subunit is colored blue and one red. The interface atoms in each subunit are colored differently; the atoms in green are the interface atoms in the blue subunit and those in yellow are the interface atoms in the red subunit. (Lower) Only the interface atoms of the two structures are shown. (Upper) Mannose binding protein (PDB code 1msb) (27): a planar interface. (a) Dimer viewed looking along the subunit interface. (b) Dimer interface only shown. (Lower) Isocitrate dehydrogenase (PDB code 3icd) (26): a nonplanar interface. (a) Dimer viewed looking down the subunit interface showing the two subunits twisted together at the top. (b) Dimer interface only shown, viewed along the interface.

The secondary structure of the interface regions has also been analyzed. Over the whole data set, it was found that there is an approximately equal proportion of helical, strand, and coil residues involved. Some interfaces contain only one type of structure (helices, strands, or loops), but most are mixed. The interfaces involving β sheets fall into three categories: those that interact by extending the sheet through classic main-chain hydrogen bonding [e.g., human immunodeficiency virus (HIV) protease (38)], those in which the sheets stack on top of one another [e.g., subtilisin inhibitor homodimer (39)], and mixed structures in which the β sheets are neither clearly stacked nor extended [e.g., copper, zinc superoxide dismutase (40)].

FIG. 6. Residue interface propensities were calculated for each amino acid (AAj) based on the fraction of ASA that AAj contributed to the interface compared with the fraction of ASA that AAj contributed to the whole surface (exterior residues plus interface residues) (see section 2.2). [Bar chart; x axis: residues (ALA ILE LEU MET PHE VAL PRO GLY ASN CYS GLN HIS SER THR TRP TYR ARG ASP GLU LYS); series: homodimers and heterocomplexes.]
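The propensity described in the Fig. 6 legend can be sketched in Python as follows. This is a minimal illustration that assumes "compared with" means a simple ratio of the two ASA fractions; the exact form used in section 2.2 (for instance, whether a logarithm is taken) is not restated here, and the function name and input dictionaries are hypothetical.

```python
def interface_propensities(interface_asa, surface_asa):
    """Ratio-based interface propensity for each amino acid type.

    interface_asa: summed ASA (in square angstroms) contributed to the
                   interface by each residue type.
    surface_asa:   summed ASA contributed by each residue type to the
                   whole surface (exterior plus interface residues).
    """
    total_interface = sum(interface_asa.values())
    total_surface = sum(surface_asa.values())
    propensities = {}
    for aa, asa_on_surface in surface_asa.items():
        if asa_on_surface == 0:
            continue  # residue type absent from the surface
        frac_interface = interface_asa.get(aa, 0.0) / total_interface
        frac_surface = asa_on_surface / total_surface
        # Assumed form: values above 1 mean the residue type is
        # enriched in the interface relative to the whole surface.
        propensities[aa] = frac_interface / frac_surface
    return propensities


# Made-up numbers for two residue types, for illustration only.
print(interface_propensities({"LEU": 30.0, "LYS": 10.0},
                             {"LEU": 100.0, "LYS": 100.0}))
```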

2.6. Conformational Changes on Complex Formation. It is not clear to what extent proteins change their conformation on forming a complex (36, 41, 42), and currently there are few proteins that have been structurally determined (by crystallography or nuclear magnetic resonance) before and after complexation. However, it is possible to distinguish various levels of conformational change: no change, side chain movements alone, segment movement involving the mainchain (e.g., hinged loop), and domain movements (gross relative movements of the domains). The mechanism of domain movement is specifically relevant to enzyme complexes, which often undergo domain shifts when binding substrates [e.g., adenylate kinase (43) and lactoferrin (44)]. For antibody-protein recognition, there is a wide range of variation that can occur on binding (41, 42, 45, 46). Overall we can expect that both rigid and flexible docking will occur in different circumstances, but there will always be an energetic price to pay for reducing flexibility.

3. Patch Analysis of Protein Surfaces in Homodimers

So far we have analyzed the interface regions in isolation, but it is also instructive to explore whether these regions are significantly different from the rest of the protein surface in any way. The problem to be addressed is: given a protein of known structure (but with no known structure for its complex), is it possible to identify the interface region on its surface? Here we use the monomer structures of the homodimers and compare their surface residue patches.

A patch is defined as a central surface-accessible residue with n nearest surface-accessible neighbors, as defined by Cα positions, where n is taken as the number of residues observed in the known homodimer interface. A number of constraints were used to ensure that the residues selected in a patch represented a contiguous patch on the surface of the protein.
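The patch definition can be illustrated with a minimal Python sketch. It covers only the "central residue plus n nearest neighbors by Cα distance" step; the contiguity constraints mentioned above are not specified in the text and are therefore omitted. The function name and the input format (a dictionary of Cα coordinates for surface-accessible residues only) are assumptions.

```python
import math


def surface_patches(ca_coords, n):
    """Build one patch per surface residue.

    ca_coords: {residue_id: (x, y, z)} for surface-accessible residues.
    n:         number of neighbors per patch (in the text, the number
               of residues seen in the known homodimer interface).
    Returns {central_residue: [central_residue, neighbor_1, ...]}.
    """
    patches = {}
    for centre, centre_xyz in ca_coords.items():
        others = [res for res in ca_coords if res != centre]
        # Order the remaining surface residues by C-alpha distance.
        others.sort(key=lambda res: math.dist(centre_xyz, ca_coords[res]))
        patches[centre] = [centre] + others[:n]
    return patches


# Toy coordinates: three residues close together, one far away.
coords = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0),
          3: (2.0, 0.0, 0.0), 4: (10.0, 0.0, 0.0)}
for centre, members in surface_patches(coords, n=2).items():
    print(centre, members)
```

Because every surface residue seeds its own patch, the patches overlap heavily, which matches the description that follows.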

This procedure defines a number of overlapping patches of accessible residues. For example, in the HIV protease structure (PDB code 5hvp) (38), there are 81 such patches. Each possible surface patch was then analyzed for a series of parameters including residue propensity (section 2.3), ASA, protrusion index (44), planarity (section 2.1), and hydrophobicity (section 2.4). These parameters were also evaluated for the known residue interfaces. Thus, for each parameter the distribution of values for all the patches on one protein, including the observed interface patch, can be plotted [for example HIV protease (PDB code 5hvp) (38); Fig. 7]. A ranking of the true interface patch relative to the other possible patches (e.g., top 10%, 10-20%, etc.) was then calculated. With this approach, it becomes possible to plot the rankings of all the observed patches for each protein as a histogram (Fig. 8) to assess which parameters best differentiate the interface region. The aim is to identify likely recognition sites from a structure for which structural data on the complex is not available.
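The ranking step can be sketched in Python as below. This is an illustrative reading of "top 10%, 10-20%, etc.", not the authors' code: the function name, the tie handling, and the `higher_is_better` flag (needed because, for example, a low rms deviation means a more planar patch) are assumptions.

```python
def rank_bin(interface_value, patch_values, higher_is_better=True, n_bins=10):
    """Return the rank-ordering bin of the interface patch.

    patch_values must contain the value for every surface patch on the
    protein, including the observed interface patch itself.
    Bin 1 corresponds to the top 10%, bin 2 to 10-20%, and so on.
    """
    n = len(patch_values)
    if higher_is_better:
        better = sum(1 for v in patch_values if v > interface_value)
    else:
        better = sum(1 for v in patch_values if v < interface_value)
    rank = better + 1  # 1 = most extreme patch on this protein
    # Integer ceiling of rank * n_bins / n, avoiding float rounding.
    return (rank * n_bins + n - 1) // n


# Toy example: 20 patches with mean ASA values 1..20; the interface
# patch has the second-highest value.
values = list(range(1, 21))
print(rank_bin(19, values))
```

Collecting one bin number per homodimer for a given parameter and counting how often each bin occurs would give a histogram of the kind shown in Fig. 8.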

FIG. 7. Distribution of parameters for all patches in HIV protease (38) (PDB code 5hvp). Distributions are shown for rms deviation of atoms from the least-squares plane through the atoms (a), interface residue propensities (b), protrusion index (c), hydrophobicity [based on the scale of Janin et al. (9)] (d), and ASA (e). On each graph, all the surface patches are represented by the shaded bars and the observed interface patch is represented by the black bar. Relative rankings were calculated from these data. For example, with the ASA data (e), the known interface patch (indicated in black) ranks in the top 10% of the distribution.

It can be seen that no single parameter absolutely differentiates the interfaces from all other surface patches. For example, with the planarity parameter, 50% of the interfaces were in the most planar bin (i.e., among the top 10% of patches that were most planar), but others were very nonplanar (see section 2.1). The most striking correlation is for the accessible surface area (Fig. 8e). This observation in part reflects the fact that the side chains from one monomer extend from the surface to interact with the other half of the dimer. In isolation, therefore, they become highly accessible, and we would not expect to see such a strong signal for the structure of an isolated molecule prior to complexation, as the side chains probably change their conformation and "stretch out" to form the complex. As expected from the accessibility data, the interfaces tend to protrude from the surface (Fig. 8c), although the signal is weaker, perhaps as a consequence of the requirements for planarity. Of course some recognition regions are more concave (e.g., the antibody-combining site), but for the homodimers the general trend is to favor protrusion. Similarly the residue propensities (Fig. 8b) show some discriminating power, suggesting that the index does carry relevant information, although the trend is not as marked as for some of the other parameters. The weakest correlation can be seen for the "hydrophobicity" measure (Fig. 8d) derived from the Janin et al. (9) parameters, although even here there is some suggestion that the interface patch tends toward the hydrophobic.

None of the distributions are definitive, in that the interface region is not always at one extreme, but they all show trends for the known interface to be distinguished from other surface patches. This type of comparative analysis, including many different parameters rather than a single value, can potentially be used to predict the location of likely interface sites on protein surfaces.

For a protein that is known to be involved in protein-protein interactions and whose structure has been determined but for which there is no structure for the complex available, it is straightforward to analyze the surface patches and calculate their properties as shown in Fig. 8. For each patch we can calculate a combined probability that it will be involved in forming an interface to another protein molecule. These probabilities can be rank-ordered to identify putative interfaces. Using this method for the homodimers, we can identify >70% of the interface regions correctly. Such an approach is useful for identifying candidate interface residues, which can be mutated experimentally and tested for the effect on complex formation.
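The text does not say how the combined probability is computed, so the following Python sketch is purely illustrative. It assumes, hypothetically, that each parameter contributes the observed frequency with which known interfaces fall in a given rank bin (as in a Fig. 8 style histogram) and that these frequencies are multiplied as if the parameters were independent. The function name, the input layout, and all numbers are invented for the example.

```python
def rank_patches(patch_bins, bin_frequencies):
    """Rank patches by a naive combined score.

    patch_bins:      {patch_id: {parameter: bin_number}}, bins from 1.
    bin_frequencies: {parameter: [frequency of bin 1, bin 2, ...]},
                     the fraction of known interfaces in each bin.
    Returns patch ids ordered from most to least interface-like.
    """
    scores = {}
    for patch_id, bins in patch_bins.items():
        score = 1.0
        for parameter, bin_number in bins.items():
            # Assumed combination rule: product across parameters.
            score *= bin_frequencies[parameter][bin_number - 1]
        scores[patch_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Invented frequencies and patches, for illustration only.
frequencies = {
    "asa":       [0.6, 0.2, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "planarity": [0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0],
}
patches = {
    "A": {"asa": 1, "planarity": 1},
    "B": {"asa": 2, "planarity": 1},
    "C": {"asa": 4, "planarity": 3},
}
print(rank_patches(patches, frequencies))
```

The top-ranked patches would then be the candidates for mutagenesis. Since the parameters are clearly correlated (for example, ASA and protrusion), a product of frequencies is at best a rough stand-in for whatever combination the analysis actually used.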

4. Discussion

This review has highlighted the need to take into account the type of protein-protein complexes (as shown in Table 1) when characterizing the interfaces within them. Complexes can be permanent or nonobligatory. The requirement for the molecules to exist as independent entities imposes additional constraints on these structures, and their interfaces are less hydrophobic than those that only exist in a multimeric form. In addition, it was found that the permanent complexes had protein-protein interfaces that were more closely packed but less planar and with fewer intersubunit hydrogen bonds than the nonobligatory complexes.

The results presented here are derived from a relatively small data set of protein complexes. This analysis has been difficult because of the lack of information on the in vivo complex status in the current PDB entries, so that extracting all dimers, for example, is a very labor-intensive process. It is also important to recognize related complexes, so that a data set is not biased. Clearly this work needs to be extended. As the data base grows rapidly, we would like to include higher order complexes and such factors as interdigitation, conformational change on complex formation, and correlation with binding constants. The latter are often difficult to determine experimentally and are almost never deposited with the coordinates, yet they are essential if we are to understand the kinetics and thermodynamics of complex formation.

What is clear is that over the next few years, there will be a cascade of coordinate data for protein-protein interactions. We will almost certainly see more nonobligatory complexes, with weaker interactions, as these are often of great biological relevance. In nature many of the most important biological functions involve huge multicomponent complexes (e.g., the ribosome), and we are only just taking our first steps to understand the principles of molecular recognition in simple systems. However, the implications of a better understanding for the design of new therapeutics and environmental products are apparent to all. The next few years promise much excitement as we discover more about how proteins interact together to perform their biological function.

FIG. 8. Patch analysis distributions: rank ordering of observed interface patches relative to other patches on the surface of the protein. For each protein, the interface patch is ranked, relative to all other surface patches, as being in the top 10%, 10-20%, etc. (see Fig. 7). The 32 observations (one for each homodimer) are combined for each parameter separately. The distributions shown are rms deviation of atoms from the least-squares plane through the atoms (0-10% are the most planar interfaces) (a), interface residue propensities (b), protrusion index (c), hydrophobicity [based on the scale of Janin et al. (9)] (d), and ASA (e). A mean ASA for residues in each patch was calculated and used in the rank ordering. [Histograms; x axis: rank ordering bins (10 to 100); panel axes run from planar to nonplanar (a), low to high propensity (b), low to high protrusion (c), less to more hydrophobic (d), and low to high ASA (e).]

S.J. is funded by the Biotechnology and Biological Research Council and Zeneca Pharmaceuticals.

1. Finzel, B. C., Weber, P. C., Hardman, K. D. & Salemme, F. R. (1985) J. Mol. Biol. 186, 627-643.
2. Tsunogae, Y., Tanaka, I., Yamane, T., Kikkawa, J. I., Ashida, T., Ishikawa, C., Watanabe, K., Nakamura, S. & Takahashi, K. (1986) J. Biochem. (Tokyo) 100, 1637-1645.
3. Sheriff, S., Silverton, E. W., Padlan, E. A., Cohen, G. H., Smith-Gill, S. J., Finzel, B. C. & Davies, D. R. (1987) Proc. Natl. Acad. Sci. USA 84, 8075-8092.
4. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J. Mol. Biol. 112, 535-542.

5. Johnson, J. E. (1996) Proc. Natl. Acad. Sci. USA 93, 27-33.
6. Walls, P. H. & Sternberg, M. J. E. (1992) J. Mol. Biol. 228, 277-297.
7. Helmer-Citterich, M. & Tramontano, A. (1994) J. Mol. Biol. 324, 1021-1031.
8. Zielenkiewicz, P. & Rabczenko, A. (1988) Biophys. Chem. 29, 219-224.
9. Janin, J., Miller, S. & Chothia, C. (1988) J. Mol. Biol. 204, 155-164.
10. Miller, S. (1989) Protein Eng. 3, 77-83.
11. Argos, P. (1988) Protein Eng. 2, 101-113.
12. Jones, S. & Thornton, J. M. (1995) Prog. Biophys. Mol. Biol. 63, 31-65.
13. Janin, J. & Chothia, C. (1990) J. Biol. Chem. 265, 16027-16030.
14. Duquerroy, S., Cherfils, J. & Janin, J. (1991) Ciba Found. Symp. 161, 237-252.
15. Korn, A. P. & Burnett, R. M. (1991) Proteins: Struct. Funct. Genet. 9, 37-55.
16. Young, L., Jernigan, R. L. & Covell, D. G. (1994) Protein Sci. 3, 717-729.
17. Laskowski, M. & Kato, I. (1980) Annu. Rev. Biochem. 49, 593-626.
18. Taylor, W. R. & Orengo, C. A. (1989) J. Mol. Biol. 208, 1-22.
19. Wells, J. A. (1996) Proc. Natl. Acad. Sci. USA 93, 1-6.
20. Lee, B. & Richards, F. M. (1971) J. Mol. Biol. 55, 379-400.
21. Hubbard, S. J. (1992) Ph.D. thesis (Univ. of London, London, England).
22. Chothia, C. (1974) Nature (London) 248, 338-339.
23. Laskowski, R. A. (1991) SURFNET computer program (Department of Biochemistry and Molecular Biology, University College, London, England).
24. Thornton, J. M., Edwards, M. S., Taylor, W. R. & Barlow, D. J. (1986) EMBO J. 5, 409-413.
25. McDonald, I. K. & Thornton, J. M. (1994) J. Mol. Biol. 238, 777-793.
26. Hurley, J. H., Thorness, P. E., Ramalingam, V., Helmers, N. H., Koshland, D. E. & Stroud, R. M. (1989) Proc. Natl. Acad. Sci. USA 86, 8635-8639.
27. Weis, W. I., Kahn, R., Fourme, R., Drickamer, K. & Hendrickson, W. A. (1991) Science 254, 1608-1615.
28. Smith, D. L., Almo, S. C., Toney, M. D. & Ringe, D. (1989) Biochemistry 28, 8161-8167.
29. Freymann, D., Down, J., Carrington, M., Roditi, I., Turner, M. & Wiley, D. (1990) J. Mol. Biol. 216, 141-160.
30. Chothia, C. & Janin, J. (1975) Nature (London) 256, 705-708.
31. Morgan, R. S., Miller, S. L. & McAdon, J. (1979) J. Mol. Biol. 127, 31-39.
32. Connolly, M. L. (1986) Biopolymers 25, 1229-1247.
33. Lawrence, M. C. & Colman, P. M. (1993) J. Mol. Biol. 234, 946-950.
34. Vakser, I. A. & Aflalo, C. (1994) Proteins: Struct. Funct. Genet. 20, 320-329.
35. Eigenbrot, C., Randal, M., Quan, C., Burnier, J., O'Connell, L., Rinderknecht, E. & Kossiakoff, A. A. (1991) J. Mol. Biol. 221, 15-21.
36. Huber, R. (1979) Trends Biochem. Sci. 4, 271-276.
37. Hubbard, S. J., Thornton, J. M. & Campbell, S. F. (1992) Faraday Disc. 43, 13-23.
38. Navia, M. A., Fitzgerald, P. M. D., McKeever, B. M., Leu, C.-T., Heimbach, J. C., Herber, W. K., Sigal, I. S., Darke, P. L. & Springer, J. P. (1989) Nature (London) 337, 615-620.
39. Mitsui, Y., Satow, Y., Watanabe, Y., Hirono, S. & Iitaka, Y. (1979) Nature (London) 277, 447-452.
40. Tainer, J. A., Getzoff, E. D., Beem, K. M., Richardson, J. S. & Richardson, D. C. (1982) J. Mol. Biol. 160, 181-217.
41. Wilson, I. A. & Stanfield, R. L. (1993) Curr. Opin. Struct. Biol. 3, 113-118.
42. Wilson, I. A. & Stanfield, R. L. (1994) Curr. Opin. Struct. Biol. 4, 857-867.
43. Gerstein, M., Schulz, G. & Chothia, C. (1993) J. Mol. Biol. 229, 494-501.
44. Gerstein, M., Anderson, B. F., Norris, G. E., Baker, E. N., Lesk, A. M. & Chothia, C. (1993) J. Mol. Biol. 234, 357-372.
45. Davies, D. R. & Cohen, G. H. (1996) Proc. Natl. Acad. Sci. USA 93, 7-12.
46. Stanfield, R. L., Takimoto-Kamimura, M., Rini, J. M., Profy, A. T. & Wilson, I. A. (1993) Structure 1, 83-93.
47. Miller, S., Lesk, A. M., Janin, J. & Chothia, C. (1987) Nature (London) 328, 834-836.
