The Structures of Proteins



  • 8/13/2019 the Structures of Proteins


20 The structures of proteins

Three-letter abbreviations are commonly used for protein amino acids. For 16 of the 20, the abbreviation is simply the first three letters of the name: Alanine, Arginine, Aspartate, Cysteine, Glutamate, Glycine, Histidine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tyrosine and Valine.

The remaining four amino acids are not named in this way, to avoid ambiguities. Their abbreviations are: Asn for asparagine and Gln for glutamine (in each case there is a related amino acid with the same first three letters); Ile for isoleucine (which avoids Iso, a common prefix in chemistry); and Trp for tryptophan (which avoids Try, which might easily be confused with Tyr).
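The naming rule above can be checked mechanically. The sketch below lists the standard three-letter codes and picks out those that are not simply the first three letters of the name; the dictionary `THREE_LETTER` and the helper `follows_prefix_rule` are illustrative names, not part of any library.

```python
# Standard three-letter codes for the 20 protein amino acids.
THREE_LETTER = {
    "alanine": "Ala", "arginine": "Arg", "asparagine": "Asn", "aspartate": "Asp",
    "cysteine": "Cys", "glutamate": "Glu", "glutamine": "Gln", "glycine": "Gly",
    "histidine": "His", "isoleucine": "Ile", "leucine": "Leu", "lysine": "Lys",
    "methionine": "Met", "phenylalanine": "Phe", "proline": "Pro", "serine": "Ser",
    "threonine": "Thr", "tryptophan": "Trp", "tyrosine": "Tyr", "valine": "Val",
}

def follows_prefix_rule(name, code):
    """True if the code is just the first three letters of the name, capitalized."""
    return name[:3].capitalize() == code

# Amino acids whose code breaks the first-three-letters rule.
exceptions = sorted(n for n, c in THREE_LETTER.items() if not follows_prefix_rule(n, c))
print(exceptions)  # ['asparagine', 'glutamine', 'isoleucine', 'tryptophan']
```

The four exceptions recovered are exactly the ones the text explains: Asn and Gln avoid clashes with Asp and Glu, Ile avoids "Iso", and Trp avoids "Try".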

[Margin diagram: planar amide groups flanking the tetrahedral α-carbon.]

Rotation is only possible about two bonds of the main chain, corresponding to relative rotation of adjacent amide units. The two relevant dihedral angles, φ and ψ, are labelled in the diagram. It is useful to make molecular models of a simple dipeptide and examine, for yourself, the effect of changing the two dihedral angles φ and ψ. Viewing from either end of the bond yields the same value for the dihedral angle; check this for yourself. In the diagram here, the dihedral angles are assessed by viewing from the direction nearest the amino end of the chain, i.e. N to Cα for φ and Cα to C for ψ.

The terminology used to describe this primary structure is illustrated for a representative peptide in Fig. 3.1. The ends of the chain are labelled the amino and carboxy, or N and C, termini, which in this case are alanine and glutamine residues, respectively. The order of residues in the chain is then given by numbering the monomers in the N to C direction.

3.3 Secondary structure of proteins

3.3.1 Conformational preferences for amino acid residues within a protein chain

An appreciation of the three-dimensional structures of proteins is best achieved by first considering the stereochemical preferences of a short stretch of a polypeptide chain (secondary structure); structures of complete proteins (tertiary and quaternary structure) are examined in Sections 3.4-3.6.

As discussed in Chapter 2, the α-carbon of each chiral amino acid has the L-configuration and, in a protein, is flanked by two amide linkages, both fixed in a planar (generally trans) arrangement. This arrangement represents the main chain of a protein. Rotation is only possible around two bonds per residue, the N-Cα and Cα-C bonds; these are described by the two dihedral angles φ and ψ, respectively (Fig. 3.2). A dihedral angle of 0° corresponds to the situation where the backbone substituents are eclipsed (lie over one another). Positive dihedral angles (up to +180°) correspond to rotating the bond such that the rear substituents move clockwise. Conversely, rotating the rear substituents counterclockwise gives a negative dihedral angle.
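The sign convention just described can be checked numerically. The sketch below computes a dihedral angle from the Cartesian coordinates of four consecutive atoms using a standard vector formula; the function name `dihedral` and the coordinates are illustrative assumptions, not taken from the text. It returns 0° for eclipsed substituents and a positive value when, viewed along the central bond, the rear substituent lies clockwise of the front one.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to +180) about the p1-p2 bond.

    0 means p0 and p3 are eclipsed; the value is positive when, looking
    along p1 -> p2, the rear atom p3 lies clockwise of the front atom p0.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal to the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical geometry: central bond along +z, front substituent along +x.
front, a, b = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
rear = (0.5, -math.sqrt(3) / 2, 1.0)  # rear substituent rotated 60 degrees anticlockwise
print(round(dihedral(front, a, b, rear), 1))  # -60.0
print(round(dihedral(rear, b, a, front), 1))  # -60.0 (same value viewed from the other end)
```

The second call passes the atoms in reverse order and gives the same angle, which is the point made in the margin note about viewing the bond from either end.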

The steric effects associated with rotation about these bonds are illustrated in Figs 3.3 and 3.4 for a typical L-amino acid. The conformation of the N-Cα bond in which the carbonyl groups lie over one another (φ = 0°) involves an unfavourable interaction between these groups. Rotating the rear substituents of the N-Cα bond anticlockwise relieves this steric clash. An analogous clockwise rotation also relieves this interaction, but only at the expense of introducing a new unfavourable interaction between the side chain and the carbonyl of the preceding residue. Therefore, L-amino acids are expected to prefer conformations in which φ lies between approximately -60° and -180°.

[Figure: two panels showing rotation about the N-Cα and Cα-C bonds of the main chain. The relative orientation of the bonds of the main chain about the N-Cα bond is the dihedral angle φ; the relative orientation of the bonds of the main chain about the Cα-C bond is the dihedral angle ψ.]

Fig. 3.2 The dihedral angles φ and ψ.


Foundations of chemical biology 21

[Figure: views along the N-Cα bond. At φ = 0° there is an unfavourable steric interaction between the C=O substituents. Anticlockwise rotation of the rear substituents by 120° (φ = -120°) relieves it; clockwise rotation by 120° (φ = +120°) gives an unfavourable steric interaction unless R = H.]

Fig. 3.3 The effect of varying the dihedral angle φ.

[Figure: views along the Cα-C bond. At ψ = 0° there is an unfavourable steric interaction between the N-H substituents. The rear substituent is rotated anticlockwise by 60° (ψ = -60°), or clockwise by 60° and then by a further 60°; some of these conformations give an unfavourable steric interaction unless R = H.]

Fig. 3.4 The effect of varying the dihedral angle ψ.

Note that glycine, which lacks a chiral centre, is less constrained than the other amino acids found in proteins. By contrast, proline is more constrained; its ring structure forces it to adopt a conformation with a φ value of ca. -60°.

[Margin diagram: proline, φ ≈ -60°; the ring restricts the conformation.]

The dihedral angle preferences are conveniently represented by a plot of ψ values against φ values, a 'Ramachandran plot', named after the scientist who pioneered this type of analysis.

[Figure: regions of dihedral angles corresponding to unfavourable steric interactions are marked.]

Fig. 3.5 Stylized Ramachandran plot for an L-amino acid.

A similar analysis of the effects of rotation about the Cα-C bond indicates that L-amino acids will prefer conformations in which ψ is either in the region of -60° or in the region of +120° to +180°.

In Fig. 3.5, α and β denote regions with favourable dihedral angles that are found in α-helices and β-sheets, respectively.
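The preferred ranges quoted above can be turned into a rough classifier for (φ, ψ) pairs. This is only a sketch of the stylized plot: the boundaries (φ from -180° to -60°, ψ within an assumed ±30° of -60° for the α region, ψ from +120° to +180° for the β region) are simplifications, the function name `classify_region` is illustrative, and glycine and proline, whose preferences differ, are not treated separately.

```python
def classify_region(phi, psi, psi_tolerance=30.0):
    """Assign a (phi, psi) pair in degrees to a stylized Ramachandran region.

    Assumed boundaries: phi from -180 to -60; psi within psi_tolerance of
    -60 (alpha region) or from +120 to +180 (beta region). Angles are
    expected in the range -180 to +180.
    """
    if not -180.0 <= phi <= -60.0:
        return "disfavoured"
    if abs(psi - (-60.0)) <= psi_tolerance:
        return "alpha"
    if 120.0 <= psi <= 180.0:
        return "beta"
    return "disfavoured"

examples = {
    "alpha-helix": (-60.0, -50.0),
    "beta-strand": (-120.0, 120.0),
    "positive phi": (60.0, 60.0),
}
for label, (phi, psi) in examples.items():
    print(label, classify_region(phi, psi))
```

The α-helix and β-strand angles given later in the section fall in the α and β regions respectively, while a positive φ is rejected, consistent with the steric argument for L-amino acids. Note that ψ = -180° and ψ = +180° describe the same conformation, so a fuller version would normalize angles before testing.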

3.3.2 Regular structures

Given that individual amino acid residues in a protein chain have relatively well-defined conformational preferences, it is reasonable to ask what happens when successive residues adopt the same conformation. Repetition of a specific pair of dihedral angles always gives rise to some form of helix.


The predicted dihedral angle preferences accord well with the observed values of these angles determined from the structures of proteins, as shown in Fig. 3.6.

[Figure: φ and ψ axes each run from -180° to +180°.]

Fig. 3.6 Ramachandran diagram in which the experimental φ, ψ angles for a range of residues other than glycine are shown for a representative set of proteins.

The presence of an extra alkyl group, rather than a hydrogen atom, on the amide nitrogen of proline not only precludes formation of a stabilizing hydrogen bond with the usual carbonyl group but also introduces a steric repulsion with that group. Proline often acts to terminate α-helices and is sometimes called a 'helix breaker'.

[Margin diagram: a typical amino acid residue donates an N-H hydrogen bond to a carbonyl group; proline has no hydrogen on nitrogen, so proline cannot act as an H-bond donor.]

The right-handedness of the α-helix is a consequence of the chirality of the α-carbon. It is remarkable that, in an α-helix, not only are the φ, ψ angles such as to minimize steric strain, but all hydrogen bonds are at an optimum length and angle, and all main chain atoms are close packed to maximize van der Waals forces of attraction.

Of the various possible helices, two fit well with the simple steric preferences outlined in Section 3.3.1. These give rise to the most important regular structures, the α-helix and the β-sheet, which are common components of the structures of proteins (the β-sheet consists of individual β-strands stacked up side by side; each strand is actually a type of helix, see below).

The α-helix (φ = -60°; ψ = -50°; Fig. 3.7) is a rather compact helix in which, as the chain turns back on itself, linear hydrogen bonds are formed between a carbonyl oxygen and the hydrogen atom of an amide group four residues further along the chain. The overall structure of this unit is a right-handed helix with 3.6 amino acid residues per turn. For every turn of the helix, the chain extends by 0.54 nm (this is termed the pitch of the helix). The side chains of the amino acid residues in an α-helix point out from the helix and interact either with solvent or with neighbouring portions of the protein. All amino acid residues except proline can be readily accommodated within an α-helix (although some have a greater tendency than others to adopt this structure).
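The two helix parameters quoted above fix two derived quantities: the axial rise per residue (0.54 nm / 3.6 ≈ 0.15 nm) and the rotation per residue (360° / 3.6 = 100°). The sketch below computes these for an ideal helix and lists the hydrogen-bonded pairs, with the C=O of residue i bonded to the N-H of residue i + 4; the constant and function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6  # alpha-helix, from the text
PITCH_NM = 0.54          # axial rise per complete turn, from the text

rise_per_residue_nm = PITCH_NM / RESIDUES_PER_TURN  # about 0.15 nm
twist_per_residue_deg = 360.0 / RESIDUES_PER_TURN   # about 100 degrees

def helix_length_nm(n_residues):
    """Approximate axial length of an ideal alpha-helix of n residues."""
    return n_residues * rise_per_residue_nm

def hbond_pairs(n_residues):
    """(C=O residue, N-H residue) pairs, numbering residues from 1 at the N-terminus."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]

print(round(rise_per_residue_nm, 3), round(twist_per_residue_deg, 1))  # 0.15 100.0
print(round(helix_length_nm(18), 2))  # 2.7 (18 residues = 5 turns)
print(hbond_pairs(8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

The pair list reproduces the pattern in Fig. 3.7: residue I bonds to residue V, residue II to residue VI, and so on. The 100° rotation per residue is also what places every third or fourth residue on roughly the same face of the helix.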

The second major class of regular repeated secondary structure found in proteins is the β-sheet, composed of β-strands (φ = -120°; ψ = +120°; Fig. 3.8). A β-strand is an almost completely extended helix; the main chain amide groups, therefore, cannot hydrogen bond with neighbouring residues. They are, however, ideally placed to interact with a neighbouring chain of residues having a similar secondary structure. Two side-by-side

In an α-helix, hydrogen bonds are formed between the main chain C=O of a given residue and the N-H of the amino acid four residues along the chain. Thus residue I is hydrogen bonded to residue V, residue II to residue VI, etc.

[Figure: the α-helix structure found in proteins, with the pitch of the α-helix (0.54 nm) marked.]

Fig. 3.7 α-Helix.


The heptad repeat sequence of α-keratin (Fig. 3.10) adopts an α-helical structure partly because the residues involved have a high propensity for the α-region of conformational space, and partly because of the favourable overall structure that results from packing the helices together.

[Figure: residues numbered from about 275 to 289 of the sequence, laid out in rows of seven under the heptad positions a, b, c, d, e, f and g. Large non-polar residues are boxed; the numbers refer to the order of the residues in the linear sequence.]

Fig. 3.10 A portion of the sequence of mouse α-keratin illustrating the heptad repeat.

It is a useful exercise to draw out the precise structure of each side chain to confirm the polar and non-polar character of the residues in the sequence shown in Fig. 3.10.

In the α-keratin structure (Fig. 3.11) the helices are oriented at an angle of about 20° to one another. This type of coiled-coil structure was predicted by Francis Crick in 1953, before the detailed structure of any protein was known.

For proteins that reside in an aqueous environment, the burial of non-polar regions is a major driving force favouring the adoption of well-defined overall structures (see Section 1.5). This burial can be achieved by interaction with another non-polar region of the same protein chain to produce an ordered tertiary structure. A similar effect can be achieved by the association of separate protein chains to form a cohesive multimeric structure, known as a quaternary structure. Specific examples of both types of structure are described below.

The α-keratins, which are key structural proteins in animals, provide the basis for such materials as hair, wool and fingernails. Early X-ray diffraction experiments on proteins included measurements on α-keratins and revealed that these proteins incorporate some regular structural features. After a series of model-building experiments, Linus Pauling recognized these as being due to the presence of α-helices.

When the order of amino acids in α-keratin was determined, regularities were found in the sequence of the main body of the protein. In almost all of this region, chemically similar but not identical residues occur in a repeating cycle of seven residues. This sequence, which can be abbreviated as (a-b-c-d-e-f-g)n, predisposes the protein to adopt an α-helical structure; large non-polar amino acids appear at positions a and d, and the remaining sites are mostly occupied by polar residues (Fig. 3.10).

Because there are approximately 3.6 residues in each turn of an α-helix, the hydrophobic amino acids are oriented as a ribbon along the edge of the helix (Fig. 3.11). Whilst the remainder of the helix is well solvated by water, this ribbon is not, and there is a thermodynamic driving force to bury this region away from solvent. The extended rod-like structure of α-keratin does not allow this burial to take place within a single protein chain; instead, two chains associate and form a coiled coil. The resulting rope-like structure equips this protein for its role as a constituent of biological fibres.
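Why positions a and d end up on one face can be seen with a simple "helical wheel" calculation. Assuming an ideal, straight α-helix with 100° of rotation per residue (360° / 3.6), and assigning the first residue to heptad position a (an arbitrary choice for illustration), the sketch below prints the angular position of every a and d residue over three heptads; the names `wheel_angle` and `separation` are illustrative.

```python
TWIST_DEG = 360.0 / 3.6  # about 100 degrees of rotation per residue in an alpha-helix
HEPTAD = "abcdefg"

def wheel_angle(i):
    """Angular position (degrees, 0-360) of residue i (0-based) around the helix axis."""
    return (i * TWIST_DEG) % 360.0

def separation(x, y):
    """Smallest angle between two directions around the axis, in degrees."""
    d = abs(x - y) % 360.0
    return min(d, 360.0 - d)

# Residue 0 is assumed to sit at heptad position a; follow three heptads.
core = [(i, HEPTAD[i % 7], round(wheel_angle(i)))
        for i in range(21) if HEPTAD[i % 7] in "ad"]
for i, pos, angle in core:
    print(i, pos, angle)
```

All six a and d residues fall within a 100° arc (260° to 360°), forming the non-polar ribbon described above. Because seven residues span 700°, which is 20° short of two full turns, the ribbon drifts by about 20° per heptad and so winds slowly around an ideal straight helix.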

This discussion has identified several forces that affect the overall structure of proteins: steric interactions; hydrogen bonding in secondary structural units; and the burial of non-polar residues to avoid unfavourable solvation effects. In addition, the interactions between side chains in tertiary and quaternary structures can be consolidated by covalent cross-linking. As we have seen before, the side chain of cysteine includes a thiol group. If two cysteine residues are sufficiently close in space, a covalent disulphide link can be formed by oxidation. This stabilizes the structure in the form in which the cross-link is made (see Fig. 2.7). Although the two cysteine residues must be close in space for disulphide bond formation to occur, they need not be near one another in the protein sequence.
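The last point, that disulphide partners are close in space but not necessarily in sequence, can be illustrated by scanning atomic coordinates for existing links. The sketch below assumes a mapping from cysteine residue number to the coordinates of its sulphur atom (SG) in ångströms, and treats two sulphurs within an assumed 2.2 Å cutoff as bonded (a typical S-S bond length is about 2.05 Å). The function name and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def find_disulphides(sg_coords, cutoff=2.2):
    """Return (i, j, j - i) for cysteine pairs whose SG atoms lie within cutoff.

    sg_coords maps residue number -> (x, y, z) of the cysteine sulphur (SG),
    in angstroms. The 2.2 A cutoff is an assumed tolerance around a typical
    S-S bond length of about 2.05 A.
    """
    pairs = []
    # Sorting by residue number guarantees i < j, so j - i is the sequence separation.
    for (i, a), (j, b) in combinations(sorted(sg_coords.items()), 2):
        if math.dist(a, b) <= cutoff:
            pairs.append((i, j, j - i))
    return pairs

# Hypothetical coordinates: Cys-3 and Cys-40 are far apart in sequence but close in space.
example = {3: (0.0, 0.0, 0.0), 17: (9.0, 4.0, 1.0), 40: (2.05, 0.0, 0.0)}
print(find_disulphides(example))  # [(3, 40, 37)]
```

The reported sequence separation of 37 residues for a bonded pair makes the point concrete. `math.dist` requires Python 3.8 or later.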

As an example, α-keratin contains some cysteine residues. In hair, these are generally in the form of disulphide links with neighbouring protein chains. These hold adjacent fibres in fixed orientations with respect to one another. Reduction of these links, followed by reorientation of the fibres and reoxidation, changes the overall shape of the hair. This is exploited commercially by hairdressers; it is called a 'permanent wave' or 'perm'.


[Figure: side-on view of an α-helix of α-keratin, and end-on view looking from the carboxyl end of the helix. In water, pairs of helices associate to bury the non-polar residues; their hydrophobic side chains interdigitate, and the helices intertwine to form a rope-like structure. Shaded circles represent non-polar amino acid side chains; open circles represent other side chains; a, b, c, d, e, f and g refer to the order of residues in the heptad repeat as shown in Fig. 3.10.]

Fig. 3.11 The structure and packing of α-helices of α-keratin.

3.5 Other structural proteins

There is a large family of proteins that adopt the coiled-coil type of structure found in α-keratin. For example, tropomyosin is an important fibrous protein in muscle. This protein, which contains nearly 300 amino acid residues, also displays a characteristic heptad repeat. Although its structure is not known in detail, this protein is closely related (homologous) to α-keratin. When proteins have evolved from a common ancestor, they share similar structural features (residues with similar or identical side chains appear in the same order in the related proteins), and they are said to be homologous. This will be illustrated in Chapter 4. The identification of homologies to proteins of known structure is an important aid in predicting the three-dimensional structure of a protein from its amino acid sequence.

As was noted in Section 1.4, entropy effects are important in determining the interaction of non-polar residues with water. Burial of non-polar residues away from water reduces the unfavourable ordering of the solvent (an overall decrease in entropy) that would occur if the non-polar residues were exposed. When proteins adopt compact folded structures in water, the increased ordering (unfavourable entropy) of the protein chain is more than offset by the increased disorder of water when the non-polar side chains are buried.

[Diagram: folded, ordered protein structure (low entropy for protein chain); unfolded, disordered state (high entropy for protein chain).]

Disulphide bonds do not significantly affect the degree of order of the solvent or of the folded structure, but they restrict the freedom of the unfolded protein chain. Thus, entropically, by comparison with the case without disulphides, the folded structure is stabilized relative to the unfolded state.

[Diagram: an S-S cross-link; disulphide bonds reduce the degree of disorder in the unfolded state.]
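The entropy argument in these margin notes can be summarized as a short piece of bookkeeping. The symbols below are introduced here for illustration: the folding entropy is split into a chain term and a water term, and a disulphide is modelled as lowering the entropy of the unfolded chain by some amount δ while leaving the folded state, the solvent and (by assumption) the enthalpy term unchanged.

```latex
\begin{aligned}
\Delta G_{\mathrm{fold}} &= \Delta H_{\mathrm{fold}} - T\,\Delta S_{\mathrm{fold}},
  \qquad \Delta S_{\mathrm{fold}} = \Delta S_{\mathrm{chain}} + \Delta S_{\mathrm{water}},\\
\Delta S_{\mathrm{chain}} &= S_{\mathrm{chain}}^{\mathrm{folded}} - S_{\mathrm{chain}}^{\mathrm{unfolded}} < 0,
  \qquad \Delta S_{\mathrm{water}} > 0,\\
\text{entropy favours folding when } \Delta S_{\mathrm{fold}} > 0
  &\iff \Delta S_{\mathrm{water}} > \lvert \Delta S_{\mathrm{chain}} \rvert,\\
\text{disulphide: } S_{\mathrm{chain}}^{\mathrm{unfolded}} \to S_{\mathrm{chain}}^{\mathrm{unfolded}} - \delta \ (\delta > 0)
  &\;\Rightarrow\; \Delta S_{\mathrm{fold}} \to \Delta S_{\mathrm{fold}} + \delta,
  \quad \Delta G_{\mathrm{fold}} \to \Delta G_{\mathrm{fold}} - T\delta .
\end{aligned}
```

The last line restates the margin note: by lowering the entropy of the unfolded state only, the cross-link makes the folding free energy more negative by Tδ, stabilizing the folded structure relative to the unfolded state.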


Although it is possible to rationalize why a protein adopts a particular structure, predicting the three-dimensional structure of a protein from its amino acid sequence is extremely difficult. This is because of the complexity of proteins and of the folding process by which a particular structure emerges. It is, however, a very important area of research because of the vastly increased knowledge of protein sequences generated by the human genome project and related research. We do not know the structures adopted by the vast majority of these sequences. 'Structural genomics' is a term coined to describe research aimed at generating structural information about proteins which have been identified by genome sequencing.

The crystalline β-sheet region of the cocoon silk of the silkworm Bombyx mori is responsible for its mechanical strength. This region comprises repeats of the sequence (Gly-Ala-Gly-Ala-Gly-Ser)n, which forms a rigid stacked structure as shown in Fig. 3.12.

Other regions of this silk protein (fibroin) are glycine rich and adopt an amorphous structure that provides the flexibility of this material.
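The diad pattern of the fibroin repeat can be made concrete with a short sketch. In a β-strand, consecutive side chains point alternately above and below the plane of the sheet, so splitting the (Gly-Ala-Gly-Ala-Gly-Ser)n sequence by alternate positions shows which residues line each face. The function name `sheet_faces` and the choice of three repeats are illustrative, and an ideal, unbroken strand is assumed.

```python
def sheet_faces(residues):
    """Split a beta-strand into the side chains on each face of the sheet.

    Assumes an ideal strand in which consecutive side chains point to
    opposite faces, so even and odd positions end up on different faces.
    """
    return residues[0::2], residues[1::2]

repeat = ["Gly", "Ala", "Gly", "Ala", "Gly", "Ser"]
strand = repeat * 3  # (Gly-Ala-Gly-Ala-Gly-Ser)3, an illustrative length
face_a, face_b = sheet_faces(strand)
print(sorted(set(face_a)))  # ['Gly']
print(sorted(set(face_b)))  # ['Ala', 'Ser']
```

One face carries only glycine (side chain H) and the other only alanine and serine (CH3 or CH2OH), which is what allows the sheets to stack with their substituents interdigitating.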

Collagen is a rigid triple helix, rich in glycine and proline. The three individual chains intertwine in a coiled-coil structure, in a somewhat analogous fashion to that described for the double helix of α-keratin in Section 3.4.

[Figure: view from the edge of a β-sheet; β-sheets stack and their substituents interdigitate. a = H; b = CH3 or CH2OH.]

Fig. 3.12 The structure of fibroin.

Stacked antiparallel β-sheets are key structural features of fibroin proteins, found in silk fibres, and of β-keratins, components of bird feathers. In a β-sheet structure, the side chains of amino acids point alternately above and below the plane of the sheet. It is not surprising, therefore, that these proteins, which form stable repeating β-sheet structures, show a repeated diad pattern (a-b)n (Fig. 3.12), corresponding to the alternation of two types of amino acid residue over much of their length.

As discussed above, proline cannot be accommodated within a regular α-helix, but proline-rich proteins can form an alternative (extended) helix. Interwound helices of this type are found in collagen, a high-tensile-strength material which is the principal constituent of connective tissue in animals, including tendons, cartilage and blood vessels.

In conclusion, three basic structures are found in fibrous proteins: the α-helix, the β-sheet and the collagen triple helix. Diverse materials are made by adapting and varying these basic forms. Flexible and elastic materials are often made from α-helical structures with hydrogen bonding within, rather than between, chains. Materials based on β-sheets can be strong and flexible; these structures are exploited particularly by insects as fibres resistant to stretching. Collagen, a triple-helical structure, is used by animals to make strong, rigid materials capable of efficiently transmitting mechanical force.

3.6 Globular proteins

Most proteins adopt 'globular' structures in which α-helix and/or β-strand units are linked by turns, allowing the protein chain to fold into a compact overall structure. In these structures, large non-polar residues are buried in the core of the protein away from solvent water, whereas polar residues are predominantly on the surface of the protein. This type of arrangement is illustrated in Fig. 3.13 for triose phosphate isomerase (TIM), an enzyme that is discussed in detail in Chapter 5.

[Figure: end-on view of the third α-helix (α3), whose polar face points to solution and whose non-polar face points to the protein core; a representative section of the β-sheet, strands β4, β5 and β6 (the fourth, fifth and sixth strands from the N-terminus), noting the prevalence of non-polar residues (labelled in boldface); Glu-165 lies at the end of a β-strand. Proline residues are found at turns in the structure, e.g. Pro-57.]

Fig. 3.13 The tertiary structure of triose phosphate isomerase.

The three-dimensional structure of TIM has been determined in detail by X-ray crystallography. The enzyme is a symmetrical dimer. The tertiary structure of each TIM monomer (see Fig. 3.13) comprises alternate β-strands and α-helices linked by turns, and the overall shape is like a barrel. The eight β-strands at the centre are arranged in a parallel β-sheet, resembling a cylinder. Since this forms the core of the protein, all the amino acid residues in the middle of the β-sheet are hydrophobic. Surrounding the β-sheet is a series of amphiphilic helices, which have a non-polar face pointing towards the protein core and a polar face pointing into solution. The overall structure has a close-packed non-polar core and a predominantly polar surface that interacts with solution. Residual non-polar regions on the surface of individual TIM molecules are buried when they associate to form a dimer, the native quaternary structure of the enzyme. As will be discussed in Chapter 5, the interaction of a particular part of the surface of each protein chain, the 'active site', with substrate molecules is responsible for the catalytic activity of this enzyme.

The globin family of proteins provides another example of tertiary and quaternary globular protein structures. The oxygen storage protein myoglobin consists of a series of α-helices folded into a compact tertiary structure. In the oxygen transport protein haemoglobin, four protein chains, similar to myoglobin, associate to form a well-defined quaternary structure. This is discussed in detail in Section 4.4.

In Fig. 3.13, amino acid residues are numbered according to their order in the linear sequence of TIM. In addition, the polarity of the individual residues is indicated by the way they are labelled: large, non-polar, hydrophobic residues are labelled in boldface and charged, highly polar residues are labelled in italics.

The residues in the middle of the β-strands are non-polar. There are occasional polar residues in the β-sheet, but these are at the ends of strands and point into solution rather than into the core of the protein. As an example, Glu-165, bearing a carboxylate side chain, is at the end of a β-strand. As discussed in Chapter 5, this residue is at the active site of the enzyme; it is on the surface of the protein and interacts with the enzyme substrate. Another key residue at the active site is His-95, two residues beyond the end of one of the other β-strands shown. This illustrates how the folded nature of globular proteins brings residues that are far apart in the primary sequence close together in space in the final structure.

'Amphiphilic' is derived from Greek, meaning 'lover of both'. An amphiphilic helix is one that likes both polar and non-polar environments, by virtue of having polar and non-polar faces.

The close-packed nature of globular proteins maximizes van der Waals interactions that help to stabilize the structure and provide the rigidity essential for their function.


In order to achieve its functional state after synthesis on the ribosome (see Section 9.11), a protein must fold to the specific structure unique to its particular sequence. The manner in which this occurs is the subject of intense investigation by experimentalists and theoreticians. It is a key link between sequence and structure, and hence to our ability to predict structures, understand protein functions, and design new proteins for specific tasks.

In conditions such as Alzheimer's disease and 'mad cow disease', proteins that are normally globular and soluble misfold and adopt a largely β-sheet structure (related to the fibroin structure described in Section 3.5). In this form the proteins aggregate to form an extended β-sheet structure. The resulting insoluble material deposited in the diseased tissue is thought to be at least partly responsible for the characteristic degeneration of the brain associated with these diseases. It is believed that incorrectly folded proteins can promote the misfolding of other molecules, and hence that such proteins, 'prions', can make some diseases of this type transmissible or 'infectious'. Other diseases associated with protein misfolding include cystic fibrosis and one type of diabetes.

3.7 Summary

Steric interactions limit the conformations open to most amino acid residues in a protein chain. Two regular structural units, the α-helix and the β-sheet, which conform with these requirements and exploit favourable hydrogen bonding arrangements, are common in proteins. These secondary structural units, when extended along a sequence, can form fibres. In most proteins, they are connected by loops which allow the protein chain to fold into a compact 'globular' structure. Individual protein chains often fold into regular structures, and sometimes aggregate, in order to bury non-polar regions of the protein that would otherwise be exposed to solvent. The resulting well-defined structures can be consolidated by covalent cross-linking, e.g. by disulphide formation. We now know a great deal about the structures of some proteins, and this knowledge provides a solid basis for understanding their properties; however, whilst it is possible to rationalize why a particular structure is preferred, it is difficult to predict the three-dimensional structure of a protein from its amino acid sequence. Generating three-dimensional structural information for proteins whose sequences are known continues to be a major research challenge, particularly in attempts to define structures and functions for the large number of proteins identified in the sequencing of the human genome. It forms the core of a contemporary field of research known as structural genomics.
