30
PROTEINS STRUCTURE AND FUNCTION David Whitford John Wiley & Sons, Ltd

PROTEINS · Secondary structure 39 Tertiary structure 50 Quaternary structure 62 The globin family and the role of quaternary structure in modulating activity 66 Immunoglobulins 74

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

  • PROTEINSSTRUCTURE AND FUNCTION

    David Whitford

    John Wiley & Sons, Ltd

    Innodata0470012412.jpg

  • PROTEINSSTRUCTURE AND FUNCTION

  • PROTEINSSTRUCTURE AND FUNCTION

    David Whitford

    John Wiley & Sons, Ltd

  • Copyright 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,West Sussex PO19 8SQ, England

    Telephone (+44) 1243 779777

    Email (for orders and customer service enquiries): [email protected] our Home Page on www.wiley.com

    All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or byany means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright,Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham CourtRoad, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressedto the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ,England, or emailed to [email protected], or faxed to (+44) 1243 770620.

    This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold onthe understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expertassistance is required, the services of a competent professional should be sought.

    Other Wiley Editorial Offices

    John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

    Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

    Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

    John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

    John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

    John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

    Wiley also publishes its books in a variety of electronic formats. Some content that appearsin print may not be available in electronic books.

    British Library Cataloguing in Publication Data

    A catalogue record for this book is available from the British Library

    ISBN 0-471-49893-9 HBISBN 0-471-49894-7 PB

    Typeset in 10/12pt Times by Laserwords Private Limited, Chennai, IndiaPrinted and bound by Graphos SpA, Barcelona, SpainThis book is printed on acid-free paper responsibly manufactured from sustainable forestryin which at least two trees are planted for each one used for paper production.

    http://www.wiley.com

  • For my parents,

    Elizabeth and Percy Whitford,

    to whom I owe everything

  • Contents

    Preface xi

    1 An Introduction to protein structure and function 1A brief and very selective historical perspective 1The biological diversity of proteins 5Proteins and the sequencing of the human and other genomes 9Why study proteins? 9

    2 Amino acids: the building blocks of proteins 13The 20 amino acids found in proteins 13The acid–base properties of amino acids 14Stereochemical representations of amino acids 15Peptide bonds 16The chemical and physical properties of amino acids 23Detection, identification and quantification of amino acids and proteins 32Stereoisomerism 34Non-standard amino acids 35Summary 36Problems 37

    3 The three-dimensional structure of proteins 39Primary structure or sequence 39Secondary structure 39Tertiary structure 50Quaternary structure 62The globin family and the role of quaternary structure in modulating activity 66Immunoglobulins 74Cyclic proteins 81Summary 81Problems 83

    4 The structure and function of fibrous proteins 85The amino acid composition and organization of fibrous proteins 85Keratins 86Fibroin 92Collagen 92Summary 102Problems 103

  • viii CONTENTS

    5 The structure and function of membrane proteins 105The molecular organization of membranes 105Membrane protein topology and function seen through organization of the

    erythrocyte membrane 110Bacteriorhodopsin and the discovery of seven transmembrane helices 114The structure of the bacterial reaction centre 123Oxygenic photosynthesis 126Photosystem I 126Membrane proteins based on transmembrane β barrels 128Respiratory complexes 132Complex III, the ubiquinol-cytochrome c oxidoreductase 132Complex IV or cytochrome oxidase 138The structure of ATP synthetase 144ATPase family 152Summary 156Problems 159

    6 The diversity of proteins 161Prebiotic synthesis and the origins of proteins 161Evolutionary divergence of organisms and its relationship to protein

    structure and function 163Protein sequence analysis 165Protein databases 180Gene fusion and duplication 181Secondary structure prediction 181Genomics and proteomics 183Summary 187Problems 187

    7 Enzyme kinetics, structure, function, and catalysis 189Enzyme nomenclature 191Enzyme co-factors 192Chemical kinetics 192The transition state and the action of enzymes 195The kinetics of enzyme action 197Catalytic mechanisms 202Enzyme structure 209Lysozyme 209The serine proteases 212Triose phosphate isomerase 215Tyrosyl tRNA synthetase 218EcoRI restriction endonuclease 221Enzyme inhibition and regulation 224Irreversible inhibition of enzyme activity 227Allosteric regulation 231Covalent modification 237Isoenzymes or isozymes 241Summary 242Problems 244

  • CONTENTS ix

    8 Protein synthesis, processing and turnover 247Cell cycle 247The structure of Cdk and its role in the cell cycle 250Cdk–cyclin complex regulation 252DNA replication 253Transcription 254Eukaryotic transcription factors: variation on a ‘basic’ theme 261The spliceosome and its role in transcription 265Translation 266Transfer RNA (tRNA) 267The composition of prokaryotic and eukaryotic ribosomes 269A structural basis for protein synthesis 272An outline of protein synthesis 273Antibiotics provide insight into protein synthesis 278Affinity labelling and RNA ‘footprinting’ 279Structural studies of the ribosome 279Post-translational modification of proteins 287Protein sorting or targeting 293The nuclear pore assembly 302Protein turnover 303Apoptosis 310Summary 310Problems 312

    9 Protein expression, purification and characterization 313The isolation and characterization of proteins 313Recombinant DNA technology and protein expression 313Purification of proteins 318Centrifugation 320Solubility and ‘salting out’ and ‘salting in’ 323Chromatography 326Dialysis and ultrafiltration 333Polyacrylamide gel electrophoresis 333Mass spectrometry 340How to purify a protein? 342Summary 344Problems 345

    10 Physical methods of determining the three-dimensional structure ofproteins 347Introduction 347The use of electromagnetic radiation 348X-ray crystallography 349Nuclear magnetic resonance spectroscopy 360Cryoelectron microscopy 375Neutron diffraction 379Optical spectroscopic techniques 379Vibrational spectroscopy 387Raman spectroscopy 389

  • x CONTENTS

    ESR and ENDOR 390Summary 392Problems 393

    11 Protein folding in vivo and in vitro 395Introduction 395Factors determining the protein fold 395Factors governing protein stability 403Folding problem and Levinthal’s paradox 403Models of protein folding 408Amide exchange and measurement of protein folding 411Kinetic barriers to refolding 412In vivo protein folding 415Membrane protein folding 422Protein misfolding and the disease state 426Summary 435Problems 437

    12 Protein structure and a molecular approach to medicine 439Introduction 439Sickle cell anaemia 441Viruses and their impact on health as seen through structure and function 442HIV and AIDS 443The influenza virus 457p53 and its role in cancer 470Emphysema and α1-antitrypsin 475Summary 478Problems 479

    Epilogue 481

    Glossary 483

    Appendices 491

    Bibliography 495

    References 499

    Index 511

  • Preface

    When I first started studying proteins as an undergrad-uate I encountered for the first time complex areas ofbiochemistry arising from the pioneering work of Paul-ing, Sumner, Kendrew, Perutz, Anfinsen, together withother scientific ‘giants’ too numerous to describe atlength in this text. The area seemed complete. Howwrong I was and how wrong an undergraduate’s per-ception can be! The last 30 years have seen an explo-sion in the area of protein biochemistry so that my 1975edition of Biochemistry by Albert Lehninger remains,perhaps, of historical interest only. The greatest changehas occurred through the development of molecularbiology where fragments of DNA are manipulated inways previously unimagined. This has enabled DNAto be sequenced, cloned, manipulated and expressedin many different cells. As a result areas of recom-binant DNA technology and protein engineering haveevolved rapidly to become specialist disciplines in theirown right. Almost any protein whose primary sequenceis known can be produced in large quantity via theexpression of cloned or synthetic genes in recombinanthost cells. Not only is the method allowing scien-tists to study some proteins for the first time but theincreased amount of protein derived from recombinantDNA technology is also allowing the application ofnew and continually advancing structural techniques.In this area X-ray crystallography has remained at theforefront for over 40 years as a method of determin-ing protein structure but it is now joined by nuclearmagnetic resonance (NMR) spectroscopy and morerecently by cryoelectron microscopy whilst other meth-ods such as circular dichroism, infrared and Ramanspectroscopy, electron spin resonance spectroscopy,mass spectrometry and fluorescence provide more lim-ited, yet often vital and complementary, structural data.In many instances these methods have become estab-lished techniques only in the last 20 years and are

    consequently absent in many of those familiar text-books occupying the shelves of university libraries.

    An even greater impact on biochemistry hasoccurred with the rapid development of cost-effective,powerful, desktop computers with performance equiv-alent to the previous generation of supercomput-ers. Many experimental techniques relied on the co-development of computer hardware but software hasalso played a vital role in protein biochemistry. Wecan now search databases comparing proteins at thelevel of DNA or amino acid sequences, building uppatterns of homology and relationships that provideinsight into origin and possible function. In additionwe use computers routinely to calculate properties suchas isoelectric point, number of hydrophobic residues orsecondary structure – something that would have beenextraordinarily tedious, time consuming and problem-atic 20 years ago. Computers have revolutionized allaspects of protein biochemistry and there is little doubtthat their influence will continue to increase in theforthcoming decades. The new area of bioinformaticsreflects these advances in computing.

    In my attempt to construct an introductory yet exten-sive text on proteins I have, of necessity, been circum-spect in my description of the subject area. I have oftenrelied on qualitative rather than quantitative descrip-tions and I have attempted to minimise the introduc-tion of unwieldy equations or formulae. This doesnot reflect my own interests in physical biochemistrybecause my research, I hope, was often quantitative.In some cases particularly the chapters on enzymesand physical methods the introduction of equations isunavoidable but also necessary to an initial descrip-tion of the content of these chapters. I would be failingin my duty as an educator if I omitted some of theseequations and I hope students will keep going at these‘difficult’ points or failing that just omit them entirely

  • xii PREFACE

    on first reading this book. However, in general I wishto introduce students to proteins by describing princi-ples governing their structure and function and to avoidover-complication in this presentation through rigorousand quantitative treatment. This book is firmly intendedto be a broad introductory text suitable for undergrad-uate and postgraduate study, perhaps after an initialexposure to the subject of protein biochemistry, whilstat the same time introducing specialist areas prior tofuture advanced study. I hope the following chapterswill help to direct students to the amazing beauty andcomplexity of protein systems.

    Target audienceThe present text should be suitable for all introductorymodes of biochemistry, molecular biology, chemistry,medicine and dentistry. In the UK this generally meansthe book is suitable for all undergraduates betweenyears 1 and 3 and this book has stemmed from lecturesgiven as parts of biochemistry courses to students ofbiochemistry, chemistry, medicine and dentistry in all3 years. Where possible each chapter is structuredto increase progressively in complexity. For purelyintroductory courses as would occur in years 1 or 2it is sufficient to read only the first parts, or selectedsections, of each chapter. More advanced courses mayrequire thorough reading of each chapter together withconsultation of the bibliography and secondly the listof references given at the end of the book.

    The world wide webIn the last ten years the world wide web (WWW)has transformed information available to students. Itprovides a new and useful medium with which todeliver lecture notes and an exciting and new teach-ing resource for all. Consequently within this bookURLs direct students to learning resources and a listof important addresses is included in the appendix.In an effort to exploit the power of the internetthis book is associated with ‘web-based’ tutorials,problems and content and is accessed from the followingURL http://www.wiley.com/go/whitfordproteins. These‘pages’ are continually updated and point the interestedreader towards new areas as they emerge. The Bibli-ography points interested readers towards further study

    material suitable for a first introduction to a subjectwhilst the list of references provides original sourcesfor many areas covered in each of the twelve chapters.

    For the problems included at the end of each chapterthere are approximately 10 questions that aim to buildon the subject matter discussed in the preceding text.Often the questions will increase in difficulty althoughthis is not always the case. In this book I have limitedthe bibliography to broad reviews or accessible journalpapers and I have deliberately restricted the number of‘high-powered’ (difficult!) articles since I believe thisorganization is of greater use to students studying thesesubjects for the first time. To aid the learning processthe web edition has multiple-choice questions for useas a formative assessment exercise. I should certainlylike to hear of all mistakes or omissions encounteredin this text and my hope is that educators and studentswill let me know via the e-mail address at the end ofthis section of any required corrections or additions.

    Proteins are three-dimensional (3D) objects that areinadequately represented on book pages. Consequentlymany proteins are best viewed as molecular imagesusing freely available software. Here, real-time manip-ulation of coordinate files is possible and will provehelpful to understanding aspects of structure and func-tion. The importance of viewing, manipulating andeven changing the representation of proteins to com-prehending structure and function cannot be underesti-mated. Experience has suggested that the use of com-puters in this area can have a dramatic effect on stu-dent’s understanding of protein structures. The abilityto visualize in 3D conveys so much information – farmore than any simple 2D picture in this book couldever hope to portray. Alongside many figures I havewritten the Protein DataBank files (e.g. PDB: 1HKO)used to produce diagrams. These files can be obtainedfrom databases at several permanent sites based aroundthe world such as http://www.rscb.org/pdb or one ofthe many ‘mirrors’ that exist (for example, in theUK this data is found at http://pdb.ccdc.cam.ac.uk).For students with Internet access each PDB file canbe retrieved and manipulated independently to pro-duce comparable images to those shown in the text.To explore these macromolecular images with reason-able efficiency does not require the latest ‘all-powerful’desktop computer. A computer with a Pentium III (orlater) based processor, a clock speed of 200 MHz or

  • PREFACE xiii

    greater, 32–64 MB RAM, hard disks of 10 GB, agraphics video card with at least 8 MB memory anda connection to the internet are sufficient to view andstore a significant number of files together with rep-resentative images. Of course things are easier with acomputer with a surfeit of memory (>256 MB) anda high ‘clock’ speed (>2 GHz) but it is not obliga-tory to see ‘on-line’ content or to manipulate molecularimages. This book was started on a 700 MHz PentiumIII based processor equipped with 256 MB RAM and16 MB graphics card.

    Organization of this bookThis book will address the structure and function ofproteins in 12 subsequent chapters each with a defini-tive theme. After an initial chapter describing why onewould wish to study proteins and a brief historicalbackground the second chapter deals with the ‘buildingblocks’ of proteins, namely the amino acids togetherwith their respective chemical and physical proper-ties. No attempt is made at any point to describe themetabolism connected with these amino acids and thereader should consult general textbooks for descriptionsof the synthesis and degradation of amino acids. Thisis a major area in its own right and would have length-ened the present book too much. However, I wouldlike to think that students will not avoid these areasbecause they remain an equally important subject thatshould be covered at some point within the under-graduate curriculum. Chapter 3 covers the assemblyof amino acids into polypeptide chains and levels oforganizational structure found within proteins. Almostall detailed knowledge of protein structure and func-tion has arisen through studies of globular proteins butthe presence of fibrous proteins with different struc-tures and functional properties necessitated a separatechapter devoted to this area (Chapter 4). Within thisclass the best understood structures are those belongingto the collagen class of proteins, the keratins and theextended β sheet structures such as silk fibroin. Thedivision between globular proteins and fibrous proteinswas made at a time when the only properties one couldcompare readily were a protein’s amino acid compo-sition and hydrodynamic radius. It is now apparentthat other proteins exist with properties intermediatebetween globular and fibrous proteins that do not lend

    themselves to simple classification. However, the ‘old’schemes of identification retain their value and serveto emphasize differences in proteins.

    Membrane proteins represent a third group withdifferent composition and properties. Most of theseproteins are poorly understood, but there have beenspectacular successes from the initial low-resolutionstructure of bacteriorhodopsin to the highly definedstructure of bacterial photosynthetic reaction centres.These advances paved the way towards structuralstudies of G proteins and G-protein coupled receptors,the respiratory complexes from aerobic bacteria and thestructure of ATP synthetases.

    Chapter 6 focuses both on experimental and com-putational methods of comparing proteins where insilico methods have become increasingly important asa vital tool to assist with modern protein biochemistry.Chapter 7 focuses on enzymes and by discussing basicreaction rate theories and kinetics the chapter leads toa discussion of enzyme-catalysed reactions. Enzymescatalyse reactions through a variety of mechanismsincluding acid–base catalysis, nucleophilic drivenchemistry and transition state stabilization. These andother mechanisms are described along with the princi-ples of regulation, active site chemistry and binding.

    The involvement of proteins in the cell cycle,transcription, translation, sorting and degradation ofproteins is described in Chapter 8. In 50 years wehave progressed from elucidating the structure ofDNA to uncovering how this information is convertedinto proteins. The chapter is based around the struc-ture of two macromolecular systems: the ribosomedevoted towards accurate and efficient synthesis andthe proteasome designed to catalyse specific proteoly-sis. Chapter 9 deals with the methods of protein purifi-cation. Very often, biochemistry textbooks describetechniques without placing the technique in the correctcontext. As a result, in Chapter 9 I have attempted todescribe equipment as well as techniques so that stu-dents may obtain a proper impression of this area.

    Structural methods determine the topology or foldof proteins. With an elucidation of structure at atomiclevels of resolution comes an understanding of bio-logical function. Chapter 10 addresses this area bydescribing different techniques. X-ray crystallographyremains at the forefront of research with new variationsof the basic principle allowing faster determination of

  • xiv PREFACE

    structure at improved resolution. NMR methods yieldstructures of comparable resolution to crystallographyfor small soluble proteins. In ideal situations thesemethods provide complete structural determination ofall heavy atoms but they are complemented by otherspectroscopic methods such as absorbance and fluores-cence methods, mass spectrometry and infrared spec-troscopy. These techniques provide important ancillaryinformation on tertiary structure such as the helical con-tent of the protein, the proportion and environment ofaromatic residues within a protein as well as secondarystructure content.

    Chapter 11 describes protein folding and stabil-ity – a subject that has generated intense research inter-est with the recognition that disease states arise fromaberrant folding or stability. The mechanism of proteinfolding is illustrated by in vitro and in vivo studies.Whilst the broad concepts underlying protein fold-ing were deduced from studies of ‘model’ proteinssuch as ribonuclease, analysis of cell folding path-ways has highlighted specialised proteins, chaperones,with a critical function to the overall process. TheGroES–GroEL complex is discussed to highlight theintegrated process of synthesis and folding in vivo.

    The final chapter builds on the preceding 11 chaptersusing a restricted set of well-studied proteins (casestudies) with significant impact on molecular medicine.These proteins include haemoglobin, viral proteins,p53, prions and α1-antitrypsin. Although still a youngsubject area this branch of protein science will expandin the next few years and will rely on the techniques,knowledge and principles elucidated in Chapters 1–11.The examples emphasize the impact of protein scienceand molecular medicine on the quality of human life.

    AcknowledgementsI am indebted to all research students and post-docswho shared my laboratories at the Universities of Lon-don and Oxford during the last 15 years in many casesacting as ‘test subjects’ for teaching ideas. I shouldlike to thank Drs Roger Hewson, Richard Newboldand Susan Manyusa whose comments throughout myresearch and teaching career were always valued. Iwould also like to thank individuals, too numerousto name, with whom I interacted at King’s CollegeLondon, Imperial College of Science, Technology and

    Medicine and the University of Oxford. In this con-text I should like to thank Dr John Russell, formerlyof Imperial College London whose goodwill, humourand fantastic insight into the history of science, thescientific method and ‘day to day’ experimentation pre-vented absolute despair.

    During preparation of this book many individu-als read and contributed valuable comments to themanuscript’s content, phrasing and ideas. In particular Iwish to thank these unnamed and some times unknownindividuals who read one or more of the chapters of thisbook. As is often said by most authors at this pointdespite their valuable contributions all of the remain-ing errors and deficiencies in the current text are myresponsibility. In this context I could easily have spentmore months attempting to perfect the current text.I am very aware that this text has deficiencies but Ihope these defects will not detract from its value. Inaddition my wish to try other avenues, other roads nottaken, dictates that this manuscript is completed with-out delay.

    Writing and producing a textbook would not bepossible without the support of a good publisher. Ishould like to thank all the staff at John Wiley & Sons,Chichester, UK. This exhaustive list includes particu-larly Andrew Slade as senior Publishing Editor whohelped smooth the bumpy route towards production ofthis book, Lisa Tickner who first initiated events lead-ing to commissioning this book, Rachel Ballard whosupervised day to day business on this book, replacingevery form I lost without complaint and monitoringtactfully and gently about possible completion dates,Robert Hambrook who translated my text and diagramsinto a beautiful book, and the remainder of the pro-duction team of John Wiley and Sons. Together weinched our way towards the painfully slow productionof this text, although the pace was entirely attributableto the author.

    Lastly I must also thank Susan who tolerated theprotracted completion of this book, reading chaptersand offering support for this project throughout whilstcoping with the arrival of Alexandra and Ethaneffortlessly (unlike their father).

    David WhitfordApril 2004

    [email protected]

  • 1An Introduction to protein structure

    and function

    Biochemistry has exploded as a major scientificendeavour over the last one hundred years to rival pre-viously established disciplines such as chemistry andphysics. This occurred with the recognition that livingsystems are based on the familiar elements of organicchemistry (carbon, oxygen, nitrogen and hydrogen)together with the occasional involvement of inorganicchemistry and elements such as iron, copper, sodium,potassium and magnesium. More importantly the lawsof physics including those concerning thermodynam-ics, electricity and quantum physics are applicable tobiochemical systems and no ‘vital’ force distinguishesliving from non-living systems. As a result the lawsof chemistry and physics are successfully applied tobiochemistry and ideas from physics and chemistryhave found widespread application, frequently revolu-tionizing our understanding of complex systems suchas cells.

    This book focuses on one major component of allliving systems – the proteins. Proteins are found inall living systems ranging from bacteria and virusesthrough the unicellular and simple eukaryotes tovertebrates and higher mammals such as humans.Proteins make up over 50 percent of the dry weightof cells and are present in greater amounts thanany other biomolecule. Proteins are unique amongstthe macromolecules in underpinning every reaction

    occurring in biological systems. It goes without sayingthat one should not ignore the other components ofliving systems since they have indispensable roles, butin this text we will consider only proteins.

    A brief and very selective historicalperspective

    With the vast accumulation of knowledge about pro-teins over the last 50 years it is perhaps surprising todiscover that the term protein was introduced nearly170 years ago. One early description was by GerhardusJohannes Mulder in 1839 where his studies on the com-position of animal substances, chiefly fibrin, albuminand gelatin, showed the presence of carbon, hydro-gen, oxygen and nitrogen. In addition he recognizedthat sulfur and phosphorus were present sometimes in‘animal substances’ that contained large numbers ofatoms. In other words, he established that these ‘sub-stances’ were macromolecules. Mulder communicatedhis results to Jöns Jakob Berzelius and it is suggestedthe term protein arose from this interaction where theorigin of the word protein has been variously ascribedto derivation from the Latin word primarius or fromthe Greek god Proteus. The definition of proteins wastimely since in 1828 Friedrich Wöhler had shown that

    Proteins: Structure and Function by David Whitford 2005 John Wiley & Sons, Ltd

  • 2 AN INTRODUCTION TO PROTEIN STRUCTURE AND FUNCTION

    (NH4)OCN C

    O

    H2N NH2

    Figure 1.1 The decomposition of ammonium cyanateyields urea

    heating ammonium cyanate resulted in isomerism andthe formation of urea (Figure 1.1). Organic compoundscharacteristic of living systems, such as urea, couldbe derived from simple inorganic chemicals. For manyhistorians this marks the beginning of biochemistry andit is appropriate that the discovery of proteins occurredat the same period.

    The development of biochemistry and the study ofproteins was assisted by analysis of their compositionand structure by Heinrich Hlasiwetz and Josef Haber-mann around 1873 and the recognition that proteinswere made up of smaller units called amino acids.They established that hydrolysis of casein with strongacids or alkali yielded glutamic acid, aspartic acid,leucine, tyrosine and ammonia whilst the hydrolysisof other proteins yielded a different group of products.Importantly their work suggested that the properties ofproteins depended uniquely on the constituent parts – atheme that is equally relevant today in modern bio-chemical study.

    Another landmark in the study of proteins occurredin 1902 with Franz Hofmeister establishing the con-stituent atoms of the peptide bond with the polypep-tide backbone derived from the condensation of freeamino acids. Five years earlier Eduard Buchner rev-olutionized views of protein function by demonstrat-ing that yeast cell extracts catalysed fermentation ofsugar into ethanol and carbon dioxide. Previously itwas believed that only living systems performed thiscatalytic function. Emil Fischer further studied biolog-ical catalysis and proposed that components of yeast,which he called enzymes, combined with sugar to pro-duce an intermediate compound. With the realizationthat cells were full of enzymes 100 years of researchhas developed and refined these discoveries. Furtherlandmarks in the study of proteins could include Sum-ner’s crystallization of the first enzyme (urease) in1926 and Pauling’s description of the geometry of the

    peptide bond; however, extensive discussion of theseadvances and many other important discoveries in pro-tein biochemistry are best left to history of sciencetextbooks.

    A brief look at the award of the Nobel Prizesfor Chemistry, Physiology and Medicine since 1900highlighted in Table 1.1 reveals the involvement ofmany diverse areas of science in protein biochemistry.At first glance it is not obvious why William andLawrence Bragg’s discovery of the diffraction ofX-rays by sodium chloride crystals is relevant, butdiffraction by protein crystals is the main route towardsbiological structure determination. Their discovery wasthe first step in the development of this technique.Discoveries in chemistry and physics have beenimplemented rapidly in the study of proteins. By 1958Max Perutz and John Kendrew had determined the firstprotein structure and this was soon followed by thelarger, multiple subunit, structure of haemoglobin andthe first enzyme, lysozyme. This remarkable advancein knowledge extended from initial understanding ofthe atomic composition of proteins around 1900 tothe determination of the three-dimensional structure ofproteins in the 1960s and represents a major chapterof modern biochemistry. However, advances havecontinued with new areas of molecular biology provingequally important to understanding protein structureand function.

    Life may be defined as the ordered interactionof proteins and all forms of life from viruses tocomplex, specialized, mammalian cells are based onproteins made up of the same building blocks oramino acids. Proteins found in simple unicellularorganisms such as bacteria are identical in structureand function to those found in human cells illustratingthe evolutionary lineage from simple to complexorganisms.

    Molecular biology starts with the dramatic eluci-dation of the structure of the DNA double helix byJames Watson, Francis Crick, Rosalind Franklin andMaurice Wilkins in 1953. Today, details of DNA repli-cation, transcription into RNA and the synthesis of pro-teins (translation) are extensive. This has establishedan enormous body of knowledge representing a wholenew subject area. All cells encode the information con-tent of proteins within genes, or more accurately theorder of bases along the DNA strand, yet it is the

  • A BRIEF AND VERY SELECTIVE HISTORICAL PERSPECTIVE 3

    Table 1.1 Selected landmarks in the study of protein structure and function from 1900–2002 as seen by the awardof the Nobel Prize for Chemistry, Physiology or Medicine

    Date Discoverer + Discovery1901 Wilhelm Conrad Röntgen ‘in recognition of the . . . discovery of the remarkable rays subsequently

    named after him’1907 Eduard Buchner ‘cell-free fermentation’

    1914 Max von Laue ‘for his discovery of the diffraction of X-rays by crystals’

    1915 William Henry Bragg and William Lawrence Bragg ‘for their services in the analysis of crystalstructure by . . . X-rays’

    1923 Frederick Grant Banting and John James Richard Macleod ‘for the discovery of insulin’

    1930 Karl Landsteiner ‘for his discovery of human blood groups’

    1946 James Batcheller Sumner ‘for his discovery that enzymes can be crystallized’.

    John Howard Northrop and Wendell Meredith Stanley ‘for their preparation of enzymes and virusproteins in a pure form’

    1948 Arne Wilhelm Kaurin Tiselius ‘for his research on electrophoresis and adsorption analysis, especiallyfor his discoveries concerning the complex nature of the serum proteins’

    1952 Archer John Porter Martin and Richard Laurence Millington Synge ‘for their invention of partitionchromatography’

    1952 Felix Bloch and Edward Mills Purcell ‘for their development of new methods for nuclear magneticprecision measurements and discoveries in connection therewith’

    1954 Linus Carl Pauling ‘for his research into the nature of the chemical bond and . . . to the elucidation of. . . complex substances’

    1958 Frederick Sanger ‘for his work on the structure of proteins, especially that of insulin’

    1959 Severo Ochoa and Arthur Kornberg ‘for their discovery of the mechanisms in the biological synthesisof ribonucleic acid and deoxyribonucleic acid’

    1962 Max Ferdinand Perutz and John Cowdery Kendrew ‘for their studies of the structures of globularproteins’

    1962 Francis Harry Compton Crick, James Dewey Watson and Maurice Hugh Frederick Wilkins ‘for theirdiscoveries concerning the molecular structure of nucleic acids and its significance for informationtransfer in living material’

    1964 Dorothy Crowfoot Hodgkin ‘for her determinations by X-ray techniques of the structures of importantbiochemical substances’

    1965 François Jacob, André Lwoff and Jacques Monod ‘for discoveries concerning genetic control ofenzyme and virus synthesis’

    1968 Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg ‘for . . . the genetic code and itsfunction in protein synthesis’

    1969 Max Delbrück, Alfred D. Hershey and Salvador E. Luria ‘for their discoveries concerning thereplication mechanism and the genetic structure of viruses’

    (continued overleaf )

  • 4 AN INTRODUCTION TO PROTEIN STRUCTURE AND FUNCTION

    Table 1.1 (continued)

    Date Discoverer + Discovery1972 Christian B. Anfinsen ‘for his work on ribonuclease, especially concerning the connection between

    the amino acid sequence and the biologically active conformation’ Stanford Moore and William H.Stein ‘for their contribution to the understanding of the connection between chemical structure andcatalytic activity of . . . ribonuclease molecule’

    1972 Gerald M. Edelman and Rodney R. Porter ‘for their discoveries concerning the chemical structure ofantibodies’

    1975 John Warcup Cornforth ‘for his work on the stereochemistry of enzyme-catalyzed reactions’. VladimirPrelog ‘for his research into the stereochemistry of organic molecules and reactions’

    1975 David Baltimore, Renato Dulbecco and Howard Martin Temin ‘for their discoveries concerning theinteraction between tumour viruses and the genetic material of the cell’

    1978 Werner Arber, Daniel Nathans and Hamilton O. Smith ‘for the discovery of restriction enzymes andtheir application to problems of molecular genetics’

    1980 Paul Berg ‘for his fundamental studies of the biochemistry of nucleic acids, with particular regard torecombinant-DNA’ Walter Gilbert and Frederick Sanger ‘for their contributions concerning thedetermination of base sequences in nucleic acids’

    1982 Aaron Klug ‘development of crystallographic electron microscopy and structural elucidation ofnucleic acid–protein complexes’

    1984 Robert Bruce Merrifield ‘for his development of methodology for chemical synthesis on a solidmatrix’

    1984 Niels K. Jerne, Georges J.F. Köhler and César Milstein ‘for theories concerning the specificity indevelopment and control of the immune system and the discovery of the principle for production ofmonoclonal antibodies’

    1988 Johann Deisenhofer, Robert Huber and Hartmut Michel ‘for the determination of the structure of aphotosynthetic reaction centre’

    1989 J. Michael Bishop and Harold E. Varmus ‘for their discovery of the cellular origin of retroviraloncogenes’

    1991 Richard R. Ernst ‘for . . . the methodology of high resolution nuclear magnetic resonancespectroscopy’

    1992 Edmond H. Fischer and Edwin G. Krebs ‘for their discoveries concerning reversible proteinphosphorylation as a biological regulatory mechanism’

    1993 Kary B. Mullis ‘for his invention of the polymerase chain reaction (PCR) method’ and Michael Smith‘for his fundamental contributions to the establishment of oligonucleotide-based, site-directedmutagenesis’

    1994 Alfred G. Gilman and Martin Rodbell ‘for their discovery of G-proteins and the role of these proteinsin signal transduction’

  • THE BIOLOGICAL DIVERSITY OF PROTEINS 5

    Table 1.1 (continued)

    Date Discoverer + Discovery1997 Paul D. Boyer and John E. Walker ‘for their elucidation of the enzymatic mechanism underlying the

    synthesis of adenosine triphosphate (ATP)’. Jens C. Skou ‘for the first discovery of anion-transporting enzyme, Na+, K+-ATPase’

    1997 Stanley B. Prusiner ‘for his discovery of prions – a new biological principle of infection’

    1999 Günter Blobel ‘for the discovery that proteins have intrinsic signals that govern their transport andlocalization in the cell’

    2000 Arvid Carlsson, Paul Greengard and Eric R Kandel ‘signal transduction in the nervous system’

    2001 Paul Nurse, Tim Hunt and Leland Hartwill ‘for discoveries of key regulators of the cell cycle’

    2002 Kurt Wuthrich, ‘for development of NMR spectroscopy as a method of determining biologicalmacromolecules structure in solution.’ John B. Fenn and Koichi Tanaka ‘for their development ofsoft desorption ionization methods for mass spectrometric analyses of biological macromolecules’.Sydney Brenner, H. Robert Horvitz and John E. Sulston ‘for their discoveries concerning geneticregulation of organ development and programmed cell death’

    conversion of this information or expression into pro-teins that represents the tangible evidence of a livingsystem or life.

    DNA −→ RNA −→ protein

    Cells divide, synthesize new products, secrete unwantedproducts, generate chemical energy to sustain these pro-cesses via specific chemical reactions, and in all ofthese examples the common theme is the mediationof proteins.

    In 1944 the physicist Erwin Schrödinger posed thequestion ‘What is Life?’ in an attempt to understand thephysical properties of a living cell. Schrödinger sug-gested that living systems obeyed all laws of physicsand should not be viewed as exceptional but insteadreflected the statistical nature of these laws. Moreimportantly, living systems are amenable to study usingmany of the techniques familiar to chemistry andphysics. The last 50 years of biochemistry have demon-strated this hypothesis emphatically with tools devel-oped by physicists and chemists rapidly employed inbiological studies. A casual perusal of Table 1.1 showshow quickly methodologies progress from discovery toapplication.

    The biological diversity of proteins

    Proteins have diverse biological functions ranging fromDNA replication, forming cytoskeletal structures, trans-porting oxygen around the bodies of multicellularorganisms to converting one molecule into another.The types of functional properties are almost end-less and are continually being increased as we learnmore about proteins. Some important biological func-tions are outlined in Table 1.2 but it is to be expectedthat this rudimentary list of properties will expandeach year as new proteins are characterized. A for-mal demarcation of proteins into one class should notbe pursued too far since proteins can have multipleroles or functions; many proteins do not lend them-selves easily to classification schemes. However, forall chemical reactions occurring in cells a protein isinvolved intimately in the biological process. Theseproteins are united through their composition based onthe same group of 20 amino acids. Although all pro-teins are composed of the same group of 20 aminoacids they differ in their composition – some containa surfeit of one amino acid whilst others may lackone or two members of the group of 20 entirely.It was realized early in the study of proteins that

  • 6 AN INTRODUCTION TO PROTEIN STRUCTURE AND FUNCTION

    Table 1.2 A selective list of some functional roles for proteins within cells

    Function Examples

    Enzymes or catalytic proteins Trypsin, DNA polymerases and ligases,Contractile proteins Actin, myosin, tubulin, dynein,Structural or cytoskeletal proteins Tropocollagen, keratin,Transport proteins Haemoglobin, myoglobin, serum albumin, ceruloplasmin,

    transthyretinEffector proteins Insulin, epidermal growth factor, thyroid stimulating hormone,Defence proteins Ricin, immunoglobulins, venoms and toxins, thrombin,Electron transfer proteins Cytochrome oxidase, bacterial photosynthetic reaction centre,

    plastocyanin, ferredoxinReceptors CD4, acetycholine receptor,Repressor proteins Jun, Fos, Cro,Chaperones (accessory folding proteins) GroEL, DnaKStorage proteins Ferritin, gliadin,

    variation in size and complexity is common and themolecular weight and number of subunits (polypep-tide chains) show tremendous diversity. There is nocorrelation between size and number of polypeptidechains. For example, insulin has a relative molecu-lar mass of 5700 and contains two polypeptide chains,haemoglobin has a mass of approximately 65 000 andcontains four polypeptide chains, and hexokinase isa single polypeptide chain with an overall mass of∼100 000 (see Table 1.3).

    The molecular weight is more properly referred toas the relative molecular mass (symbol Mr). This isdefined as the mass of a molecule relative to 1/12ththe mass of the carbon (12C) isotope. The mass ofthis isotope is defined as exactly 12 atomic massunits. Consequently the term molecular weight orrelative molecular mass is a dimensionless quantityand should not possess any units. Frequently in thisand many other textbooks the unit Dalton (equivalentto 1 atomic mass unit, i.e. 1 Dalton = 1 amu) is usedand proteins are described with molecular weights of5.5 kDa (5500 Daltons). More accurately, this is theabsolute molecular weight representing the mass ingrams of 1 mole of protein. For most purposes thisbecomes of little relevance and the term ‘molecular

    Table 1.3 The molecular masses of proteins togetherwith the number of subunits. The term ‘subunit’ issynonymous with the number of polypeptide chainsand is used interchangeably

    Protein Molecularmass

    Subunits

    Insulin 5700 2Haemoglobin 64 500 4Tropocollagen 285 000 3Subtilisin 27 500 1Ribonuclease 12 600 1Aspartate

    transcarbamoylase310 000 12

    Bacteriorhodopsin 26 800 1Hexokinase 102 000 1

    weight’ is used freely in protein biochemistry and inthis book.

    Proteins are joined covalently and non-covalentlywith other biomolecules including lipids, carbohydrates,

  • THE BIOLOGICAL DIVERSITY OF PROTEINS 7

    nucleic acids, phosphate groups, flavins, heme groupsand metal ions. Components such as hemes or metalions are often called prosthetic groups. Complexesformed between lipids and proteins are lipoproteins,those with carbohydrates are called glycoproteins,whilst complexes with metal ions lead to metallo-proteins, and so on. The complexes formed betweenmetal ions and proteins increases the involvement ofelements of the periodic table beyond that expectedof typical organic molecules (namely carbon, hydro-gen, nitrogen and oxygen). Inspection of the periodictable (Figure 1.2) shows that at least 20 elements havebeen implicated directly in the structure and functionof proteins (Table 1.4). Surprisingly elements such asaluminium and silicon that are very abundant in theEarth’s crust (8.1 and 25.7 percent by weight, respec-tively) do not occur in high concentration within cells.Aluminium is rarely, if ever, found as part of proteins

    whilst the role of silicon is confined to biomineralizationwhere it is the core component of shells. The involve-ment of carbon, hydrogen, oxygen, nitrogen, phospho-rus and sulfur is clear although the role of other ele-ments, particularly transition metals, has been difficultto establish. Where transition metals occur in proteinsthere is frequently only one metal atom per mole of pro-tein and led in the past to a failure to detect metal. Otherelements have an inferred involvement from growthstudies showing that depletion from the diet leads toan inhibition of normal cellular function. For metallo-proteins the absence of the metal can lead to a loss ofstructure and function.

    Metals such as Mo, Co and Fe are often foundassociated with organic co-factors such as pterin,flavins, cobalamin and porphyrin (Figure 1.3). Theseorganic ligands hold metal centres and are often tightlyassociated to proteins.

    Table 1.4 The involvement of trace elements in the structure and function of proteins

    Element Functional role

    Sodium Principal intracellular ion, osmotic balancePotassium Principal intracellular ion, osmotic balanceMagnesium Bound to ATP/GTP in nucleotide binding proteins, found as structural component of

    hydrolase and isomerase enzymesCalcium Activator of calcium binding proteins such as calmodulinVanadium Bound to enzymes such as chloroperoxidase.Manganese Bound to pterin co-factor in enzymes such as xanthine oxidase or sulphite oxidase. Also

    found in nitrogenase and as component of water splitting enzyme in higher plants.Iron Important catalytic component of heme enzymes involved in oxygen transport as well as

    electron transfer. Important examples are haemoglobin, cytochrome oxidase andcatalase.

    Cobalt Metal component of vitamin B12 found in many enzymes.Nickel Co-factor found in hydrogenase enzymesCopper Involved as co-factor in oxygen transport systems and electron transfer proteins such as

    haemocyanin and plastocyanin.Zinc Catalytic component of enzymes such as carbonic anhydrase and superoxide dismutase.Chlorine Principal intracellular anion, osmotic balanceIodine Iodinated tyrosine residues form part of hormone thyroxine and bound to proteinsSelenium Bound at active centre of glutathione peroxidase

  • 8 AN INTRODUCTION TO PROTEIN STRUCTURE AND FUNCTION

    Per

    iodi

    c ta

    ble

    of th

    e ch

    emic

    al e

    lem

    ents

    and

    thei

    r in

    volv

    emen

    t with

    pro

    tein

    s

    1 2

    3 4

    5 6

    7 8

    9 10

    11

    12

    13

    14

    15

    16

    17

    18

    1s

    1

    H

    Hyd

    roge

    n

    2

    He

    Hel

    ium

    s

    blo

    ck

    p b

    lock

    2s

    3

    LiLi

    thiu

    m

    4

    Be

    Ber

    yliu

    m

    5

    B

    Bor

    on

    6

    C

    Car

    bon

    7

    N

    Nitr

    ogen

    8

    O

    Oxy

    gen

    9

    F

    Flu

    orin

    e

    10 N

    e N

    eon

    3s

    2p 3p11

    Na

    Sod

    ium

    12 M

    gM

    agne

    sium

    3p

    d b

    lock

    (tr

    ansi

    tio

    n m

    etal

    s)

    13 A

    l A

    lum

    iniu

    m

    14

    Si

    Sili

    con

    15

    P

    Pho

    spho

    rus

    16

    S

    Sul

    fur

    17 C

    l C

    hlor

    ine

    18 A

    r A

    rgon

    4s

    19

    K

    Pot

    assi

    um

    20 C

    a C

    alc i

    um

    3 d 21

    Sc

    Sca

    ndiu

    m

    22

    Ti

    Tita

    nium

    23

    V

    Va

    nadi

    um 2

    4

    Cr

    Chr

    o miu

    m

    25 M

    n M

    anga

    nese

    26 F

    e Iro

    n

    27 C

    o C

    obal

    t

    28

    Ni

    Nic

    kel

    29 C

    u C

    oppe

    r

    30

    Zn

    Zin

    c

    31 G

    a G

    alli u

    m

    32 G

    e G

    erm

    aniu

    m

    33 A

    s A

    rsen

    ic

    34 S

    e S

    e len

    ium

    35 B

    r B

    rom

    ine

    36 K

    r K

    ryp t

    on5s

    37

    Rb

    Rub

    idiu

    m 3

    8

    Sr

    Str

    ontiu

    m

    4 d 39

    Y

    Yttr

    ium

    40

    Zr

    Zirc

    oniu

    m

    41 N

    b N

    iobi

    um

    42

    Mo

    Mol

    ybde

    num

    43 T

    c T

    echn

    etiu

    m 44

    Ru

    Rut

    heni

    um 4

    5

    Rh

    Rho

    dium

    46

    Pd

    Pal

    ladi

    um 4

    7

    Ag

    Silv

    er

    48

    Cd

    Cad

    miu

    m

    49

    InIn

    dium

    50 S

    n T

    in

    51 S

    b A

    ntim

    ony

    52 T

    e Te

    lluriu

    m

    53

    I Io

    dine

    54 X

    e X

    enon

    6s

    55

    Cs

    Cae

    sium

    56 B

    a B

    ariu

    m

    5 d 71

    Lu

    Lute

    tium

    72

    Hf

    Haf

    nium

    73 T

    a T

    anta

    lum

    74

    W

    Tun

    gste

    n

    75 R

    e R

    heni

    um

    76 O

    s O

    smiu

    m

    77

    IrIri

    dium

    78

    Pt

    Pla

    tinum

    79 A

    u G

    old

    80

    Hg

    Mer

    cury

    81

    Tl

    Tellu

    rium

    82 P

    b Le

    ad

    83 B

    i B

    ism

    uth

    84 P

    o P

    olo

    nium

    85 A

    t A

    stat

    ine

    86 R

    n R

    adon

    7s

    87

    Fr

    Fra

    nciu

    m

    88 R

    a R

    adiu

    m

    6 d 10

    3

    LrLa

    wre

    nciu

    m

    Met

    al ↔

    No

    n-m

    etal

    s

    f bl

    ock

    (la

    nth

    anid

    es a

    nd

    act

    inid

    es)

    4f

    57 L

    aLa

    ntha

    num

    58

    Ce

    Cer

    ium

    59

    Pr

    Pra

    esod

    ymiu

    m

    60

    Nd

    Neo

    dym

    ium

    61

    Pm

    P

    rom

    ethi

    um 6

    2

    Sm

    S

    amar

    ium

    63

    Eu

    Eur

    opiu

    m

    64

    Gd

    Gad

    olin

    ium

    65

    Tb

    Terb

    ium

    66

    Dy

    Dys

    pros

    ium

    67

    Ho

    Hol

    miu

    m

    68

    Er

    Erb

    ium

    69

    Tm

    T

    huliu

    m

    70

    Yb

    Ytte

    rbiu

    m5f

    89

    Ac

    Act

    iniu

    m

    90 T

    h T

    horiu

    m

    91

    Pa

    Pro

    actin

    ium

    92

    Ur

    Ura

    niu

    m

    93

    Np

    Nep

    tuni

    um 9

    4

    Pu

    Put

    oniu

    m

    95 A

    m

    Am

    eric

    um

    96 C

    m

    Cur

    ium

    97

    Bk

    Ber

    keliu

    m

    98

    Cf

    Cal

    iforn

    ium

    99

    Es

    Eis

    tein

    ium

    100

    Fm

    Fe

    rmiu

    m

    101

    Md

    Men

    dele

    vium

    102

    No

    Nob

    eliu

    m

    Figu

    re1.

    2Th

    epe

    riod

    icta

    ble

    show

    ing

    the

    elem

    ents

    high

    light

    edin

    red

    know

    nto

    have

    invo

    lvem

    ent

    inth

    est

    ruct

    ure

    and/

    orfu

    ncti

    onof

    prot

    eins

    .Th

    ein

    volv

    emen

    tof

    som

    eel

    emen

    tsis

    cont

    enti

    ous

    tung

    sten

    and

    cadm

    ium

    are

    clai

    med

    tobe

    asso

    ciat

    edw

    ith

    prot

    eins

    yet

    thes

    eel

    emen

    tsar

    eal

    sokn

    own

    tobe

    toxi

    c

  • WHY STUDY PROTEINS? 9

    N

    CNH2

    O

    R

    N

    NN

    NH2N

    O

    N

    N

    N

    N O

    O

    H3C

    H3C

    R

    P

    P

    Fe

    NN

    N N

    M

    V

    M

    M

    M

    V

    R1

    O

    M

    MgNN

    N N

    M

    CH2CH3

    M

    CH2

    M

    V

    O

    OM

    CH2

    Figure 1.3 Organic co-factors found in proteins. These co-factors are pterin, the isoalloxine ring found as part offlavin in FAD and FMN, the pyridine ring of NAD and its close analogue NADP and the porphyrin skeletons of hemeand chlorophyll. R represents the remaining part of the co-factor whilst M and V signify methyl and vinyl side chains

    Proteins and the sequencing of thehuman and other genomes

    Recognition of the diverse roles of proteins in biolog-ical systems increased largely as a result of the enor-mous amount of sequencing information generated viathe Human Genome Mapping project. Similar schemesaimed at deciphering the genomes of Escherichia coli,yeast (Sacharromyces cerevisiae), and mouse providedrelated information. With the completion of the firstdraft of the human genome mapping project in 2001human chromosomes contain approximately 25–30 000genes. This allows a conservative estimate of the num-ber of polypeptides making up most human cells as∼25 000, although alternative splicing of genes andvariations in subunit composition increase the num-ber of proteins further. Despite sequencing the humangenome it is an unfortunate fact that we do not knowthe role performed by most proteins. Of those thou-sands of polypeptides we know the structures of only asmall number, emphasizing a large imbalance between

    the abundance of sequence data and the presence ofstructure/function information. An analysis of proteindatabases suggests about 1000 distinct structures orfolds have been determined for globular proteins. Manyproteins are retained within cell membranes and weknow virtually nothing about the structures of theseproteins and only slightly more about their functionalroles. This observation has enormous consequences forunderstanding protein structure and function.

    Why study proteins?

    This question is often asked not entirely without reasonby many undergraduates during their first introductionto the subject. Perhaps the best reply that can be givenis that proteins underpin every aspect of biologicalactivity. This is particularly important in areas whereprotein structure and function have an impact onhuman endeavour such as medicine. Advances inmolecular genetics reveal that many diseases stem fromspecific protein defects. A classic example is cystic

  • 10 AN INTRODUCTION TO PROTEIN STRUCTURE AND FUNCTION

    Figure 1.4 The shape of erythrocytes in normal and sickle cell anemia arises from mutations to haemoglobin foundwithin the red blood cell. (Reproduced with permission from Voet, D, Voet, J.G and Pratt, C.W. Fundamentals ofBiochemistry. John Wiley & Sons Inc.)

    fibrosis, an inherited condition that alters a protein,called the cystic fibrosis transmembrane conductanceregulator (CFTR), involved in the transport of sodiumand chloride across epithelial cell membranes. Thisdefect is found in Caucasian populations at a ratioof ∼1 in 20, a surprisingly high frequency. With 1in 20 of the population ‘carrying’ a single defectivecopy of the gene individuals who inherit defectivecopies of the gene from each parent suffer from thedisease. In the UK the incidence of cystic fibrosis isapproximately 1 in 2000 live births, making it oneof the most common inherited disorders. The diseaseresults in the body producing a thick, sticky mucusthat blocks the lungs, leading to serious infection, andinhibits the pancreas, stopping digestive enzymes fromreaching the intestines where they are required to digestfood. The severity of cystic fibrosis is related to CFTRgene mutation, and the most common mutation, foundin approximately 65 percent of all cases, involves thedeletion of a single amino acid residue from the proteinat position 508. A loss of one residue out of a totalof nearly 1500 amino acid residues results in a severedecrease in the quality of life with individuals sufferingfrom this disease requiring constant medical care andsupervision.

    Further examples emphasize the need to understandmore about proteins. The pioneering studies of VernonIngram in the 1950s showed that sickle cell anemiaarose from a mutation in the β chain of haemoglobin.Haemoglobin is a tetrameric protein containing 2αand 2β chains. In each of the β chains a mutation

    is found that involves the change of the sixth aminoacid residue from a glutamic acid to a valine. Thealteration of two residues out of 574 leads to a drasticchange in the appearance of red blood cells from theirnormal biconcave disks to an elongated sickle shape(Figure 1.4).

    As the name of the disease suggests individualsare anaemic showing decreased haemoglobin contentin red blood cells from approximately 15 g per100 ml to under half that figure, and show frequentillness. Our understanding of cystic fibrosis and ofsickle cell anaemia has advanced in parallel withour understanding of protein structure and functionalthough at best we have very limited and crude meansof treating these diseases.

    However, perhaps the greatest impetus to understandprotein structure and function lies in the hope ofovercoming two major health issues confronting theworld in the 21st century. The first of these is cancer.Cancer is the uncontrolled proliferation of cells thathave lost their normal regulated cell division often inresponse to a genetic or environmental trigger. Thedevelopment of cancer is a multistep, multifactorialprocess often occurring over decades but the preciseinvolvement of specific proteins has been demonstratedin some instances. One of the best examples is aprotein called p53, normally present at low levels incells, that ‘switches on’ in response to cellular damageand as a transcription factor controls the cell cycleprocess. Mutations in p53 alter the normal cycle ofevents leading eventually to cancer and several tumours

  • WHY STUDY PROTEINS? 11

    including lung, colorectal and skin carcinomas areattributed to molecular defects in p53. Future researchon p53 will enable its physicochemical properties tobe thoroughly appreciated and by understanding thelink between structure, folding, function and regulationcomes the prospect of unravelling its role in tumourformation and manipulating its activity via therapeuticintervention. Already some success is being achievedin this area and the future holds great promise for‘halting’ cancer by controlling the properties of p53and similar proteins.

    A second major problem facing the world todayis the estimated number of people infected with thehuman immunodeficiency virus (HIV). In 2003 theWorld Health Organization (WHO) estimated thatover 40 million individuals are infected with thisvirus in the world today. For many individuals,particularly those in the ‘Third World’, the prospectof prolonged good health is unlikely as the virusslowly degrades the body’s ability to fight infectionthrough damage to the immune response mechanismand in particular to a group of cells called cytotoxicT cells. HIV infection encompasses many aspects ofprotein structure and function, as the virus enters cellsthrough the interaction of specific viral coat proteinswith receptors on the surface of white blood cells. Onceinside cells the virus ‘hides’ but is secretly replicatingand integrating genetic material into host DNA throughthe action of specific enzymes (proteins). Halting thedestructive influence of HIV relies on understandingmany different, yet inter-related, aspects of proteinstructure and function. Again, considerable progresshas been made since the 1980s when the causativeagent of the disease was recognized as a retrovirus.These advances have focussed on understanding the

    structure of HIV proteins and in designing specificinhibitors of, for example, the reverse transcriptaseenzyme. Although in advanced health care systemsthese drugs (inhibitors) prolong life expectancy, theeradication of HIV’s destructive action within thebody and hence an effective cure remains unachieved.Achieving this goal should act as a timely reminderfor all students of biology, chemistry and medicinethat success in this field will have a dramatic impacton the quality of human life in the forthcomingdecades.

    Central to success in treating any of the above dis-eases are the development of new medicines, manybased on proteins. The development of new therapieshas been rapid during the last 20 years with the list ofnew treatments steadily increasing and including min-imizing serious effects of different forms of cancer viathe use of specific proteins including monoclonal anti-bodies, alleviating problems associated with diabetesby the development of improved recombinant ‘insulins’and developing ‘clot-busting’ drugs (proteins) for themanagement of strokes and heart attacks. This highlyselective list is the productive result of understandingprotein structure and function and has contributed toa marked improvement in disease management. Forthe future these advances will need to be extendedto other diseases and will rely on an extensive andthorough knowledge of proteins of increasing size andcomplexity. We will need to understand the structureof proteins, their interaction with other biomolecules,their roles within different biological systems and theirpotential manipulation by genetic or chemical meth-ods. The remaining chapters in this book represent anattempt to introduce and address some of these issuesin a fundamental manner helpful to students.

  • 2Amino acids:

    the building blocks of proteins

    Despite enormous functional diversity all proteins con-sist of a linear arrangement of amino acid residuesassembled together into a polypeptide chain. Aminoacids are the ‘building blocks’ of proteins and in orderto understand the properties of proteins we must firstdescribe the properties of the constituent 20 aminoacids. All amino acids contain carbon, hydrogen, nitro-gen and oxygen with two of the 20 amino acidsalso containing sulfur. Throughout this book a colourscheme based on the CPK model (after Corey, Paulingand Kultun, pioneers of ‘space-filling’ representationsof molecules) is used. This colouring scheme showsnitrogen atoms in blue, oxygen atoms in red, carbonatoms are shown in light grey (occasionally black), sul-fur is shown in yellow, and hydrogen, when shown, iseither white, or to enhance viewing on a white back-ground, a lighter shade of grey. To avoid unnecessarycomplexity ‘ball and stick’ representations of molecu-lar structures are often shown instead of space-fillingmodels. In other instances cartoon representations ofstructure are shown since they enhance visualization oforganization whilst maintaining clarity of presentation.

    The 20 amino acids found in proteinsIn their isolated state amino acids are white crystallinesolids. It is surprising that crystalline materials form the

    building blocks for proteins since these latter moleculesare generally viewed as ‘organic’. The crystallinenature of amino acids is further emphasized by theirhigh melting and boiling points and together theseproperties are atypical of most organic molecules.Organic molecules are not commonly crystalline nor dothey have high melting and boiling points. Compare,for example, alanine and propionic acid – the formeris a crystalline amino acid and the other is a volatileorganic acid. Despite similar molecular weights (89and 74) their respective melting points are 314 ◦Cand −20.8 ◦C. The origin of these differences and theunique properties of amino acids resides in their ionicand dipolar nature.

    Amino acids are held together in a crystallinelattice by charged interactions and these relativelystrong forces contribute to high melting and boilingpoints. Charge groups are also responsible for electricalconductivity in aqueous solutions (amino acids areelectrolytes), their relatively high solubility in waterand the large dipole moment associated with crystallinematerial. Consequently amino acids are best viewedas charged molecules that crystallize from solutionscontaining dipolar ions. These dipolar ions are calledzwitterions. A proper representation of amino acidsreflects amphoteric behaviour and amino acids arealways represented as the zwitterionic state in this

    Proteins: Structure and Function by David Whitford 2005 John Wiley & Sons, Ltd

  • 14 AMINO ACIDS: THE BUILDING BLOCKS OF PROTEINS

    C

    O

    O−H

    CH3N+

    R

    Figure 2.1 A skeletal model of a generalized aminoacid showing the amino (blue) carboxyl (red) and Rgroups attached to a central or α carbon

    textbook as opposed to the undissociated form. For19 of the twenty amino acids commonly found inproteins a general structure for the zwitterionic state hascharged amino (NH3+) and carboxyl (COO−) groupsattached to a central carbon atom called the α carbon.The remaining atoms connected to the α carbon are asingle hydrogen atom and the R group or side chain(Figure 2.1).

    The acid–base properties of aminoacidsAt pH 7 the amino and carboxyl groups are chargedbut over a pH range from 1 to 14 these groupsexhibit a series of equilibria involving binding anddissociation of a proton. The binding and dissociationof a proton reflects the role of these groups as weakacids or weak bases. The acid–base behaviour ofamino acids is important since it influences the eventualproperties of proteins, permits methods of identificationfor different amino acids and dictates their reactivity.The amino group, characterized by a basic pK valueof approximately 9, is a weak base. Whilst the aminogroup ionizes around pH 9.0 the carboxyl groupremains charged until a pH of ∼2.0 is reached. Atthis pH a proton binds neutralizing the charge of thecarboxyl group. In each case the carboxyl and aminogroups ionize according to the equilibrium

    HA + H2O −→ H3O+ + A− (2.1)where HA, the proton donor, is either –COOH or–NH3+ and A− the proton acceptor is either –COO−or –NH2. The extent of ionization depends on theequilibrium constant

    K = [H+][A−]/[HA] (2.2)

    and it becomes straightforward to derive the relation-ship

    pH = pK + log[A−]/[HA] (2.3)known as the Henderson–Hasselbalch equation (seeappendix). For a simple amino acid such as alaninea biphasic titration curve is observed when a solutionof the amino acid (a weak acid) is titrated withsodium hydroxide (a strong base). The titration curveshows two zones where the pH changes very slowlyafter additions of small amounts of acid or alkali(Figure 2.2). Each phase reflects different pK valuesassociated with ionizable groups.

    During the titration of alanine different ionic speciespredominate in solution (Figure 2.3). At low pH (