
PROTEINS: STRUCTURE AND FUNCTION

David Whitford

John Wiley & Sons, Ltd



Copyright 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 0-471-49893-9 HB
ISBN 0-471-49894-7 PB

Typeset in 10/12pt Times by Laserwords Private Limited, Chennai, India
Printed and bound by Graphos SpA, Barcelona, Spain
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

http://www.wiley.com

For my parents,

Elizabeth and Percy Whitford,

to whom I owe everything

Contents

Preface

1 An Introduction to protein structure and function
    A brief and very selective historical perspective
    The biological diversity of proteins
    Proteins and the sequencing of the human and other genomes
    Why study proteins?

2 Amino acids: the building blocks of proteins
    The 20 amino acids found in proteins
    The acid–base properties of amino acids
    Stereochemical representations of amino acids
    Peptide bonds
    The chemical and physical properties of amino acids
    Detection, identification and quantification of amino acids and proteins
    Stereoisomerism
    Non-standard amino acids
    Summary
    Problems

3 The three-dimensional structure of proteins
    Primary structure or sequence
    Secondary structure
    Tertiary structure
    Quaternary structure
    The globin family and the role of quaternary structure in modulating activity
    Immunoglobulins
    Cyclic proteins
    Summary
    Problems

4 The structure and function of fibrous proteins
    The amino acid composition and organization of fibrous proteins
    Keratins
    Fibroin
    Collagen
    Summary
    Problems

5 The structure and function of membrane proteins
    The molecular organization of membranes
    Membrane protein topology and function seen through organization of the erythrocyte membrane
    Bacteriorhodopsin and the discovery of seven transmembrane helices
    The structure of the bacterial reaction centre
    Oxygenic photosynthesis
    Photosystem I
    Membrane proteins based on transmembrane β barrels
    Respiratory complexes
    Complex III, the ubiquinol-cytochrome c oxidoreductase
    Complex IV or cytochrome oxidase
    The structure of ATP synthetase
    ATPase family
    Summary
    Problems

6 The diversity of proteins
    Prebiotic synthesis and the origins of proteins
    Evolutionary divergence of organisms and its relationship to protein structure and function
    Protein sequence analysis
    Protein databases
    Gene fusion and duplication
    Secondary structure prediction
    Genomics and proteomics
    Summary
    Problems

7 Enzyme kinetics, structure, function, and catalysis
    Enzyme nomenclature
    Enzyme co-factors
    Chemical kinetics
    The transition state and the action of enzymes
    The kinetics of enzyme action
    Catalytic mechanisms
    Enzyme structure
    Lysozyme
    The serine proteases
    Triose phosphate isomerase
    Tyrosyl tRNA synthetase
    EcoRI restriction endonuclease
    Enzyme inhibition and regulation
    Irreversible inhibition of enzyme activity
    Allosteric regulation
    Covalent modification
    Isoenzymes or isozymes
    Summary
    Problems

8 Protein synthesis, processing and turnover
    Cell cycle
    The structure of Cdk and its role in the cell cycle
    Cdk–cyclin complex regulation
    DNA replication
    Transcription
    Eukaryotic transcription factors: variation on a ‘basic’ theme
    The spliceosome and its role in transcription
    Translation
    Transfer RNA (tRNA)
    The composition of prokaryotic and eukaryotic ribosomes
    A structural basis for protein synthesis
    An outline of protein synthesis
    Antibiotics provide insight into protein synthesis
    Affinity labelling and RNA ‘footprinting’
    Structural studies of the ribosome
    Post-translational modification of proteins
    Protein sorting or targeting
    The nuclear pore assembly
    Protein turnover
    Apoptosis
    Summary
    Problems

9 Protein expression, purification and characterization
    The isolation and characterization of proteins
    Recombinant DNA technology and protein expression
    Purification of proteins
    Centrifugation
    Solubility and ‘salting out’ and ‘salting in’
    Chromatography
    Dialysis and ultrafiltration
    Polyacrylamide gel electrophoresis
    Mass spectrometry
    How to purify a protein?
    Summary
    Problems

10 Physical methods of determining the three-dimensional structure of proteins
    Introduction
    The use of electromagnetic radiation
    X-ray crystallography
    Nuclear magnetic resonance spectroscopy
    Cryoelectron microscopy
    Neutron diffraction
    Optical spectroscopic techniques
    Vibrational spectroscopy
    Raman spectroscopy
    ESR and ENDOR
    Summary
    Problems

11 Protein folding in vivo and in vitro
    Introduction
    Factors determining the protein fold
    Factors governing protein stability
    Folding problem and Levinthal’s paradox
    Models of protein folding
    Amide exchange and measurement of protein folding
    Kinetic barriers to refolding
    In vivo protein folding
    Membrane protein folding
    Protein misfolding and the disease state
    Summary
    Problems

12 Protein structure and a molecular approach to medicine
    Introduction
    Sickle cell anaemia
    Viruses and their impact on health as seen through structure and function
    HIV and AIDS
    The influenza virus
    p53 and its role in cancer
    Emphysema and α1-antitrypsin
    Summary
    Problems

Epilogue

Glossary

Appendices

Bibliography

References

Index

Preface

When I first started studying proteins as an undergraduate I encountered for the first time complex areas of biochemistry arising from the pioneering work of Pauling, Sumner, Kendrew, Perutz, Anfinsen, together with other scientific ‘giants’ too numerous to describe at length in this text. The area seemed complete. How wrong I was and how wrong an undergraduate’s perception can be! The last 30 years have seen an explosion in the area of protein biochemistry so that my 1975 edition of Biochemistry by Albert Lehninger remains, perhaps, of historical interest only. The greatest change has occurred through the development of molecular biology where fragments of DNA are manipulated in ways previously unimagined. This has enabled DNA to be sequenced, cloned, manipulated and expressed in many different cells. As a result areas of recombinant DNA technology and protein engineering have evolved rapidly to become specialist disciplines in their own right. Almost any protein whose primary sequence is known can be produced in large quantity via the expression of cloned or synthetic genes in recombinant host cells. Not only is the method allowing scientists to study some proteins for the first time but the increased amount of protein derived from recombinant DNA technology is also allowing the application of new and continually advancing structural techniques. In this area X-ray crystallography has remained at the forefront for over 40 years as a method of determining protein structure but it is now joined by nuclear magnetic resonance (NMR) spectroscopy and more recently by cryoelectron microscopy whilst other methods such as circular dichroism, infrared and Raman spectroscopy, electron spin resonance spectroscopy, mass spectrometry and fluorescence provide more limited, yet often vital and complementary, structural data. In many instances these methods have become established techniques only in the last 20 years and are consequently absent in many of those familiar textbooks occupying the shelves of university libraries.

An even greater impact on biochemistry has occurred with the rapid development of cost-effective, powerful, desktop computers with performance equivalent to the previous generation of supercomputers. Many experimental techniques relied on the co-development of computer hardware but software has also played a vital role in protein biochemistry. We can now search databases comparing proteins at the level of DNA or amino acid sequences, building up patterns of homology and relationships that provide insight into origin and possible function. In addition we use computers routinely to calculate properties such as isoelectric point, number of hydrophobic residues or secondary structure – something that would have been extraordinarily tedious, time consuming and problematic 20 years ago. Computers have revolutionized all aspects of protein biochemistry and there is little doubt that their influence will continue to increase in the forthcoming decades. The new area of bioinformatics reflects these advances in computing.
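As a rough illustration of the routine calculations mentioned above, the following Python sketch counts hydrophobic residues in a sequence and estimates an isoelectric point by finding the pH at which the net charge is zero. It is a minimal sketch under stated assumptions: the choice of residues treated as hydrophobic and the pKa values are approximate, assumed values for illustration only (they are not taken from this book), and the function names are hypothetical.

```python
# Illustrative sketch only. The hydrophobic set and pKa values are assumed,
# approximate textbook-style values, not data taken from this book.

HYDROPHOBIC = set("AVLIMFWP")  # one common (assumed) definition

# Assumed pKa values for groups that are positive when protonated.
PKA_POSITIVE = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
# Assumed pKa values for groups that are negative when deprotonated.
PKA_NEGATIVE = {"C_TERM": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def count_hydrophobic(sequence):
    """Count residues belonging to the assumed hydrophobic set."""
    return sum(1 for residue in sequence.upper() if residue in HYDROPHOBIC)


def net_charge(sequence, ph):
    """Estimate net charge at a given pH (Henderson-Hasselbalch equation)."""
    seq = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_TERM" else seq.count(group)
        charge += n / (1.0 + 10 ** (ph - pka))  # fraction protonated
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_TERM" else seq.count(group)
        charge -= n / (1.0 + 10 ** (pka - ph))  # fraction deprotonated
    return charge


def isoelectric_point(sequence, tolerance=0.001):
    """Locate the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid  # negative or zero: the pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "MKVLAAGE"  # arbitrary example sequence
    print(count_hydrophobic(peptide))
    print(round(isoelectric_point(peptide), 2))
```

Bisection works here because the estimated net charge falls steadily as the pH rises, so there is a single crossing point. Published pKa sets differ, and real proteins shift these values through their local environment, so the result should be read as an estimate only.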

In my attempt to construct an introductory yet extensive text on proteins I have, of necessity, been circumspect in my description of the subject area. I have often relied on qualitative rather than quantitative descriptions and I have attempted to minimise the introduction of unwieldy equations or formulae. This does not reflect my own interests in physical biochemistry because my research, I hope, was often quantitative. In some cases, particularly the chapters on enzymes and physical methods, the introduction of equations is unavoidable and indeed necessary to an initial description of the content of these chapters. I would be failing in my duty as an educator if I omitted some of these equations and I hope students will keep going at these ‘difficult’ points or, failing that, just omit them entirely on first reading this book. However, in general I wish to introduce students to proteins by describing principles governing their structure and function and to avoid over-complication in this presentation through rigorous and quantitative treatment. This book is firmly intended to be a broad introductory text suitable for undergraduate and postgraduate study, perhaps after an initial exposure to the subject of protein biochemistry, whilst at the same time introducing specialist areas prior to future advanced study. I hope the following chapters will help to direct students to the amazing beauty and complexity of protein systems.

Target audience

The present text should be suitable for all introductory modes of biochemistry, molecular biology, chemistry, medicine and dentistry. In the UK this generally means the book is suitable for all undergraduates between years 1 and 3 and this book has stemmed from lectures given as parts of biochemistry courses to students of biochemistry, chemistry, medicine and dentistry in all 3 years. Where possible each chapter is structured to increase progressively in complexity. For purely introductory courses as would occur in years 1 or 2 it is sufficient to read only the first parts, or selected sections, of each chapter. More advanced courses may require thorough reading of each chapter together with consultation of the bibliography and secondly the list of references given at the end of the book.

The world wide web

In the last ten years the world wide web (WWW) has transformed information available to students. It provides a new and useful medium with which to deliver lecture notes and an exciting and new teaching resource for all. Consequently within this book URLs direct students to learning resources and a list of important addresses is included in the appendix. In an effort to exploit the power of the internet this book is associated with ‘web-based’ tutorials, problems and content accessed from the following URL: http://www.wiley.com/go/whitfordproteins. These ‘pages’ are continually updated and point the interested reader towards new areas as they emerge. The Bibliography points interested readers towards further study material suitable for a first introduction to a subject whilst the list of references provides original sources for many areas covered in each of the twelve chapters.

For the problems included at the end of each chapter there are approximately 10 questions that aim to build on the subject matter discussed in the preceding text. Often the questions will increase in difficulty although this is not always the case. In this book I have limited the bibliography to broad reviews or accessible journal papers and I have deliberately restricted the number of ‘high-powered’ (difficult!) articles since I believe this organization is of greater use to students studying these subjects for the first time. To aid the learning process the web edition has multiple-choice questions for use as a formative assessment exercise. I should certainly like to hear of all mistakes or omissions encountered in this text and my hope is that educators and students will let me know via the e-mail address at the end of this section of any required corrections or additions.

Proteins are three-dimensional (3D) objects that are inadequately represented on book pages. Consequently many proteins are best viewed as molecular images using freely available software. Here, real-time manipulation of coordinate files is possible and will prove helpful to understanding aspects of structure and function. The importance of viewing, manipulating and even changing the representation of proteins to comprehending structure and function should not be underestimated. Experience has suggested that the use of computers in this area can have a dramatic effect on students’ understanding of protein structures. The ability to visualize in 3D conveys so much information – far more than any simple 2D picture in this book could ever hope to portray. Alongside many figures I have listed the Protein Data Bank files (e.g. PDB: 1HKO) used to produce diagrams. These files can be obtained from databases at several permanent sites based around the world such as http://www.rcsb.org/pdb or one of the many ‘mirrors’ that exist (for example, in the UK this data is found at http://pdb.ccdc.cam.ac.uk). For students with Internet access each PDB file can be retrieved and manipulated independently to produce comparable images to those shown in the text. To explore these macromolecular images with reasonable efficiency does not require the latest ‘all-powerful’ desktop computer. A computer with a Pentium III (or later) based processor, a clock speed of 200 MHz or greater, 32–64 MB RAM, hard disks of 10 GB, a graphics video card with at least 8 MB memory and a connection to the internet are sufficient to view and store a significant number of files together with representative images. Of course things are easier with a computer with a surfeit of memory (>256 MB) and a high ‘clock’ speed (>2 GHz) but this is not obligatory to see ‘on-line’ content or to manipulate molecular images. This book was started on a 700 MHz Pentium III based processor equipped with 256 MB RAM and 16 MB graphics card.
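The PDB coordinate files mentioned above are plain text, with each atom described by one fixed-width record. For readers who wish to inspect such files without a viewer, the Python sketch below reads ATOM and HETATM records and reports the geometric centre of the atoms found. It is a minimal sketch, assuming the standard fixed-column layout of the legacy PDB format; the two sample records use made-up coordinates for illustration and do not come from any real entry, and the function names are hypothetical.

```python
# Minimal sketch of reading atom records from PDB-format text.
# SAMPLE contains made-up coordinates for illustration only; it is not
# taken from a real Protein Data Bank entry.

SAMPLE = """
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
"""


def parse_atom_line(line):
    """Parse one ATOM/HETATM record using fixed column positions."""
    return {
        "serial": int(line[6:11]),           # columns 7-11
        "atom_name": line[12:16].strip(),    # columns 13-16
        "residue_name": line[17:20].strip(), # columns 18-20
        "chain_id": line[21],                # column 22
        "residue_number": int(line[22:26]),  # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
    }


def parse_pdb_atoms(text):
    """Return a list of atom dictionaries for every ATOM/HETATM record."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


def geometric_centre(atoms):
    """Average the x, y and z coordinates of the supplied atoms."""
    n = len(atoms)
    return tuple(sum(atom[axis] for atom in atoms) / n for axis in ("x", "y", "z"))


if __name__ == "__main__":
    atoms = parse_pdb_atoms(SAMPLE)
    for atom in atoms:
        print(atom["atom_name"], atom["residue_name"], atom["x"], atom["y"], atom["z"])
    print(geometric_centre(atoms))
```

Slicing by column rather than splitting on whitespace is deliberate: in this format neighbouring fields may run together without a separating space, so column positions are the safer guide. To use the sketch on a downloaded file, read its contents into a string and pass that string to `parse_pdb_atoms`.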

Organization of this book

This book will address the structure and function of proteins in 12 subsequent chapters each with a definitive theme. After an initial chapter describing why one would wish to study proteins and a brief historical background the second chapter deals with the ‘building blocks’ of proteins, namely the amino acids together with their respective chemical and physical properties. No attempt is made at any point to describe the metabolism connected with these amino acids and the reader should consult general textbooks for descriptions of the synthesis and degradation of amino acids. This is a major area in its own right and would have lengthened the present book too much. However, I would like to think that students will not avoid these areas because they remain an equally important subject that should be covered at some point within the undergraduate curriculum. Chapter 3 covers the assembly of amino acids into polypeptide chains and levels of organizational structure found within proteins. Almost all detailed knowledge of protein structure and function has arisen through studies of globular proteins but the presence of fibrous proteins with different structures and functional properties necessitated a separate chapter devoted to this area (Chapter 4). Within this class the best understood structures are those belonging to the collagen class of proteins, the keratins and the extended β sheet structures such as silk fibroin. The division between globular proteins and fibrous proteins was made at a time when the only properties one could compare readily were a protein’s amino acid composition and hydrodynamic radius. It is now apparent that other proteins exist with properties intermediate between globular and fibrous proteins that do not lend themselves to simple classification. However, the ‘old’ schemes of identification retain their value and serve to emphasize differences in proteins.

Membrane proteins represent a third group with different composition and properties. Most of these proteins are poorly understood, but there have been spectacular successes from the initial low-resolution structure of bacteriorhodopsin to the highly defined structure of bacterial photosynthetic reaction centres. These advances paved the way towards structural studies of G proteins and G-protein coupled receptors, the respiratory complexes from aerobic bacteria and the structure of ATP synthetases.

Chapter 6 focuses on both experimental and computational methods of comparing proteins, where in silico methods have become an increasingly important tool in modern protein biochemistry. Chapter 7 focuses on enzymes and by discussing basic reaction rate theories and kinetics the chapter leads to a discussion of enzyme-catalysed reactions. Enzymes catalyse reactions through a variety of mechanisms including acid–base catalysis, nucleophilic driven chemistry and transition state stabilization. These and other mechanisms are described along with the principles of regulation, active site chemistry and binding.

The involvement of proteins in the cell cycle, transcription, translation, sorting and degradation of proteins is described in Chapter 8. In 50 years we have progressed from elucidating the structure of DNA to uncovering how this information is converted into proteins. The chapter is based around the structure of two macromolecular systems: the ribosome devoted towards accurate and efficient synthesis and the proteasome designed to catalyse specific proteolysis. Chapter 9 deals with the methods of protein purification. Very often, biochemistry textbooks describe techniques without placing the technique in the correct context. As a result, in Chapter 9 I have attempted to describe equipment as well as techniques so that students may obtain a proper impression of this area.

Structural methods determine the topology or fold of proteins. With an elucidation of structure at atomic levels of resolution comes an understanding of biological function. Chapter 10 addresses this area by describing different techniques. X-ray crystallography remains at the forefront of research with new variations of the basic principle allowing faster determination of structure at improved resolution. NMR methods yield structures of comparable resolution to crystallography for small soluble proteins. In ideal situations these methods provide complete structural determination of all heavy atoms but they are complemented by other spectroscopic methods such as absorbance and fluorescence methods, mass spectrometry and infrared spectroscopy. These techniques provide important ancillary information on tertiary structure such as the helical content of the protein, the proportion and environment of aromatic residues within a protein as well as secondary structure content.

Chapter 11 describes protein folding and stability – a subject that has generated intense research interest with the recognition that disease states arise from aberrant folding or stability. The mechanism of protein folding is illustrated by in vitro and in vivo studies. Whilst the broad concepts underlying protein folding were deduced from studies of ‘model’ proteins such as ribonuclease, analysis of cell folding pathways has highlighted specialised proteins, chaperones, with a critical function to the overall process. The GroES–GroEL complex is discussed to highlight the integrated process of synthesis and folding in vivo.

The final chapter builds on the preceding 11 chapters using a restricted set of well-studied proteins (case studies) with significant impact on molecular medicine. These proteins include haemoglobin, viral proteins, p53, prions and α1-antitrypsin. Although still a young subject area this branch of protein science will expand in the next few years and will rely on the techniques, knowledge and principles elucidated in Chapters 1–11. The examples emphasize the impact of protein science and molecular medicine on the quality of human life.

Acknowledgements

I am indebted to all research students and post-docs who shared my laboratories at the Universities of London and Oxford during the last 15 years, in many cases acting as ‘test subjects’ for teaching ideas. I should like to thank Drs Roger Hewson, Richard Newbold and Susan Manyusa whose comments throughout my research and teaching career were always valued. I would also like to thank individuals, too numerous to name, with whom I interacted at King’s College London, Imperial College of Science, Technology and Medicine and the University of Oxford. In this context I should like to thank Dr John Russell, formerly of Imperial College London, whose goodwill, humour and fantastic insight into the history of science, the scientific method and ‘day to day’ experimentation prevented absolute despair.

During preparation of this book many individuals read and contributed valuable comments to the manuscript’s content, phrasing and ideas. In particular I wish to thank these unnamed and sometimes unknown individuals who read one or more of the chapters of this book. As is often said by most authors at this point, despite their valuable contributions all of the remaining errors and deficiencies in the current text are my responsibility. In this context I could easily have spent more months attempting to perfect the current text. I am very aware that this text has deficiencies but I hope these defects will not detract from its value. In addition my wish to try other avenues, other roads not taken, dictates that this manuscript is completed without delay.

Writing and producing a textbook would not be possible without the support of a good publisher. I should like to thank all the staff at John Wiley & Sons, Chichester, UK. This exhaustive list includes particularly Andrew Slade as senior Publishing Editor who helped smooth the bumpy route towards production of this book, Lisa Tickner who first initiated events leading to commissioning this book, Rachel Ballard who supervised day to day business on this book, replacing every form I lost without complaint and enquiring tactfully and gently about possible completion dates, Robert Hambrook who translated my text and diagrams into a beautiful book, and the remainder of the production team of John Wiley & Sons. Together we inched our way towards the painfully slow production of this text, although the pace was entirely attributable to the author.

Lastly I must also thank Susan who tolerated the protracted completion of this book, reading chapters and offering support for this project throughout whilst coping with the arrival of Alexandra and Ethan effortlessly (unlike their father).

David Whitford
April 2004

[email protected]

1 An Introduction to protein structure and function

Biochemistry has exploded as a major scientific endeavour over the last one hundred years to rival previously established disciplines such as chemistry and physics. This occurred with the recognition that living systems are based on the familiar elements of organic chemistry (carbon, oxygen, nitrogen and hydrogen) together with the occasional involvement of inorganic chemistry and elements such as iron, copper, sodium, potassium and magnesium. More importantly the laws of physics including those concerning thermodynamics, electricity and quantum physics are applicable to biochemical systems and no ‘vital’ force distinguishes living from non-living systems. As a result the laws of chemistry and physics are successfully applied to biochemistry and ideas from physics and chemistry have found widespread application, frequently revolutionizing our understanding of complex systems such as cells.

This book focuses on one major component of all living systems – the proteins. Proteins are found in all living systems ranging from bacteria and viruses through the unicellular and simple eukaryotes to vertebrates and higher mammals such as humans. Proteins make up over 50 percent of the dry weight of cells and are present in greater amounts than any other biomolecule. Proteins are unique amongst the macromolecules in underpinning every reaction occurring in biological systems. It goes without saying that one should not ignore the other components of living systems since they have indispensable roles, but in this text we will consider only proteins.

A brief and very selective historical perspective

With the vast accumulation of knowledge about proteins over the last 50 years it is perhaps surprising to discover that the term protein was introduced nearly 170 years ago. One early description was by Gerhardus Johannes Mulder in 1839, whose studies on the composition of animal substances, chiefly fibrin, albumin and gelatin, showed the presence of carbon, hydrogen, oxygen and nitrogen. In addition he recognized that sulfur and phosphorus were sometimes present in ‘animal substances’ that contained large numbers of atoms. In other words, he established that these ‘substances’ were macromolecules. Mulder communicated his results to Jöns Jakob Berzelius and it is suggested the term protein arose from this interaction; the origin of the word protein has been variously ascribed to derivation from the Latin word primarius or from the Greek god Proteus. The definition of proteins was timely since in 1828 Friedrich Wöhler had shown that heating ammonium cyanate resulted in isomerization and the formation of urea (Figure 1.1). Organic compounds characteristic of living systems, such as urea, could be derived from simple inorganic chemicals. For many historians this marks the beginning of biochemistry and it is appropriate that the discovery of proteins occurred in the same period.

(NH4)OCN → H2N–CO–NH2

Figure 1.1 The decomposition of ammonium cyanate yields urea

The development of biochemistry and the study of proteins was assisted by analysis of their composition and structure by Heinrich Hlasiwetz and Josef Habermann around 1873 and the recognition that proteins were made up of smaller units called amino acids. They established that hydrolysis of casein with strong acids or alkali yielded glutamic acid, aspartic acid, leucine, tyrosine and ammonia whilst the hydrolysis of other proteins yielded a different group of products. Importantly their work suggested that the properties of proteins depended uniquely on the constituent parts – a theme that is equally relevant today in modern biochemical study.

Another landmark in the study of proteins occurred in 1902 with Franz Hofmeister establishing the constituent atoms of the peptide bond with the polypeptide backbone derived from the condensation of free amino acids. Five years earlier Eduard Buchner revolutionized views of protein function by demonstrating that yeast cell extracts catalysed fermentation of sugar into ethanol and carbon dioxide. Previously it was believed that only living systems performed this catalytic function. Emil Fischer further studied biological catalysis and proposed that components of yeast, which he called enzymes, combined with sugar to produce an intermediate compound. With the realization that cells were full of enzymes 100 years of research has developed and refined these discoveries. Further landmarks in the study of proteins could include Sumner’s crystallization of the first enzyme (urease) in 1926 and Pauling’s description of the geometry of the peptide bond; however, extensive discussion of these advances and many other important discoveries in protein biochemistry are best left to history of science textbooks.

A brief look at the award of the Nobel Prizes for Chemistry, Physiology and Medicine since 1900 highlighted in Table 1.1 reveals the involvement of many diverse areas of science in protein biochemistry. At first glance it is not obvious why William and Lawrence Bragg’s discovery of the diffraction of X-rays by sodium chloride crystals is relevant, but diffraction by protein crystals is the main route towards biological structure determination. Their discovery was the first step in the development of this technique. Discoveries in chemistry and physics have been implemented rapidly in the study of proteins. By 1958 Max Perutz and John Kendrew had determined the first protein structure and this was soon followed by the larger, multiple subunit, structure of haemoglobin and the first enzyme, lysozyme. This remarkable advance in knowledge extended from initial understanding of the atomic composition of proteins around 1900 to the determination of the three-dimensional structure of proteins in the 1960s and represents a major chapter of modern biochemistry. However, advances have continued with new areas of molecular biology proving equally important to understanding protein structure and function.

Life may be defined as the ordered interaction of proteins and all forms of life from viruses to complex, specialized, mammalian cells are based on proteins made up of the same building blocks or amino acids. Proteins found in simple unicellular organisms such as bacteria are identical in structure and function to those found in human cells illustrating the evolutionary lineage from simple to complex organisms.

Table 1.1 Selected landmarks in the study of protein structure and function from 1900–2002 as seen by the award of the Nobel Prize for Chemistry, Physiology or Medicine

Date  Discoverer + Discovery
1901  Wilhelm Conrad Röntgen ‘in recognition of the . . . discovery of the remarkable rays subsequently named after him’
1907  Eduard Buchner ‘cell-free fermentation’
1914  Max von Laue ‘for his discovery of the diffraction of X-rays by crystals’
1915  William Henry Bragg and William Lawrence Bragg ‘for their services in the analysis of crystal structure by . . . X-rays’
1923  Frederick Grant Banting and John James Richard Macleod ‘for the discovery of insulin’
1930  Karl Landsteiner ‘for his discovery of human blood groups’
1946  James Batcheller Sumner ‘for his discovery that enzymes can be crystallized’. John Howard Northrop and Wendell Meredith Stanley ‘for their preparation of enzymes and virus proteins in a pure form’
1948  Arne Wilhelm Kaurin Tiselius ‘for his research on electrophoresis and adsorption analysis, especially for his discoveries concerning the complex nature of the serum proteins’
1952  Archer John Porter Martin and Richard Laurence Millington Synge ‘for their invention of partition chromatography’
1952  Felix Bloch and Edward Mills Purcell ‘for their development of new methods for nuclear magnetic precision measurements and discoveries in connection therewith’
1954  Linus Carl Pauling ‘for his research into the nature of the chemical bond and . . . to the elucidation of . . . complex substances’
1958  Frederick Sanger ‘for his work on the structure of proteins, especially that of insulin’
1959  Severo Ochoa and Arthur Kornberg ‘for their discovery of the mechanisms in the biological synthesis of ribonucleic acid and deoxyribonucleic acid’
1962  Max Ferdinand Perutz and John Cowdery Kendrew ‘for their studies of the structures of globular proteins’
1962  Francis Harry Compton Crick, James Dewey Watson and Maurice Hugh Frederick Wilkins ‘for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material’
1964  Dorothy Crowfoot Hodgkin ‘for her determinations by X-ray techniques of the structures of important biochemical substances’
1965  François Jacob, André Lwoff and Jacques Monod ‘for discoveries concerning genetic control of enzyme and virus synthesis’
1968  Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg ‘for . . . the genetic code and its function in protein synthesis’
1969  Max Delbrück, Alfred D. Hershey and Salvador E. Luria ‘for their discoveries concerning the replication mechanism and the genetic structure of viruses’
1972  Christian B. Anfinsen ‘for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation’. Stanford Moore and William H. Stein ‘for their contribution to the understanding of the connection between chemical structure and catalytic activity of . . . ribonuclease molecule’
1972  Gerald M. Edelman and Rodney R. Porter ‘for their discoveries concerning the chemical structure of antibodies’
1975  John Warcup Cornforth ‘for his work on the stereochemistry of enzyme-catalyzed reactions’. Vladimir Prelog ‘for his research into the stereochemistry of organic molecules and reactions’
1975  David Baltimore, Renato Dulbecco and Howard Martin Temin ‘for their discoveries concerning the interaction between tumour viruses and the genetic material of the cell’
1978  Werner Arber, Daniel Nathans and Hamilton O. Smith ‘for the discovery of restriction enzymes and their application to problems of molecular genetics’
1980  Paul Berg ‘for his fundamental studies of the biochemistry of nucleic acids, with particular regard to recombinant-DNA’. Walter Gilbert and Frederick Sanger ‘for their contributions concerning the determination of base sequences in nucleic acids’
1982  Aaron Klug ‘development of crystallographic electron microscopy and structural elucidation of nucleic acid–protein complexes’
1984  Robert Bruce Merrifield ‘for his development of methodology for chemical synthesis on a solid matrix’
1984  Niels K. Jerne, Georges J.F. Köhler and César Milstein ‘for theories concerning the specificity in development and control of the immune system and the discovery of the principle for production of monoclonal antibodies’
1988  Johann Deisenhofer, Robert Huber and Hartmut Michel ‘for the determination of the structure of a photosynthetic reaction centre’
1989  J. Michael Bishop and Harold E. Varmus ‘for their discovery of the cellular origin of retroviral oncogenes’
1991  Richard R. Ernst ‘for . . . the methodology of high resolution nuclear magnetic resonance spectroscopy’
1992  Edmond H. Fischer and Edwin G. Krebs ‘for their discoveries concerning reversible protein phosphorylation as a biological regulatory mechanism’
1993  Kary B. Mullis ‘for his invention of the polymerase chain reaction (PCR) method’ and Michael Smith ‘for his fundamental contributions to the establishment of oligonucleotide-based, site-directed mutagenesis’
1994  Alfred G. Gilman and Martin Rodbell ‘for their discovery of G-proteins and the role of these proteins in signal transduction’
1997  Paul D. Boyer and John E. Walker ‘for their elucidation of the enzymatic mechanism underlying the synthesis of adenosine triphosphate (ATP)’. Jens C. Skou ‘for the first discovery of an ion-transporting enzyme, Na+, K+-ATPase’
1997  Stanley B. Prusiner ‘for his discovery of prions – a new biological principle of infection’
1999  Günter Blobel ‘for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell’
2000  Arvid Carlsson, Paul Greengard and Eric R. Kandel ‘signal transduction in the nervous system’
2001  Paul Nurse, Tim Hunt and Leland Hartwell ‘for discoveries of key regulators of the cell cycle’
2002  Kurt Wüthrich ‘for development of NMR spectroscopy as a method of determining biological macromolecules structure in solution’. John B. Fenn and Koichi Tanaka ‘for their development of soft desorption ionization methods for mass spectrometric analyses of biological macromolecules’
2002  Sydney Brenner, H. Robert Horvitz and John E. Sulston ‘for their discoveries concerning genetic regulation of organ development and programmed cell death’

Molecular biology starts with the dramatic elucidation of the structure of the DNA double helix by James Watson, Francis Crick, Rosalind Franklin and Maurice Wilkins in 1953. Today, details of DNA replication, transcription into RNA and the synthesis of proteins (translation) are extensive. This has established an enormous body of knowledge representing a whole new subject area. All cells encode the information content of proteins within genes, or more accurately the order of bases along the DNA strand, yet it is the conversion of this information or expression into proteins that represents the tangible evidence of a living system or life.

DNA → RNA → protein
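The flow of information from DNA through RNA to protein can be illustrated with a minimal Python sketch. The DNA sequence, the function names and the deliberately partial codon table are illustrative assumptions rather than material from the text; only a handful of the 64 codons of the standard genetic code are included.

```python
# Illustrative sketch of DNA -> RNA -> protein.
# Assumption: the input is the coding (sense) strand written 5'->3',
# so transcription amounts to replacing T with U.

# Partial codon table (standard genetic code); only the codons used here.
CODON_TABLE = {
    "AUG": "Met",   # also the start codon
    "GCU": "Ala",
    "GAA": "Glu",
    "GUU": "Val",
    "UAA": "Stop",
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA corresponding to a DNA coding strand."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    residues = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the partial table
        if amino_acid == "Stop":
            break
        residues.append(amino_acid)
    return residues


dna = "ATGGCTGAAGTTTAA"   # hypothetical sequence chosen for the example
mrna = transcribe(dna)
peptide = translate(mrna)
print(mrna)      # AUGGCUGAAGUUUAA
print(peptide)   # ['Met', 'Ala', 'Glu', 'Val']
```

A complete treatment would need the full codon table and a choice of reading frame; the sketch only shows that the order of bases determines the order of amino acid residues.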

Cells divide, synthesize new products, secrete unwanted products, generate chemical energy to sustain these processes via specific chemical reactions, and in all of these examples the common theme is the mediation of proteins.

In 1944 the physicist Erwin Schrödinger posed the question ‘What is Life?’ in an attempt to understand the physical properties of a living cell. Schrödinger suggested that living systems obeyed all laws of physics and should not be viewed as exceptional but instead reflected the statistical nature of these laws. More importantly, living systems are amenable to study using many of the techniques familiar to chemistry and physics. The last 50 years of biochemistry have demonstrated this hypothesis emphatically with tools developed by physicists and chemists rapidly employed in biological studies. A casual perusal of Table 1.1 shows how quickly methodologies progress from discovery to application.

The biological diversity of proteins

Proteins have diverse biological functions ranging from DNA replication, forming cytoskeletal structures, transporting oxygen around the bodies of multicellular organisms to converting one molecule into another. The types of functional properties are almost endless and are continually being increased as we learn more about proteins. Some important biological functions are outlined in Table 1.2 but it is to be expected that this rudimentary list of properties will expand each year as new proteins are characterized. A formal demarcation of proteins into one class should not be pursued too far since proteins can have multiple roles or functions; many proteins do not lend themselves easily to classification schemes. However, for all chemical reactions occurring in cells a protein is involved intimately in the biological process. These proteins are united through their composition based on the same group of 20 amino acids. Although all proteins are composed of the same group of 20 amino acids they differ in their composition: some contain a surfeit of one amino acid whilst others may lack one or two members of the group of 20 entirely. It was realized early in the study of proteins that variation in size and complexity is common and the molecular weight and number of subunits (polypeptide chains) show tremendous diversity. There is no correlation between size and number of polypeptide chains. For example, insulin has a relative molecular mass of 5700 and contains two polypeptide chains, haemoglobin has a mass of approximately 65 000 and contains four polypeptide chains, and hexokinase is a single polypeptide chain with an overall mass of ∼100 000 (see Table 1.3).

Table 1.2 A selective list of some functional roles for proteins within cells

Function                                  Examples
Enzymes or catalytic proteins             Trypsin, DNA polymerases and ligases
Contractile proteins                      Actin, myosin, tubulin, dynein
Structural or cytoskeletal proteins       Tropocollagen, keratin
Transport proteins                        Haemoglobin, myoglobin, serum albumin, ceruloplasmin, transthyretin
Effector proteins                         Insulin, epidermal growth factor, thyroid stimulating hormone
Defence proteins                          Ricin, immunoglobulins, venoms and toxins, thrombin
Electron transfer proteins                Cytochrome oxidase, bacterial photosynthetic reaction centre, plastocyanin, ferredoxin
Receptors                                 CD4, acetylcholine receptor
Repressor proteins                        Jun, Fos, Cro
Chaperones (accessory folding proteins)   GroEL, DnaK
Storage proteins                          Ferritin, gliadin

The molecular weight is more properly referred to as the relative molecular mass (symbol Mr). This is defined as the mass of a molecule relative to 1/12th the mass of the carbon (12C) isotope. The mass of this isotope is defined as exactly 12 atomic mass units. Consequently the term molecular weight or relative molecular mass is a dimensionless quantity and should not possess any units. Frequently in this and many other textbooks the unit Dalton (equivalent to 1 atomic mass unit, i.e. 1 Dalton = 1 amu) is used and proteins are described with molecular weights of 5.5 kDa (5500 Daltons). More accurately, this is the absolute molecular weight representing the mass in grams of 1 mole of protein. For most purposes this becomes of little relevance and the term ‘molecular weight’ is used freely in protein biochemistry and in this book.

Table 1.3 The molecular masses of proteins together with the number of subunits. The term ‘subunit’ is synonymous with the number of polypeptide chains and is used interchangeably

Protein                       Molecular mass   Subunits
Insulin                       5700             2
Haemoglobin                   64 500           4
Tropocollagen                 285 000          3
Subtilisin                    27 500           1
Ribonuclease                  12 600           1
Aspartate transcarbamoylase   310 000          12
Bacteriorhodopsin             26 800           1
Hexokinase                    102 000          1
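The relationship between relative molecular mass, Daltons and grams, together with the lack of correlation between overall size and subunit number, can be checked with a few lines of Python. This is a sketch only: the masses and subunit counts are those listed in Table 1.3, whereas the value of the Avogadro constant and the helper names are assumptions introduced for the example.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (assumed constant, not given in the text)

# (relative molecular mass, number of subunits) as listed in Table 1.3
PROTEINS = {
    "Insulin": (5700, 2),
    "Haemoglobin": (64500, 4),
    "Tropocollagen": (285000, 3),
    "Subtilisin": (27500, 1),
    "Ribonuclease": (12600, 1),
    "Aspartate transcarbamoylase": (310000, 12),
    "Bacteriorhodopsin": (26800, 1),
    "Hexokinase": (102000, 1),
}


def mass_per_molecule_g(mr: float) -> float:
    """Mass in grams of one molecule; a protein of Mr has a molar mass of Mr g/mol."""
    return mr / AVOGADRO


def mean_subunit_mass(mr: float, subunits: int) -> float:
    """Average mass per polypeptide chain (individual chains need not be equal in size)."""
    return mr / subunits


for name, (mr, n) in PROTEINS.items():
    print(f"{name:28s} Mr={mr:>7d}  {mr / 1000:6.1f} kDa  "
          f"{mean_subunit_mass(mr, n):9.0f} per chain  "
          f"{mass_per_molecule_g(mr):.2e} g per molecule")
```

The output makes the point of the preceding paragraphs concrete: hexokinase is a single chain of about 102 000, whereas the twelve chains of aspartate transcarbamoylase average only about 26 000 each.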

Proteins are joined covalently and non-covalently with other biomolecules including lipids, carbohydrates, nucleic acids, phosphate groups, flavins, heme groups and metal ions. Components such as hemes or metal ions are often called prosthetic groups. Complexes formed between lipids and proteins are lipoproteins, those with carbohydrates are called glycoproteins, whilst complexes with metal ions lead to metalloproteins, and so on. The complexes formed between metal ions and proteins increase the involvement of elements of the periodic table beyond that expected of typical organic molecules (namely carbon, hydrogen, nitrogen and oxygen). Inspection of the periodic table (Figure 1.2) shows that at least 20 elements have been implicated directly in the structure and function of proteins (Table 1.4). Surprisingly elements such as aluminium and silicon that are very abundant in the Earth’s crust (8.1 and 25.7 percent by weight, respectively) do not occur in high concentration within cells. Aluminium is rarely, if ever, found as part of proteins whilst the role of silicon is confined to biomineralization where it is the core component of shells. The involvement of carbon, hydrogen, oxygen, nitrogen, phosphorus and sulfur is clear although the role of other elements, particularly transition metals, has been difficult to establish. Where transition metals occur in proteins there is frequently only one metal atom per molecule of protein, which led in the past to a failure to detect the metal. Other elements have an inferred involvement from growth studies showing that depletion from the diet leads to an inhibition of normal cellular function. For metalloproteins the absence of the metal can lead to a loss of structure and function.

Metals such as Mo, Co and Fe are often found associated with organic co-factors such as pterin, flavins, cobalamin and porphyrin (Figure 1.3). These organic ligands hold metal centres and are often tightly associated with proteins.

Table 1.4 The involvement of trace elements in the structure and function of proteins

Element      Functional role
Sodium       Principal extracellular cation, osmotic balance
Potassium    Principal intracellular cation, osmotic balance
Magnesium    Bound to ATP/GTP in nucleotide binding proteins, found as structural component of hydrolase and isomerase enzymes
Calcium      Activator of calcium binding proteins such as calmodulin
Vanadium     Bound to enzymes such as chloroperoxidase
Manganese    Component of water splitting enzyme in higher plants
Molybdenum   Bound to pterin co-factor in enzymes such as xanthine oxidase or sulphite oxidase. Also found in nitrogenase
Iron         Important catalytic component of heme enzymes involved in oxygen transport as well as electron transfer. Important examples are haemoglobin, cytochrome oxidase and catalase
Cobalt       Metal component of vitamin B12 found in many enzymes
Nickel       Co-factor found in hydrogenase enzymes
Copper       Involved as co-factor in oxygen transport systems and electron transfer proteins such as haemocyanin and plastocyanin
Zinc         Catalytic component of enzymes such as carbonic anhydrase and superoxide dismutase
Chlorine     Principal extracellular anion, osmotic balance
Iodine       Iodinated tyrosine residues form part of hormone thyroxine and bound to proteins
Selenium     Bound at active centre of glutathione peroxidase


Figure 1.2 The periodic table showing the elements highlighted in red known to have involvement in the structure and/or function of proteins. The involvement of some elements is contentious: tungsten and cadmium are claimed to be associated with proteins yet these elements are also known to be toxic


Figure 1.3 Organic co-factors found in proteins. These co-factors are pterin, the isoalloxazine ring found as part of flavin in FAD and FMN, the pyridine ring of NAD and its close analogue NADP and the porphyrin skeletons of heme and chlorophyll. R represents the remaining part of the co-factor whilst M and V signify methyl and vinyl side chains

Proteins and the sequencing of the human and other genomes

Recognition of the diverse roles of proteins in biological systems increased largely as a result of the enormous amount of sequencing information generated via the Human Genome Mapping project. Similar schemes aimed at deciphering the genomes of Escherichia coli, yeast (Saccharomyces cerevisiae), and mouse provided related information. With the completion of the first draft of the human genome mapping project in 2001 it was estimated that human chromosomes contain approximately 25 000–30 000 genes. This allows a conservative estimate of the number of polypeptides making up most human cells as ∼25 000, although alternative splicing of genes and variations in subunit composition increase the number of proteins further. Despite sequencing the human genome it is an unfortunate fact that we do not know the role performed by most proteins. Of those thousands of polypeptides we know the structures of only a small number, emphasizing a large imbalance between the abundance of sequence data and the presence of structure/function information. An analysis of protein databases suggests about 1000 distinct structures or folds have been determined for globular proteins. Many proteins are retained within cell membranes and we know virtually nothing about the structures of these proteins and only slightly more about their functional roles. This observation has enormous consequences for understanding protein structure and function.

Why study proteins?

This question is often asked, not entirely without reason, by many undergraduates during their first introduction to the subject. Perhaps the best reply that can be given is that proteins underpin every aspect of biological activity. This is particularly important in areas where protein structure and function have an impact on human endeavour such as medicine. Advances in molecular genetics reveal that many diseases stem from specific protein defects. A classic example is cystic fibrosis, an inherited condition that alters a protein, called the cystic fibrosis transmembrane conductance regulator (CFTR), involved in the transport of sodium and chloride across epithelial cell membranes. This defect is found in Caucasian populations at a ratio of ∼1 in 20, a surprisingly high frequency. With 1 in 20 of the population ‘carrying’ a single defective copy of the gene, individuals who inherit defective copies of the gene from each parent suffer from the disease. In the UK the incidence of cystic fibrosis is approximately 1 in 2000 live births, making it one of the most common inherited disorders. The disease results in the body producing a thick, sticky mucus that blocks the lungs, leading to serious infection, and inhibits the pancreas, stopping digestive enzymes from reaching the intestines where they are required to digest food. The severity of cystic fibrosis is related to CFTR gene mutation, and the most common mutation, found in approximately 65 percent of all cases, involves the deletion of a single amino acid residue from the protein at position 508. A loss of one residue out of a total of nearly 1500 amino acid residues results in a severe decrease in the quality of life with individuals suffering from this disease requiring constant medical care and supervision.

Figure 1.4 The shape of erythrocytes in normal and sickle cell anemia arises from mutations to haemoglobin found within the red blood cell. (Reproduced with permission from Voet, D, Voet, J.G and Pratt, C.W. Fundamentals of Biochemistry. John Wiley & Sons Inc.)
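The carrier frequency and the incidence quoted for cystic fibrosis can be related by simple Mendelian arithmetic. The sketch below assumes random mating and a fully recessive disorder, neither of which is stated explicitly in the text, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_incidence(carrier_frequency: Fraction) -> Fraction:
    """Probability that a child inherits two defective copies of the gene.

    Both parents must be carriers (carrier_frequency squared), and each
    carrier passes on the defective copy with probability 1/2 (1/4 overall).
    """
    return carrier_frequency ** 2 * Fraction(1, 4)


carriers = Fraction(1, 20)            # ~1 in 20, as quoted in the text
incidence = expected_incidence(carriers)
print(incidence)                      # 1/1600
```

Under these assumptions the expected incidence is 1 in 1600, of the same order as the figure of approximately 1 in 2000 live births given for the UK.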

Further examples emphasize the need to understand more about proteins. The pioneering studies of Vernon Ingram in the 1950s showed that sickle cell anaemia arose from a mutation in the β chain of haemoglobin. Haemoglobin is a tetrameric protein containing two α and two β chains. In each of the β chains a mutation is found that involves the change of the sixth amino acid residue from a glutamic acid to a valine. The alteration of two residues out of 574 leads to a drastic change in the appearance of red blood cells from their normal biconcave disks to an elongated sickle shape (Figure 1.4).
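The figure of two residues out of 574 follows from the subunit composition of haemoglobin. The short check below assumes chain lengths of 141 residues for the α chain and 146 for the β chain, values that are not given in this passage.

```python
# Assumed chain lengths for human haemoglobin (not stated in the passage above)
ALPHA_LENGTH = 141
BETA_LENGTH = 146

total_residues = 2 * ALPHA_LENGTH + 2 * BETA_LENGTH   # two alpha and two beta chains
altered_residues = 2                                   # one Glu -> Val change in each beta chain

print(total_residues)                                  # 574
print(f"{altered_residues / total_residues:.2%}")      # 0.35%
```

Changing roughly one residue in three hundred is therefore sufficient to alter the shape of the whole cell.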

As the name of the disease suggests individuals are anaemic showing decreased haemoglobin content in red blood cells from approximately 15 g per 100 ml to under half that figure, and show frequent illness. Our understanding of cystic fibrosis and of sickle cell anaemia has advanced in parallel with our understanding of protein structure and function although at best we have very limited and crude means of treating these diseases.

However, perhaps the greatest impetus to understand protein structure and function lies in the hope of overcoming two major health issues confronting the world in the 21st century. The first of these is cancer. Cancer is the uncontrolled proliferation of cells that have lost their normal regulated cell division often in response to a genetic or environmental trigger. The development of cancer is a multistep, multifactorial process often occurring over decades but the precise involvement of specific proteins has been demonstrated in some instances. One of the best examples is a protein called p53, normally present at low levels in cells, that ‘switches on’ in response to cellular damage and as a transcription factor controls the cell cycle process. Mutations in p53 alter the normal cycle of events leading eventually to cancer and several tumours including lung, colorectal and skin carcinomas are attributed to molecular defects in p53. Future research on p53 will enable its physicochemical properties to be thoroughly appreciated and by understanding the link between structure, folding, function and regulation comes the prospect of unravelling its role in tumour formation and manipulating its activity via therapeutic intervention. Already some success is being achieved in this area and the future holds great promise for ‘halting’ cancer by controlling the properties of p53 and similar proteins.

A second major problem facing the world today is the estimated number of people infected with the human immunodeficiency virus (HIV). In 2003 the World Health Organization (WHO) estimated that over 40 million individuals were infected with this virus worldwide. For many individuals, particularly those in the ‘Third World’, the prospect of prolonged good health is unlikely as the virus slowly degrades the body’s ability to fight infection through damage to the immune response mechanism and in particular to a group of cells called helper T cells (CD4+ T cells). HIV infection encompasses many aspects of protein structure and function, as the virus enters cells through the interaction of specific viral coat proteins with receptors on the surface of white blood cells. Once inside cells the virus ‘hides’ but is secretly replicating and integrating genetic material into host DNA through the action of specific enzymes (proteins). Halting the destructive influence of HIV relies on understanding many different, yet inter-related, aspects of protein structure and function. Again, considerable progress has been made since the 1980s when the causative agent of the disease was recognized as a retrovirus. These advances have focussed on understanding the structure of HIV proteins and in designing specific inhibitors of, for example, the reverse transcriptase enzyme. Although in advanced health care systems these drugs (inhibitors) prolong life expectancy, the eradication of HIV’s destructive action within the body and hence an effective cure remains unachieved. Achieving this goal should act as a timely reminder for all students of biology, chemistry and medicine that success in this field will have a dramatic impact on the quality of human life in the forthcoming decades.

Central to success in treating any of the above diseases is the development of new medicines, many based on proteins. The development of new therapies has been rapid during the last 20 years with the list of new treatments steadily increasing and including minimizing serious effects of different forms of cancer via the use of specific proteins including monoclonal antibodies, alleviating problems associated with diabetes by the development of improved recombinant ‘insulins’ and developing ‘clot-busting’ drugs (proteins) for the management of strokes and heart attacks. This highly selective list is the productive result of understanding protein structure and function and has contributed to a marked improvement in disease management. For the future these advances will need to be extended to other diseases and will rely on an extensive and thorough knowledge of proteins of increasing size and complexity. We will need to understand the structure of proteins, their interaction with other biomolecules, their roles within different biological systems and their potential manipulation by genetic or chemical methods. The remaining chapters in this book represent an attempt to introduce and address some of these issues in a fundamental manner helpful to students.

2 Amino acids: the building blocks of proteins

Despite enormous functional diversity all proteins consist of a linear arrangement of amino acid residues assembled together into a polypeptide chain. Amino acids are the ‘building blocks’ of proteins and in order to understand the properties of proteins we must first describe the properties of the constituent 20 amino acids. All amino acids contain carbon, hydrogen, nitrogen and oxygen with two of the 20 amino acids also containing sulfur. Throughout this book a colour scheme based on the CPK model (after Corey, Pauling and Koltun, pioneers of ‘space-filling’ representations of molecules) is used. This colouring scheme shows nitrogen atoms in blue, oxygen atoms in red, carbon atoms in light grey (occasionally black), sulfur in yellow, and hydrogen, when shown, in either white or, to enhance viewing on a white background, a lighter shade of grey. To avoid unnecessary complexity ‘ball and stick’ representations of molecular structures are often shown instead of space-filling models. In other instances cartoon representations of structure are shown since they enhance visualization of organization whilst maintaining clarity of presentation.

The 20 amino acids found in proteins

In their isolated state amino acids are white crystalline solids. It is surprising that crystalline materials form the building blocks for proteins since these latter molecules are generally viewed as ‘organic’. The crystalline nature of amino acids is further emphasized by their high melting and boiling points and together these properties are atypical of most organic molecules. Organic molecules are not commonly crystalline nor do they have high melting and boiling points. Compare, for example, alanine and propionic acid: the former is a crystalline amino acid and the latter is a volatile organic acid. Despite similar molecular weights (89 and 74) their respective melting points are 314 °C and −20.8 °C. The origin of these differences and the unique properties of amino acids resides in their ionic and dipolar nature.

Amino acids are held together in a crystalline lattice by charged interactions and these relatively strong forces contribute to high melting and boiling points. Charged groups are also responsible for electrical conductivity in aqueous solutions (amino acids are electrolytes), their relatively high solubility in water and the large dipole moment associated with crystalline material. Consequently amino acids are best viewed as charged molecules that crystallize from solutions containing dipolar ions. These dipolar ions are called zwitterions. A proper representation of amino acids reflects amphoteric behaviour and amino acids are always represented as the zwitterionic state in this

Proteins: Structure and Function by David Whitford 2005 John Wiley & Sons, Ltd


Figure 2.1 A skeletal model of a generalized amino acid showing the amino (blue), carboxyl (red) and R groups attached to a central or α carbon

textbook as opposed to the undissociated form. For 19 of the 20 amino acids commonly found in proteins a general structure for the zwitterionic state has charged amino (NH3+) and carboxyl (COO−) groups attached to a central carbon atom called the α carbon. The remaining atoms connected to the α carbon are a single hydrogen atom and the R group or side chain (Figure 2.1).

The acid–base properties of amino acids

At pH 7 the amino and carboxyl groups are charged but over a pH range from 1 to 14 these groups exhibit a series of equilibria involving binding and dissociation of a proton. The binding and dissociation of a proton reflects the role of these groups as weak acids or weak bases. The acid–base behaviour of amino acids is important since it influences the eventual properties of proteins, permits methods of identification for different amino acids and dictates their reactivity. The amino group, characterized by a basic pK value of approximately 9, is a weak base. Whilst the amino group ionizes around pH 9.0 the carboxyl group remains charged until a pH of ∼2.0 is reached. At this pH a proton binds neutralizing the charge of the carboxyl group. In each case the carboxyl and amino groups ionize according to the equilibrium

\[ \mathrm{HA} + \mathrm{H_2O} \rightleftharpoons \mathrm{H_3O^+} + \mathrm{A^-} \tag{2.1} \]

where HA, the proton donor, is either –COOH or –NH3+ and A−, the proton acceptor, is either –COO− or –NH2. The extent of ionization depends on the equilibrium constant

\[ K = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]} \tag{2.2} \]

and it becomes straightforward to derive the relationship

\[ \mathrm{pH} = \mathrm{p}K + \log\frac{[\mathrm{A^-}]}{[\mathrm{HA}]} \tag{2.3} \]

known as the Henderson–Hasselbalch equation (see appendix). For a simple amino acid such as alanine a biphasic titration curve is observed when a solution of the amino acid (a weak acid) is titrated with sodium hydroxide (a strong base). The titration curve shows two zones where the pH changes very slowly after additions of small amounts of acid or alkali (Figure 2.2). Each phase reflects different pK values associated with ionizable groups.
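The Henderson–Hasselbalch relationship can be explored numerically. The following Python sketch estimates the fraction of each group in its deprotonated form and the average net charge of an amino acid such as alanine across the pH range. The pK values of 2.0 and 9.0 are the approximate figures quoted above rather than measured constants for alanine, and the function names are illustrative assumptions.

```python
def fraction_deprotonated(ph: float, pk: float) -> float:
    """Fraction of a group present as A-, from pH = pK + log([A-]/[HA])."""
    ratio = 10 ** (ph - pk)          # [A-]/[HA]
    return ratio / (1 + ratio)


def net_charge(ph: float, pk_carboxyl: float = 2.0, pk_amino: float = 9.0) -> float:
    """Average net charge of an amino acid with a non-ionizable side chain."""
    carboxyl = -fraction_deprotonated(ph, pk_carboxyl)   # COOH (0) -> COO- (-1)
    amino = 1 - fraction_deprotonated(ph, pk_amino)      # NH3+ (+1) -> NH2 (0)
    return carboxyl + amino


for ph in (1, 2, 5.5, 7, 9, 12):
    print(f"pH {ph:>4}: net charge {net_charge(ph):+.2f}")
```

With these assumed pK values each group is half ionized when the pH equals its pK, which corresponds to the two zones of the titration curve where the pH changes slowly, and the net charge passes through zero midway between the two pK values (pH 5.5 here).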

During the titration of alanine different ionic species predominate in solution (Figure 2.3). At low pH (
