doi:10.1016/j.jmb.2005.12.012 J. Mol. Biol. (2006) 356, 928–944

Conformational Diversity of Ligands Bound to Proteins

Gareth R. Stockwell1,2 and Janet M. Thornton1*

1European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK

2Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK

0022-2836/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

Abbreviations used: CSD, Cambridge Structural Database; GR, glutathione reductase; FR, ferredoxin reductase; PDB, Protein Data Bank; NAD, nicotinamide adenine dinucleotide; FAD, flavin adenine dinucleotide; DT, diphtheria toxin.

E-mail address of the corresponding author: [email protected]

The phenomenon of molecular recognition, which underpins almost all biological processes, is dynamic, complex and subtle. Establishing an interaction between a pair of molecules involves mutual structural rearrangements guided by a highly convoluted energy landscape, the accurate mapping of which continues to elude us. Increased understanding of the degree to which the conformational space of a ligand is restricted upon binding may have important implications for docking studies, structure refinement and function prediction methods based on geometrical comparisons of ligands or their binding sites. Here, we present an analysis of the conformational variability exhibited by three of the most ubiquitous biological ligands in nature: ATP, NAD and FAD. First, we demonstrate qualitatively that these ligands bind to proteins in widely varying conformations, including several cases in which parts of the molecule assume energetically unfavourable orientations. Next, by comparing the distribution of bound ligand shapes with the set of all possible molecular conformations, we provide a quantitative assessment of previous observations that ligands tend to unfold when binding to proteins. We show that, while extended forms of ligands are indeed common in ligand–protein structures, instances of ligands in almost maximally compact arrangements can also be found. Thirdly, we compare the conformational variation in two sets of ligand molecules: those bound to homologous proteins and those bound to unrelated proteins. Although most superfamilies bind ligands in a fairly conserved manner, we find several cases in which significant variation in ligand configuration is observed.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: ligand; conformational diversity; molecular recognition; superfamily; protein

*Corresponding author

Introduction

Although small in comparison to proteins, many biological ligands are capable of considerable conformational variability. The combinatorial effect of varying just a few rotatable bonds leads to a large number of possibilities: if we assume that any one rotatable bond can exist in three distinct rotational states, then a molecule with only ten such bonds can theoretically adopt 3^10 = 59,049 different conformations. Of course, many of these will be physically impossible due to steric clashes between atoms. Still others may be strongly disfavoured energetically, but a large number may have an energy not too distant from the global minimum, meaning that a considerable region of conformational space is potentially available for exploration by small organic molecules.
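The counting argument in the opening paragraph of the Introduction can be made concrete with a short Python sketch. It assumes, as the text does, exactly three rotational states per rotatable bond; the staggered values +60°, 180° and −60° are an illustrative choice, and no steric or energetic filtering is applied, so the result is an upper bound rather than a count of physically accessible conformers.

```python
from itertools import product

# Illustrative staggered torsion states (degrees). The text only assumes that
# each rotatable bond has three distinct rotational states.
STATES = (60.0, 180.0, -60.0)


def count_conformers(n_bonds: int, n_states: int = len(STATES)) -> int:
    """Upper bound on conformers: every bond independently takes any state."""
    return n_states ** n_bonds


def enumerate_conformers(n_bonds: int, states=STATES):
    """Lazily yield every combination of torsion states (no clash filtering)."""
    return product(states, repeat=n_bonds)


print(count_conformers(10))                      # 59049
print(sum(1 for _ in enumerate_conformers(10)))  # 59049, by brute force
```

Enumerating lazily with `itertools.product` keeps memory constant; in a real conformer generator each yielded tuple would then be built into coordinates and screened for clashes and strain.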

Upon binding to a protein, the conformational freedom of a ligand is typically restricted to a small region of conformational space, which is usually distinct from the optimum conformation of the solvated molecule and in many cases may not even be close to a local energy minimum.1,2 In some cases, however, ligands retain considerable mobility even when complexed with proteins.3 It must be remembered that, while molecular recognition is thought to be driven primarily by enthalpic change, and is therefore characterised by the formation of specific interactions between the protein and the immobilised ligand, the favourable entropic effect of maintaining some ligand flexibility can in some cases compensate for weaker interactions between the two molecules.

Ligand conformation has been analysed previously from a number of different perspectives. In one of the earliest large-scale analyses of ligand conformation, Moodie & Thornton4 compared the coordinates of protein-bound nucleotides to those in an unbound dataset obtained from the Cambridge Structural Database (CSD).5 The main finding was that, although most rotatable bonds did not change significantly between bound and unbound states, larger differences were observed for a minority of bonds. These rotations tended to result in the nucleotides changing from folded to extended conformations upon binding to proteins. The authors suggest that the relatively small number of torsional changes that occur upon ligand binding is evidence that bound ligands exist predominantly in low-energy conformations, since similar conformations in solution have been shown both theoretically and experimentally to be in low-energy states.6
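Deciding whether a rotatable bond has changed significantly between bound and unbound states requires comparing angles on a circle, since −170° and 170° differ by only 20°. The sketch below is a minimal illustration of that comparison; the 60° threshold and the torsion values are hypothetical and are not taken from the study of Moodie & Thornton.

```python
def torsion_difference(a: float, b: float) -> float:
    """Smallest absolute difference (0 to 180 degrees) between two torsion angles."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def changed_bonds(bound, unbound, threshold=60.0):
    """Indices of bonds whose torsion differs by more than `threshold` degrees.

    `bound` and `unbound` are equal-length sequences of torsion angles (degrees)
    for the same rotatable bonds; the default threshold is an illustrative choice.
    """
    return [i for i, (a, b) in enumerate(zip(bound, unbound))
            if torsion_difference(a, b) > threshold]


# Hypothetical torsions for three bonds in a bound and an unbound structure.
print(changed_bonds([170.0, 60.0, -65.0], [-170.0, -60.0, -60.0]))  # [1]
```

Only the second bond is flagged: the first pair of angles looks far apart numerically but is 20° apart on the circle.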

This assumption does not hold for ligands in general, however, according to a recent study that assessed the energetic effects of ligand reorganisation upon binding.7 The energy calculations performed in that study support earlier findings1,2 that ligands rarely bind in their lowest-energy conformation. Surprisingly, however, this study also found that in only about one-third of cases were the bound conformations within 0.5 kcal mol−1 of a "local" energy minimum. Moreover, no correlation was found between the global strain energy of the bound ligands and either the number of polar interactions they made with their proteins (contradicting the study of Nicklaus et al.1) or the binding affinity. These results suggest that even quite energetically costly ligand rearrangements can be tolerated without penalising binding affinity. The authors also report that the ligands with the highest strain energies tended to be those that unfolded to the greatest degree upon binding. This finding, taken together with the lack of correlation between strain energy and number of hydrogen bonds, indicates that the primary stabilising factor offered to ligands by their binding sites is a generally hydrophobic environment that protects the uncovered non-polar ligand regions from exposure to solvent.

A comparison of the binding sites for the redox cofactors NAD and NADP showed that, although they are structurally similar, these two molecules exhibit some noticeable differences in their interactions with proteins.8 The authors report a total of 13 different conformations adopted by the two cofactors: eight by NADP and five by NAD. Interestingly, the two compounds were never found in the same conformational cluster, indicating that even an apparently small chemical elaboration can significantly alter the constraints acting on the shape of a particular molecule. The relationship between the conformational clustering and evolutionary similarities between the proteins is mentioned only in passing; the authors note that proteins with similar folds and/or functions tend to bind the cofactor in a similar conformation, but do not investigate this in detail.

An analysis of the sequence and structure of 32 families of FAD-binding proteins has been reported.9 The authors comment, in passing, on the conformation of the cofactors bound to each protein. They find that, while in some structural families (namely the p-cresol methylhydroxylase and pyruvate oxidase proteins) the cofactor conformation is essentially fixed, other groups (the glutathione reductase (GR) and ferredoxin reductase (FR) families) contain proteins that bind FAD in quite varied arrangements. This study does not compare the conformational variance of FAD "within" families to that observed "between" families.

Hansen and co-workers published a study of the relationship between protein sequence similarity and NAD cofactor conformation.10 On comparing clusters of sequences of NAD(P)-utilising enzymes from the SWISS-PROT database11 with the conformations of bound cofactors, their main finding was that each sequence family binds NAD(P) molecules in conformations that cluster together. This is not surprising, since the identification of protein relationships from sequence alone generally implies that more than 30% of their residues are identical, which in turn implies that function is conserved, and hence the cofactor should be expected to bind in the same way. Although relationships between higher-level protein similarities (i.e. superfamily or fold groupings) and cofactor conformation are not discussed at length, the overall finding is that members of the large structural families (Rossmann fold oxidoreductases and flavin-NAD(P)-coupled enzymes) bind their cofactors in several different conformations, while the smaller families each map to just a single conformational cluster.

Increased understanding of the conformational changes that ligands undergo upon binding to proteins is attractive for both academic and pragmatic reasons. Appreciation of the energetic constraints acting on bound ligands can shed light on how they perform their particular biological functions, be these as enzymatic cofactors, signalling molecules or labile substrates. In particular, strain induced in a ligand by the shape that the binding protein forces it to adopt may promote its participation in chemical reactions. Alternatively, conformational change upon binding may expose a reactive atom or functional group that would otherwise be inaccessible to other reactants.

From a practical standpoint, the investigation of bound ligand conformations may prove valuable in the development of docking methods, where libraries of rotamers derived from bound ligand structures could be used to speed up conformational searches by a priori elimination of some of the large number of possible conformations mentioned previously. Furthermore, knowledge of the degree of conformational variation exhibited by bound ligands, both within and between protein families, should be useful in assessing the likely biological relevance of a proposed bound ligand pose.
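The rotamer-library idea mentioned above can be illustrated with a minimal sketch, assuming a hypothetical library that records, for each rotatable bond, only the torsion states observed in bound-ligand structures. The bond names and states are invented for illustration; the point is only that the search space shrinks from 3^n to the product of the per-bond library sizes.

```python
from itertools import product
from math import prod

# Hypothetical library: torsion states (degrees) observed per rotatable bond.
OBSERVED_STATES = {
    "bond_1": [180.0],
    "bond_2": [60.0, 180.0],
    "bond_3": [-60.0, 60.0, 180.0],
}
ALL_STATES = [-60.0, 60.0, 180.0]  # unrestricted three-state model


def search_space_size(library) -> int:
    """Number of conformers left to score after library-based pruning."""
    return prod(len(states) for states in library.values())


def candidate_conformers(library):
    """Yield each allowed torsion assignment as a {bond: angle} dict."""
    names = list(library)
    for combo in product(*(library[name] for name in names)):
        yield dict(zip(names, combo))


unrestricted = len(ALL_STATES) ** len(OBSERVED_STATES)  # 27
restricted = search_space_size(OBSERVED_STATES)         # 1 * 2 * 3 = 6
print(unrestricted, restricted)
```

Whether such pruning is safe depends on how completely the library covers the states a ligand can actually adopt, which is exactly the question of conformational diversity that this study addresses.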

In light of the existing corpus of literature on ligand conformation, we felt that one area that has until now been insufficiently studied is the question of the extent to which "homologous" proteins may bind their ligands or cofactors in different conformations. As discussed, such a study would have value both in deepening the understanding of the evolutionary constraints acting on molecular recognition and in providing guidance on the permitted ranges of ligand conformation within any given family.

In summary, the primary motivations for the work presented here are as follows:

(1) To quantify the degree of conformational diversity of bound ligands, with a view to:

• Assessing the scale of the problem of predicting the cognate ligand for an uncharacterised structure: can searching be restricted to a small number of common conformations? Similar considerations apply to the refinement of ligand coordinates during structure determination.

• Considering the advisability of methods that attempt to predict ligand binding-site specificity by comparison with databases of known sites: are ligands so flexible that fragment-based comparison methods are likely to meet with more success?

• Comparing conformational variation within homologous superfamilies to that observed in ligands bound to unrelated proteins.

(2) To determine the extent to which ligand/cofactor conformation is correlated with protein function. In the cases where such correlation is observed, to investigate reasons why this should be so.

(3) To investigate whether ligands tend to bind with their bonds in the expected, low-energy orientations, or whether they are found in strained conformations.

In order to address these questions, we analysed datasets of bound ligand coordinates for three commonly occurring molecules, ATP, NAD and FAD, in co-crystal structures with proteins. Adenosine triphosphate (ATP) consists of the nucleoside adenosine linked to three phosphate groups. The nucleoside is made up of two components: the adenine ring and a ribose moiety. The primary role of ATP is as the universal currency of energy; in other biological situations, ATP acts as a messenger or regulatory molecule, e.g. in allosteric regulation of aspartate transcarbamoylase.12

Nicotinamide adenine dinucleotide (NAD) and flavin adenine dinucleotide (FAD) are often used as redox cofactors. NAD is the major electron acceptor in the oxidation of fuel molecules, which occurs in glycolysis and the citric acid cycle of cellular respiration.13 FAD appears as an oxidising agent in a number of diverse biological contexts, including DNA repair (in the photolyase protein14) and control of folding (in oxidoreductin, found in the endoplasmic reticulum).

These three molecules were chosen primarily due to their widespread occurrence in biology, meaning that a considerable amount of structural data about their interactions with proteins is available. A survey of the PDB shows that ATP, NAD and FAD interact with domains from 45, 33 and 38 homologous superfamilies, respectively. The fact that such a large number of evolutionarily unrelated proteins interact with these ligands allows us to examine the extent to which the ligands can adopt distinct conformations when they are bound.

For each ligand, all instances were clustered based on the evolutionary relatedness of the domains that contact them (see Methods). Conformational diversity was then analysed using three approaches. First, the overall variation exhibited by each ligand type was assessed qualitatively by visual inspection of superpositions of molecules bound to unrelated binding sites. Next, conformational variation was analysed quantitatively using the molecular radius of gyration as a descriptive statistic. The degree to which conformational freedom is restricted upon binding was crudely assessed by comparing the distributions of this parameter calculated from the bound ligand coordinates with those obtained from all possible conformations of each molecule.
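A comparison of two such distributions can be made quantitative with a non-parametric two-sample test; the footnotes to Table 3 name a Wilcoxon two-sample test. The following is a minimal pure-Python sketch of the Wilcoxon rank-sum (Mann–Whitney U) statistic with a two-sided normal-approximation p-value, average ranks for ties and no tie correction in the variance. It is not the authors' implementation: R's `wilcox.test`, for example, computes exact p-values for small samples without ties, so results can differ. The sample values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test of samples x and y.

    Returns (U for x, z score, two-sided p-value) using the normal
    approximation; tied values receive their average rank.
    """
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank shared by the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, z, p_two_sided


# Hypothetical radii of gyration for bound ligands and generated conformers.
bound = [5.1, 5.3, 5.6, 5.9]
generated = [4.2, 4.6, 4.8, 5.0, 5.2]
print(rank_sum_test(bound, generated))
```

A rank-based test is a natural fit here because it compares locations without assuming normality, which matters when a Shapiro–Wilk check (also named in the Table 3 footnotes) rejects normal distributions.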

Finally, conformational variability between ligands bound to unrelated sites is compared against that observed among ligands bound by homologous domains. Using two different multivariate analysis techniques, we show that, while most superfamilies bind these ligands in conformations that vary only slightly between different proteins, several large superfamilies appear to be capable of recognising the ligand in more diverse ways. In some cases, this can be rationalised on structural grounds; in others, however, the reason for this diversity is not yet clear and warrants further study.

Datasets

The protocols described in Methods were applied to the January 2005 release of the PQS,15 which is a processed version of the PDB in which automated methods have been applied in order to predict whether the asymmetric unit found in the crystal corresponds to the biologically active unit. We generated datasets for three ligand types: ATP, NAD and FAD. The results of these procedures are summarised in Tables 1 and 2. Starting with a total of 407 ATP molecules, for instance, the first stage of clustering grouped them into 75 clusters on the basis of the S35-level annotation of the domains making up their binding sites. When the representative sites from this stage were clustered based on their homologous superfamily level annotations, a total of 27 clusters were obtained.

Table 1. Results of the ligand validation and clustering procedures

Ligand | No. of instances | No. of sequence-family clusters | No. of superfamily clusters | No. of singletons
ATP | 407 | 75 | 27 | 16
NAD | 1039 | 86 | 19 | 11
FAD | 680 | 56 | 14 | 4

For each ligand type, the columns indicate, respectively, the number of molecules found in PQS, the numbers of groups resulting from the two-stage clustering procedure (see Methods), and the number of second-stage clusters that contain only one member.
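The two-stage procedure can be sketched as follows. This is a hypothetical illustration, not the protocol given in Methods: each binding site is reduced to a list of domain codes, the S35-level code is assumed to be a superfamily code with one extra field appended (e.g. "3.40.50.300.1"), the first member of each first-stage cluster is taken as its representative, and the PDB identifiers are placeholders.

```python
from collections import defaultdict


def superfamily(s35_code: str) -> str:
    """Reduce a hypothetical S35-level code such as '3.40.50.300.1' to its
    superfamily (first four CATH numbers), e.g. '3.40.50.300'."""
    return ".".join(s35_code.split(".")[:4])


def group_by(items, key):
    """Group items into a dict keyed by key(item), preserving input order."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def two_stage_clustering(sites):
    """Stage 1: group sites whose domains share identical S35-level codes.
    Stage 2: group the stage-1 representatives by superfamily codes."""
    stage1 = group_by(sites, lambda s: frozenset(s["domains"]))
    representatives = [members[0] for members in stage1.values()]
    stage2 = group_by(representatives,
                      lambda s: frozenset(superfamily(d) for d in s["domains"]))
    return stage1, stage2


# Placeholder binding sites; identifiers and codes are invented.
sites = [
    {"pdb": "XXX1", "domains": ["3.40.50.300.1"]},
    {"pdb": "XXX2", "domains": ["3.40.50.300.1"]},
    {"pdb": "XXX3", "domains": ["3.40.50.300.7"]},
    {"pdb": "XXX4", "domains": ["3.30.200.20.2", "1.10.510.10.4"]},
]
stage1, stage2 = two_stage_clustering(sites)
singletons = sum(1 for members in stage2.values() if len(members) == 1)
print(len(sites), len(stage1), len(stage2), singletons)
```

The columns of Table 1 correspond to the same four quantities: instances, first-stage clusters, second-stage clusters and second-stage singletons.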

Inspection of the results of this second stage of clustering shows that, for each ligand, there is one group with many more members than the rest. For ATP, this group consists of sites made up of P-loop-containing NTP hydrolase domains, while for both NAD and FAD, large groups of Rossmann-fold cofactor-binding domains found in classical oxidoreductase proteins dominate the datasets. At the other end of the spectrum, there are a number of clusters for each ligand that contain only one member; in other words, they correspond to less evolutionarily diverse homologous superfamilies.

Table 2. Summary of the ligand datasets

Ligand | Cluster | Size | Rep | Domains | Functions
ATP | 1 | 22 | 1E2Q | 3.40.50.300 P-loop containing nucleotide triphosphate hydrolases | Various
ATP | 2 | 9 | 1RDQ | 3.30.200.20 Phosphorylase kinase domain 1; 1.10.510.10 Transferase (phosphotransferase) domain 1 | Kinases
ATP | 3 | 9 | 1MJH | 3.40.50.620 Rossmann fold ligase domain | Ligases; transferases
ATP | 4 | 5 | 1B8A | 3.30.930.10 Bira bifunctional protein domain 2 | tRNA synthetases
ATP | 5 | 3 | 1KJ8 | 3.30.470.20 ATP grasp fold domain B; 3.30.1490.20 – | Formyltransferase; synapsin I; biotin carboxylase
ATP | 6 | 3 | 1GZ4 | 3.40.50.720 NAD(P)-binding Rossmann-like | Malic enzyme; molybdopterin synthesis
NAD | 1 | 54 | 1T2D | 3.40.50.720 Rossmann | Various oxidoreductases; UDP-glucose 4-epimerase
NAD | 2 | 4 | 1O04 | 3.40.605.10 Aldehyde dehydrogenase domain 1; 3.40.309.10 Aldehyde dehydrogenase domain 2 | Aldehyde dehydrogenase; glyceraldehyde 3-phosphate dehydrogenase
NAD | 3 | 4 | 1NUU | 3.40.50.620 Rossmann fold ligase domain | Adenylyltransferases
NAD | 4 | 4 | 1LW7 | 3.50.50.60 FAD/NAD(P)-binding domain | Glutathione reductase; ferredoxin reductase; NADH peroxidase; dihydrolipoamide reductase
NAD | 5 | 4 | 1GIQ | 2.30.100.10 Toxin ADP-ribosyltransferase | ADP-ribosyltransferases
FAD | 1 | 19 | 3GRS | 3.50.50.60 FAD/NAD(P)-binding domain | Various oxidoreductases
FAD | 2 | 9 | 1GAW | 2.40.30.10 Translation factors; 3.40.50.80 Nucleotide-binding domain | Ferredoxin-NADP+ reductase; cytochrome reductase; nitric oxide synthase
FAD | 3 | 5 | 1N62 | 3.30.465.10; 3.30.43.10 Uridine diphospho-N-acetylenolpyruvylglucosamine reductase domains | Carbon monoxide dehydrogenase; nucleotide metabolism
FAD | 4 | 5 | 1LQT | 3.40.50.720 Rossmann | Oxidoreductases: FprA; dihydropyrimidine dehydrogenase; D-amino acid oxidase
FAD | 5 | 3 | 1E8G | 3.30.465.20; 3.30.43.10 Uridine diphospho-N-acetylenolpyruvylglucosamine reductase domains | Vanillyl-alcohol oxidase; cholesterol oxidase; D-lactate dehydrogenase
FAD | 6 | 3 | 1IQR | 1.10.579.10 DNA cyclobutane dipyrimidine photolyase domain 3; 1.25.40.80 – | DNA photolyase

Clusters with three or more members are shown for each ligand (see Table 1). For each cluster, Size indicates the number of members and Rep is the PDB entry ID of the cluster representative. The homologous superfamily level annotations of domains making up the binding sites in each cluster are shown along with their associated CATH numbers. Where the proteins in a cluster have a small range of functions, these are listed; in other cases, the range of functions is too large to list here. EC numbers associated with each protein in the dataset are shown in Figures 5–7.

Results

Overall shape of bound ligands

For each ligand type, the representatives from each cluster were superposed upon a chosen rigid fragment of the molecule (for ATP, the adenine ring; for NAD, the nicotinamide ring; and for FAD, the isoalloxazine group). These superpositions are shown in Figures 1–3.
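Superposing whole molecules on a chosen rigid fragment, as described under "Overall shape of bound ligands", amounts to finding the rigid-body transform that best fits the fragment atoms and then applying it to every atom. The text does not say which fitting algorithm was used; the sketch below uses the Kabsch (SVD) least-squares method with NumPy as one standard choice, and assumes that the fragment atoms (e.g. the adenine ring) are listed in the same order in both molecules.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Return rotation R and translation t minimising the RMSD between
    mobile @ R.T + t and target. Both arrays have shape (n_atoms, 3) and
    list corresponding atoms in the same order."""
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    # 3x3 covariance matrix of the centred coordinates.
    h = (mobile - mobile_centroid).T @ (target - target_centroid)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_centroid - mobile_centroid @ r.T
    return r, t


def superpose_on_fragment(ligand: np.ndarray, reference: np.ndarray, fragment_idx):
    """Fit `ligand` onto `reference` using only the fragment atoms, then move
    the whole ligand with the same transform."""
    r, t = kabsch(ligand[fragment_idx], reference[fragment_idx])
    return ligand @ r.T + t
```

In practice `fragment_idx` would hold the indices of the adenine, nicotinamide or isoalloxazine ring atoms after matching atom names between PDB entries; that matching step is not shown. Because only the fragment is fitted, any differences left after superposition reflect conformational variation in the rest of the molecule.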


Figure 1. Superposition of the 27 ATP cluster representatives on their adenine rings (highlighted). In the second image, the gamma phosphate atoms are shown with translucent spheres, to highlight the broad range of conformations adopted by the triphosphate tail. The key shows from which PDB entry each molecule was taken. Several particularly unusual conformations are indicated with labels on the plots themselves.


ATP

Visual inspection of Figure 1 provokes three immediate reactions. The first is that the conformational diversity exhibited by ATP is considerable. Indeed, the amount of variation in the shapes that this molecule can adopt, particularly in its triphosphate tail, is likely to preclude attempts to define a unique pharmacophore describing its recognition. The second is that, in agreement with previous studies, ATP is observed to adopt a generally fairly extended conformation. The third is that the N-glycosidic bond is most often found in the anti orientation, in keeping with previous studies.4,6

Although most bound ATP molecules are extended, this is not universally true: two representatives (1B8A and 3R1R) are bent to such a degree that the terminal phosphate atoms are almost in van der Waals contact with the adenine ring. 1B8A is a structure of aspartyl-tRNA synthetase. Aminoacyl-tRNA synthetases, which catalyse the specific esterification of a given amino acid to the 3′ end of its corresponding tRNA, can be divided into two classes on the basis of sequence motifs.16 These classes each adopt different structural folds: class I enzymes adopt a Rossmann fold, while class II enzymes bind ATP using an antiparallel β-sheet motif. The conformations of ATP bound to these two classes (clusters 9 and 4 in the ATP dataset, respectively) are shared from the adenine moiety to the alpha phosphate, with the beta phosphate bent back towards the adenine. This bending allows access to the alpha phosphate to which the tRNA molecule is esterified, as discussed in the work describing PDB entry 1B8A.17 In the case of 3R1R, a structure of ribonucleotide reductase, the bent ATP conformation may be an experimental artefact, since ATP was soaked into the crystal rather than added to the crystallisation liquor.18

Figure 2. Superposition of the 19 NAD cluster representatives on their nicotinamide rings (highlighted). The key shows from which PDB entry each molecule was taken. Two cases where the adenine and nicotinamide rings are in a stacked arrangement are labelled (2BKJ and 1LW7). The cofactor in diphtheria toxin (1TOX), whose nicotinamide N-glycosidic bond adopts an unusual strained conformation, is also indicated.

Figure 3. Superposition of the 14 FAD cluster representatives on their isoalloxazine rings (highlighted). The key shows from which PDB entry each molecule was taken.

Five of the 27 ATP cluster representatives have their N-glycosidic bond in the syn conformation. Since these five proteins have quite diverse functions, it is not possible to posit any functional reason why they should bind ATP in this way; in all cases, however, visual inspection of the individual structures reveals interactions between the protein and the ligand that may contribute extra stability. These include hydrogen bonding between the adenine ring and water molecules coordinating magnesium ions (1DY3), hydrophobic residues stacking against the face of the adenine ring (1AYL, 1A0I) and the fact that, for two proteins (1A0I and 1HI1), the catalytic mechanism involves the ATP becoming covalently attached to the protein via its ribose moiety.
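Classifying the N-glycosidic bond as syn or anti requires the torsion angle χ, conventionally defined for purines such as adenine over the atoms O4′–C1′–N9–C4. A minimal pure-Python sketch follows: the dihedral function uses a standard projection formula, and the cut-off (syn for |χ| < 90°, anti otherwise) follows the usual convention of syn ≈ 0 ± 90° and anti ≈ 180 ± 90°; the boundaries used in the study itself may differ.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[k] - _dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - _dot(b2, b1) * b1[k] for k in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def glycosidic_conformation(chi: float) -> str:
    """Label chi (O4'-C1'-N9-C4 for purines) as 'syn' or 'anti'."""
    return "syn" if abs(chi) < 90.0 else "anti"
```

Given the coordinates of the four atoms from a PDB entry, `glycosidic_conformation(dihedral(o4p, c1p, n9, c4))` returns the label; the atom variable names here are hypothetical.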

NAD

Figure 2 shows the 19 representative NAD molecules, superposed on their nicotinamide rings. While most NAD molecules adopt a fairly extended shape, the degree of conformational diversity apparently permitted to the cofactor is striking. Two obvious exceptions to the tendency of NAD molecules to bind in extended forms are those from PDB entries 2BKJ and 1LW7.

2BKJ is a structure of flavin reductase from the luminous bacterium Vibrio harveyi, with both NAD and FMN bound at one of its active sites. Flavin reductase enzymes catalyse the reduction of flavin using NAD(P)H as a cofactor. In this particular structure, the substrate and cofactor are bound in a non-productive orientation. In PDB entry 1RZ1, however, a structure of another flavin reductase enzyme, the NAD molecule is also bound in a ring-stacked conformation, but in this case with the nicotinamide ring positioned close to an FAD molecule. The authors report bleaching of the crystals upon addition of NAD, indicating that flavin reduction occurs and, therefore, that the folded conformation of NAD persists during the reaction.19

1LW7 is a structure of the Haemophilus influenzae NadR protein. This is a bifunctional enzyme possessing both NMN adenylyltransferase and ribosylnicotinamide kinase activities.20 The NAD molecule in the dataset, however, is not bound at an active site, and while it makes a number of specific contacts with the protein, its biological significance, if any, is unclear.

Another unusually shaped NAD molecule is not as conspicuous as the ring-stacked examples: 1TOX, a structure of diphtheria toxin (DT), binds NAD with the N-glycosidic bond of its nicotinamide nucleotide in an unusual conformation. This has been noted previously and attributed to the particular catalytic mechanism of the toxin.21 In addition to the DT structure, the NAD dataset contains four other molecules originating from ADP-ribosyltransferase (ART) enzymes: an iota toxin, two C3 exoenzyme-like toxins and a rat ecto-ART. All five of these domains share a common NAD-binding region consisting of a Greek key motif, and bind NAD in a similar conformation, with the unusual torsion around the N-glycosidic bond being particularly well conserved. The four non-DT domains are probably distantly related to the cofactor-binding domain of diphtheria toxin.

FAD

Superpositions of the 14 FAD cluster representatives, aligned upon their isoalloxazine rings, are shown in Figure 3. Visual inspection shows the conformations of FAD to be somewhat more homogeneous than those of NAD, with the main variation occurring around the ribitol–isoalloxazine bond. The adenine nucleotide part of the molecule tends to be fairly extended, and FAD is the only molecule of the four studied here in which the adenosine N-glycosidic bond seems to adopt the anti conformation exclusively.

In contrast to NAD, no examples of FAD are seen in which the molecule is bent enough to allow adenine–isoalloxazine stacking interactions. The most bent conformation is that from PDB entry 1IQR (DNA photolyase from Thermus thermophilus), in which the centroids of the adenine and isoalloxazine groups are separated by around 6.3 Å, compared with 3.6 Å and 4.0 Å for the inter-ring distances in the NAD molecules from 2BKJ and 1LW7, respectively. There is no clear functional reason why the FAD in this enzyme should adopt such a compact conformation.
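The inter-ring distances quoted above are separations between ring centroids. A minimal sketch of that measurement follows; the 4.5 Å cut-off for calling two rings "stacked" is a hypothetical value chosen only to separate the 3.6 to 4.0 Å NAD examples from the 6.3 Å FAD case, not a criterion taken from the study, and the ring coordinates are synthetic.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n for k in range(3))


def ring_centroid_distance(ring_a, ring_b) -> float:
    """Distance (in the coordinate unit, e.g. Å) between two ring centroids."""
    return math.dist(centroid(ring_a), centroid(ring_b))


def rings_stacked(ring_a, ring_b, cutoff=4.5) -> bool:
    """Crude stacking test on centroid separation only (cutoff is hypothetical).
    A fuller test would also check that the ring planes are roughly parallel."""
    return ring_centroid_distance(ring_a, ring_b) <= cutoff


def hexagon(z, radius=1.4):
    """Synthetic six-membered ring lying in the plane at height z."""
    return [(radius * math.cos(math.radians(60 * k)),
             radius * math.sin(math.radians(60 * k)), z) for k in range(6)]


print(ring_centroid_distance(hexagon(0.0), hexagon(3.6)))
print(rings_stacked(hexagon(0.0), hexagon(3.6)), rings_stacked(hexagon(0.0), hexagon(6.3)))
```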

Three FAD molecules which have clearly similar conformations are 3GRS, 1LQT and 1GPE, which are the representatives from clusters 1, 4 and 12, respectively. Inspection of the clustering results shows that in all three cases the FAD molecule is bound by a Rossmann fold domain (CATH code 3.50.50.60). However, whereas in 3GRS (glutathione reductase) the FAD-binding site is formed by two Rossmann fold domains, in both 1LQT (FprA, a mycobacterial oxidoreductase) and 1GPE (glucose oxidase) a single Rossmann domain binding the adenine is partnered with a second domain of a different type. The resulting change in the fraction of residues contributed by the Rossmann fold domain causes each site to be placed in a different cluster, although a human expert would classify them all as belonging to the glutathione reductase family.9

Radii of gyration

Having discussed several qualitative aspects of the bound conformations of the four ligands, we now proceed to analyse the degree of molecular compactness of each molecule, in order to quantify to what extent the general principle of ligand extension upon binding is realised in each case. The degree of extension was measured by calculating the radius of gyration for each molecule, defined as:

$$ r_g = \sqrt{\frac{\sum_{i=1}^{n} m_i \, \lVert \mathbf{x}_i - \bar{\mathbf{x}} \rVert^2}{\sum_{i=1}^{n} m_i}} $$

where xi and mi are the position and mass, respectively, of the ith atom, n is the number of atoms and x̄ is the centre of mass of the molecule. For each ligand type, two distributions of rg values were calculated: one from the bound ligand coordinates, and one from a set of artificially generated conformations which exhaustively sample conformation space (see Methods). Table 3 shows statistics which describe the distributions of rg for each ligand type, in both the “real” (protein-bound) and artificially generated datasets. Figure 4(a)–(c) shows histograms of these distributions. Structures of molecules from the dataset which illustrate the most compact and most extended protein-bound forms observed for each ligand, as well as an example of a molecule with average extension, are included in each plot.

Table 3. Radii of gyration

                                Radius of gyration, rg (Å)      Test of normality^c    p (Wilcoxon
Ligand  Dataset^a        n^b    Minimum  Maximum  μ      σ      W      p               test)^d
ATP     Artificial       830    3.58     5.60     4.63   0.40   0.99   1.13 × 10^-6    0.64
        Representatives  27     3.75     5.11     4.59   0.35   0.94   9.87 × 10^-2
GTP     Artificial       826    3.60     5.73     4.71   0.42   0.99   1.06 × 10^-5    0.24
        Representatives  11     3.95     5.07     4.59   0.36   0.93   4.49 × 10^-1
NAD     Artificial       3054   4.08     7.59     5.86   0.59   1.00   1.39 × 10^-8    0.17
        Representatives  19     4.21     6.53     5.63   0.66   0.92   1.28 × 10^-1
FAD     Artificial       4992   4.38     8.44     6.43   0.69   0.99   6.38 × 10^-14   0.09
        Representatives  14     4.97     7.62     6.76   0.68   0.90   9.63 × 10^-2

^a Artificial refers to the set of generated ligand conformations; Representatives refers to the set of representatives from the second stage of clustering, i.e. sites composed of domains with less than 35% sequence identity (see Methods).
^b For artificially generated conformations, this is the number which had no steric clashes.
^c The Shapiro–Wilk test, as implemented in the R23 package e1071.
^d Wilcoxon 2-sample test, applied to determine whether the medians of the artificial and level-3 distributions differ significantly.

Comparison of the ranges of rg values for the artificial and bound ligand conformations shows that, for all four ligand types, there are instances of the molecule which bind proteins in an almost maximally compact conformation. For ATP, the difference between the most tightly folded molecules in the generated and observed datasets seems to be that, in the former, the N-glycosidic bond is in the syn conformation. While this does occur in the bound dataset, it tends to be accompanied by a fairly extended triphosphate tail; it is likely that the combination of the syn arrangement with a tightly bent triphosphate tail is energetically unfavourable, explaining its absence from the bound dataset.

[Figure 4: histograms of the fraction of molecules against radius of gyration (Å, 0.5 Å bins), comparing cluster representatives with artificially generated conformations; panels (a) ATP, (b) NAD and (c) FAD. Example molecules, with cluster in brackets and rg in parentheses: (a) 1G5T [1] (3.70), 1KJ8 [5*] (4.53), 1A60 [2] (5.16); (b) 1LW7 [3] (4.21), 1S7G [10*] (5.54), 1EJ2 [3] (6.60); (c) 1DNP [6] (4.92), 1JR8 [10*] (5.98), 1NG4 [1] (7.66).]

Figure 4. Distributions of radii of gyration. The distribution of the radius of gyration among cluster representatives is compared against that obtained from the artificially generated conformations for each ligand type. Molecules from the dataset which represent the extrema and average of the distribution are shown. The cluster to which each molecule belongs is shown in square brackets; an asterisk indicates that the molecule is a cluster representative. The radius of gyration for each compound (in Å) is given in parentheses.
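The radius of gyration defined above reduces to a few lines of code. A minimal sketch, assuming Cartesian coordinates in Å and one mass per atom; the element-mass table (approximate standard atomic weights) and the function names are illustrative rather than taken from the Methods. Crystal structures usually lack hydrogens, so in practice the sum runs over whichever atoms are present in the coordinates.

```python
import math
from typing import Sequence

# Approximate standard atomic weights (g/mol) for the elements found in ATP, NAD and FAD.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974}

def radius_of_gyration(coords: Sequence[Sequence[float]], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration about the centre of mass."""
    if not coords or len(coords) != len(masses):
        raise ValueError("need at least one atom and exactly one mass per atom")
    total = sum(masses)
    # Centre of mass (the x-bar in the equation above).
    com = [sum(m * x[axis] for m, x in zip(masses, coords)) / total for axis in range(3)]
    # Mass-weighted sum of squared distances from the centre of mass.
    weighted_sq = sum(
        m * sum((x[axis] - com[axis]) ** 2 for axis in range(3))
        for m, x in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)

def rg_from_elements(coords: Sequence[Sequence[float]], elements: Sequence[str]) -> float:
    """Hypothetical convenience wrapper: look up each atom's mass by element symbol."""
    return radius_of_gyration(coords, [ATOMIC_MASS[e] for e in elements])
```

Because the weighting uses masses, the phosphorus atoms of a triphosphate tail pull the centre of mass, and hence rg, more strongly than carbon atoms at the same distance would.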

At the other end of the distributions, there is a larger difference between the artificial and bound datasets, particularly for NAD and FAD. It appears that, while ligands often bind in extended conformations, these are not the “most” extended arrangements which they are capable of adopting. One explanation for this may be that the fully extended arrangement prevents coordination of metal ions by consecutive phosphate moieties, which is a commonly observed motif in complexes of proteins with nucleotides and nucleotide derivatives.

For the most part, neither the rg values for the artificially generated conformations nor those for the bound ligand conformations are normally distributed, according to a Shapiro–Wilk test. This test assesses whether a sample is consistent with having been drawn from a normally distributed population;22 the p-values shown in Table 3 show that, in most cases, the null hypothesis of normality is rejected with at least 90% confidence.

Since the distributions are not normal, using a t-test to determine whether their means differ significantly is not appropriate. Instead, a two-tailed Wilcoxon 2-sample (rank-sum) test was applied. The results indicate that, for all ligands except FAD, the medians of the two distributions are not significantly different, which conflicts with previous assertions that these ligands bind in predominantly extended conformations. Rather, the bound conformations of ATP, GTP and NAD appear consistent with having been drawn at random from the pool of possible conformations available to these molecules. When interpreting these findings, however, it must be remembered that the number of data points is quite small in all cases, and that more data would allow firmer conclusions to be drawn.
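Both tests are available in standard libraries (for example scipy.stats.shapiro and scipy.stats.mannwhitneyu in Python, or shapiro.test and wilcox.test in R). To make the rank-sum logic explicit, the sketch below implements the two-sided test using the large-sample normal approximation with a tie correction but no continuity correction, so its p-values can differ slightly from library defaults; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])  # 0 marks x, 1 marks y
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a location shift
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
```

Applied to the rg values of the bound representatives and the artificial conformations of one ligand, a p-value above the chosen threshold corresponds to the "medians not significantly different" outcome described above.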

An interesting observation which arises from comparison of the radii of gyration for ATP and GTP with those of NAD and FAD is that, for the latter pair of ligands, a slightly bimodal distribution is observed. Again, this should be treated with caution in light of the sparsity of the data, but it appears that the most folded conformations of NAD and FAD are truly outliers, rather than simply the lower extrema of smooth distributions.

General observations on the overall shape of bound ligands

To summarise the analysis of the overall shape of bound ligands, we find that, in conflict with previous studies, their degree of extension is (with the exception of FAD) not significantly different from the average degree of extension of all possible conformations. Indeed, rather than binding in exclusively extended arrangements, a small proportion of ligands appear to bind in very compact conformations; in some cases, these results may be due to biologically inactive complexes, but in others, genuine functional reasons can be posited to explain why this should be so.

We have also identified several cases in which ligands bind with part of the molecule in a strained or unfavourable arrangement. In some cases, this appears to promote catalysis, either by weakening a particular covalent bond or by relieving steric hindrance around a reactive centre.

In others, such as the syn conformations observed for the nucleoside moieties, the energetic impediment to adopting these orientations is not particularly high,24 and thus their occurrence may be explained as allowing slightly more favourable protein–ligand interactions. Finally, we observe qualitatively that the conformations of a given ligand vary considerably when it is bound by unrelated proteins. The following section explores this finding in more detail.

Diversity in ligand conformation

Having observed the large conformational variations between representative molecules of the clusters, we next sought to quantify that variation and to compare it with the extent to which molecules in the same cluster (i.e. bound to proteins belonging to the same homologous superfamily) adopt different shapes.

Hierarchical clustering

Similarities between the shapes of all ligands in each dataset were calculated by performing least-squares superposition of all atoms in the molecule. The RMSD values thus obtained were used to populate a distance matrix, which was then analysed by complete-linkage clustering. The relationships between the shapes of the ligands in the dataset and the evolutionary relationships between the sites in which they are bound were then compared by plotting the clustering dendrograms, labelling each leaf node with the index of the cluster to which the corresponding molecule belonged. In this way, clusters in which all members share a similar conformation are visible as leaf labels grouped together on the dendrogram; conversely, clusters which contain members with diverse conformations manifest themselves as labels scattered across the leaves of the tree.
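A naive sketch of complete-linkage agglomerative clustering on a precomputed RMSD matrix follows. It assumes the pairwise superposition RMSDs are already available as a symmetric list of lists (the least-squares superposition itself is not shown), and the function names are hypothetical. The merge heights are what a dendrogram plots; `cut_tree` recovers flat groups at a chosen height, such as a 1 Å threshold.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[int, int, float]

def complete_linkage(dist: Sequence[Sequence[float]]) -> List[Merge]:
    """Naive agglomerative clustering with the complete-linkage criterion.

    Leaves are numbered 0..n-1; the cluster created by merge k gets ID n + k,
    the usual dendrogram convention. Each merge is (id_a, id_b, height).
    """
    n = len(dist)
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    merges: List[Merge] = []
    next_id = n
    while len(members) > 1:
        best = None
        for a, b in combinations(sorted(members), 2):
            # Complete linkage: the distance between two clusters is the largest
            # RMSD between any member of one and any member of the other.
            d = max(dist[i][j] for i in members[a] for j in members[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

def cut_tree(merges: Sequence[Merge], n: int, threshold: float) -> List[List[int]]:
    """Flat clusters obtained by applying every merge at or below `threshold`."""
    groups: Dict[int, List[int]] = {i: [i] for i in range(n)}
    for step, (a, b, height) in enumerate(merges):
        if height > threshold:
            break  # complete-linkage merge heights never decrease, so stop here
        groups[n + step] = groups.pop(a) + groups.pop(b)
    return sorted(sorted(g) for g in groups.values())
```

The inner loop rescans all cluster pairs at every merge, which is acceptable for a few dozen ligands; scipy.cluster.hierarchy.linkage with method='complete' applies the same criterion far more efficiently on larger sets.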

The clustering dendrograms are shown in Figures 5–7. An initial comparison of these trees shows that, of the four ligand types, ATP appears to exhibit the greatest degree of conformational variation within clusters. While, for NAD and FAD, most large clusters tend to have a large proportion of non-leaf nodes below the 1 Å threshold, indicating that their members are similarly shaped, the two largest ATP clusters seem to be more widely spread in conformational space, and hence appear more scattered along the leaves of the dendrogram (Figure 5). Cluster 1 contains members which, although they all share a common Rossmann-fold ATP-binding domain, are diverse in terms of protein function. Cluster 2, on the other hand, is composed almost exclusively of kinases. Qualitatively, the degree of conformational variance appears similar within each of these clusters, suggesting that similarity in protein function does not necessarily imply similarity in ligand binding mode.

[Figure 5: dendrogram of ATP conformations (x-axis: RMSD, Å, 0–4); leaves labelled by PDB code/chain/ligand ID, with cluster numbers and EC numbers alongside.]

Figure 5. Comparison of ATP conformations with cluster assignments. The dendrogram represents the results of complete-linkage clustering, applied to the global RMSD values between all ligands in the ATP dataset. Coloured blocks and numbers indicate the cluster to which each molecule belongs. EC numbers associated with the protein binding each ligand molecule, where available, are shown alongside each leaf of the tree.

[Figure 6: dendrogram of NAD conformations (x-axis: RMSD, Å, 0–6); leaves labelled by PDB code/chain/ligand ID and N-glycosidic conformation (S, A or –), with cluster numbers and EC numbers alongside.]

Figure 6. Comparison of NAD conformations with cluster assignments. The dendrogram represents the results of complete-linkage clustering, applied to the global RMSD values between all ligands in the NAD dataset. Each molecule is labelled according to whether its N-glycosidic bond is in a syn (S), anti (A) or intermediate (–) conformation. Coloured blocks and numbers indicate the cluster to which each molecule belongs. EC numbers associated with the protein binding each ligand molecule, where available, are shown alongside each leaf of the tree.

In the case of NAD (Figure 6), it is clear that cluster 1 (sites composed principally of Rossmann-fold domains from CATH superfamily 3.40.50.720) contains two distinct conformational subgroupings. Inspection of the χN angles for each molecule in cluster 1 shows that these groups correspond to the syn and anti orientations of the nicotinamide N-glycosidic bond (see Figure 6). The other NAD clusters all consist of just a few members; most adopt similar conformations, with only occasional outliers (e.g. 1AD3 in cluster 2, 1LVL in cluster 4).

[Figure 7: dendrogram of FAD conformations (x-axis: RMSD, Å, 0–6); leaves labelled by PDB code/chain/ligand ID, with cluster numbers and EC numbers alongside.]

Figure 7. Comparison of FAD conformations with cluster assignments. The dendrogram represents the results of complete-linkage clustering, applied to the global RMSD values between all ligands in the FAD dataset. Coloured blocks and numbers indicate the cluster to which each molecule belongs. EC numbers associated with the protein binding each ligand molecule, where available, are shown alongside each leaf of the tree.

Looking at the FAD dendrogram, we see that there are several groups of clusters which seem to be close together in conformation space. For example, ligands in clusters 1, 4 and 12 adopt very similar conformations. These are all sites from glutathione reductase proteins, while clusters 3 and 5 are from members of the p-cresol methylhydroxylase family. The finding that FAD molecules bound to glutathione reductase proteins show quite conserved conformations is the reverse of that reported by Dym & Eisenberg. In contrast to the conserved FAD-binding mode of these proteins, sites from cluster 2 (flavin reductases) clearly show much more diversity. This is explored further below.

A final observation arising from the dendrograms is the lack of apparent correlation between ligand/cofactor conformation and the function of the protein, as represented by its EC number(s). For example, among the ATP molecules, sites from proteins with EC number 3.6.3.- (proton-pumping ATPases) sometimes occur very close together in conformation space (e.g. 1H8H and 1E79), but in other cases bind ATP in very different forms (e.g. 1I10 and 1Q12). Sites from proteins with EC numbers 2.7.1.- (phosphotransferases with an alcohol group as acceptor) appear across almost the full range of the tree, with little obvious grouping beyond that which we might expect from members of the same superfamily showing conservation in both ligand conformation and protein function. Viewed in a biological context, however, this result is not surprising: there is no clear reason why phosphotransfer to an alcohol acceptor should require a particular ATP conformation.

An exception is found in the tRNA synthetases. Cluster 9 contains two families of class I tRNA synthetases, which use a Rossmann fold to recognise the ATP molecule;17 cluster 3 also contains a binding site from a class I tRNA synthetase (1MAU), as well as several other proteins of different functions which recognise ATP using a related domain. Cluster 4 contains five families of class II tRNA synthetases, whose ATP recognition site is built around an antiparallel β-sheet.17 While the two classes of tRNA synthetase bind ATP in subtly distinct conformations, they still remain adjacent in the clustering dendrogram, indicating that, although evolutionarily unrelated, these two groups have converged upon a similar ATP-recognition mode.

Among the NAD-binding sites, the situation is slightly different. Stereospecificity in dehydrogenases has been studied extensively, and the tendency for enzymes which accept the same substrates to adopt the same cofactor stereospecificity has been noted previously.25 Although the amount of data shown in Figure 6 is fairly small, we see that within cluster 1, where there are multiple families with the same EC number, they tend to cluster closely in conformational space with the N-glycosidic bond in the same orientation (e.g. 1D7O, 1QSG, 1F0Y; 1R37, 1N8K). An exception to this is the NAD-binding site from 1B14. Like 1R37 and 1N8K, this is a structure of an alcohol dehydrogenase but, unlike these two, in entry 1B14 the substrate is not present, and the nicotinamide ring has rotated around to sit in the substrate-binding pocket.

For the FAD-binding sites (Figure 7), there is no clear pattern in the distribution of enzyme functions relative to cofactor conformation. Here, the main functional requirement, namely that the electron donor/acceptor group (flavin) be presented to the substrate, does not appear to impose any particular conformational constraints on the rest of the molecule.

Multidimensional scaling

Multidimensional scaling (MDS)26 was applied to the matrix of RMSD values between all pairs of molecules, for each type of ligand studied; the result of the MDS will be referred to here as the “projection space”. Plots of the first two dimensions of this space are shown, for ATP, NAD and FAD, in Figure 8. The results of the multidimensional scaling reinforce the findings obtained by inspecting the clustering dendrograms. The extent of variation within certain clusters is perhaps better highlighted in the MDS plots: see, for example, ATP cluster 3 and FAD cluster 2. In addition, the plots allow us to address the question of whether conformational variability is manifested continuously or as distinct subgroups. The answer seems to vary between clusters: the variation in ATP cluster 1 shows a tendency to be divided into a number of groupings, while the conformations of ATP cluster 3 and FAD cluster 2 are fairly evenly distributed.

[Figure 8: MDS projections for (a) ATP, (b) NAD and (c) FAD; axes: first and second components of variance; each point labelled with its cluster number; NAD cluster 1 subgroups marked A and S; FAD groupings marked GR and PCMH.]

Figure 8. Multidimensional scaling of ligand conformational differences. The two largest components of variance are shown; values on the axes indicate the proportion of total variance explained by each component. Numbers at each data point refer to the cluster to which the molecule belongs. For NAD, the two distinct conformational subgroups visible in cluster 1 correspond to the anti (A) and syn (S) conformations of the N-glycosidic bond. For FAD, circles indicate conformational groupings which correspond to previously identified FAD-binding families: glutathione reductase (GR) and p-cresol methylhydroxylase (PCMH).
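The MDS variant of reference 26 is not specified in this section, so the sketch below is an assumption: it implements classical (Torgerson) scaling, which double-centres the squared RMSD matrix and keeps its leading eigenvectors. To stay within the standard library, the eigenvectors are found by shifted power iteration with deflation; the function name is hypothetical. With only two components retained, distances in the plot approximate, rather than reproduce, the RMSDs.

```python
import math
from typing import List, Sequence

def classical_mds(dist: Sequence[Sequence[float]], k: int = 2,
                  n_iter: int = 1000) -> List[List[float]]:
    """Classical (Torgerson) MDS: embed n objects in k dimensions from a distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J, where J is the centring matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # Gershgorin bound: adding `shift` to the diagonal makes every eigenvalue
    # non-negative, so power iteration finds the largest (not the most negative) one.
    shift = max(sum(abs(x) for x in row) for row in b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of B itself, without the shift.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # negative (non-Euclidean) parts are dropped
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so that the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

The "percentage of variance" on each axis of such a plot would correspond to each retained eigenvalue divided by the sum of the positive eigenvalues; convergence of power iteration slows when two leading eigenvalues are nearly equal, so a library eigensolver is preferable for real datasets.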

Comparison of conformational diversity within and between superfamilies

Next, we wished to test the hypothesis that homologous proteins tend to bind their ligands in similar conformations. If this hypothesis were true, we would expect the amount of conformational variability observed between members of the same cluster to be significantly less than that between members of different clusters. To test this, we first calculated, for each cluster, the point in projection space lying at the centroid of its members. These points are taken to represent the mean conformation of each cluster.

We take the mean RMSD between all molecules belonging to a given cluster as the statistic which measures the conformational spread of that cluster. Since the distance between a pair of points in the MDS projection space approximates the RMSD between the corresponding pair of molecules, we can estimate this value by computing the mean distance between all pairs of points in a given cluster. Let this quantity be denoted m, and the number of members in the cluster s. To test the hypothesis given above against the null hypothesis (H0) that the members of a cluster sample ligand conformation space at random, we then randomly sample s cluster centroid points and calculate the mean pairwise distance between them. Repeating this sampling 10,000 times yields a distribution of values; the fraction of this distribution which lies below m estimates the probability of observing a spread this small by chance if H0 is true. The results are shown in Table 4.
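The sampling test just described can be sketched as below, assuming projection-space coordinates (for example from an MDS embedding) and one cluster label per molecule. Sampling is without replacement, which is why a cluster with more members than there are centroids cannot be tested; the sketch raises an error in that case. The function names, the fixed seed and the strict "below m" comparison are illustrative choices, not details confirmed by the Methods.

```python
import itertools
import math
import random
from statistics import fmean
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]

def mean_pairwise_distance(points: Sequence[Point]) -> float:
    """Mean Euclidean distance over all unordered pairs of points."""
    return fmean(math.dist(p, q) for p, q in itertools.combinations(points, 2))

def cluster_centroids(coords: Sequence[Point],
                      labels: Sequence[Hashable]) -> Dict[Hashable, Tuple[float, ...]]:
    """Centroid, in projection space, of the points carrying each cluster label."""
    groups: Dict[Hashable, List[Point]] = {}
    for point, label in zip(coords, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(fmean(p[axis] for p in pts) for axis in range(len(pts[0])))
            for label, pts in groups.items()}

def spread_p_value(members: Sequence[Point], centroids: Sequence[Point],
                   n_samples: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Return (m, p): the cluster's mean pairwise distance m, and the fraction of
    random draws of s = len(members) centroids whose mean pairwise distance is below m."""
    s = len(members)
    pool = list(centroids)
    if s < 2:
        raise ValueError("need at least two cluster members")
    if s > len(pool):
        raise ValueError("more members than centroids: cannot sample without replacement")
    m = mean_pairwise_distance(members)
    rng = random.Random(seed)
    below = sum(mean_pairwise_distance(rng.sample(pool, s)) < m for _ in range(n_samples))
    return m, below / n_samples
```

A typical call passes one cluster's projection-space points together with list(cluster_centroids(coords, labels).values()); a small p then indicates that the cluster's members are more tightly grouped than randomly chosen cluster centroids.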

Two clusters (NAD cluster 1 and FAD cluster 1) were excluded from this analysis because they contained more members than there were cluster centroids, making the sampling scheme described above impossible. Inspection of the MDS plots shows that these two clusters are clearly highly localised in projection space, indicating that H0 would likely be strongly rejected for both, were sufficient data available.

Table 4. Comparison of conformational variance within and between clusters

        Between clusters                                       Pairwise RMSD within cluster (Å)
Ligand  No. of    No. of     Mean inter-     Cluster  No. of   No. of  Min   Max   Mean          p^a
        clusters  centroid   centroid                 members  pairs
                  pairs      RMSD (Å)
ATP     27        351        2.21 ± 0.03     1        22       231     0.42  3.71  1.91 ± 0.04   < 10^-10
                                             2        9        36      0.47  2.26  1.58 ± 0.08   < 10^-10
                                             3        7        21      1.01  2.45  1.90 ± 0.08   6.69 × 10^-2
                                             4        5        10      0.77  1.39  1.12 ± 0.06   < 10^-10
                                             5        3        6       0.88  1.04  0.95 ± 0.03   3.00 × 10^-4
                                             6        3        6       0.07  2.86  1.93 ± 0.51   2.25 × 10^-1
NAD     19        171        3.58 ± 0.08     1        54       1431    0.24  4.13  1.83 ± 0.02   Not calculated^b
                                             2        4        6       0.39  2.45  1.47 ± 0.41   < 10^-10
                                             3        4        6       0.99  2.00  1.69 ± 0.18   < 10^-10
                                             4        4        6       0.63  3.88  2.38 ± 0.64   6.00 × 10^-3
                                             5        4        6       0.77  2.51  1.82 ± 0.27   < 10^-10
FAD     14        91         3.49 ± 0.13     1        19       171     0.29  1.54  0.87 ± 0.02   Not calculated^b
                                             2        9        36      0.72  4.83  2.78 ± 0.18   7.70 × 10^-3
                                             3        5        10      0.32  1.12  0.92 ± 0.07   < 10^-10
                                             4        3        3       0.61  1.23  0.97 ± 0.18   3.00 × 10^-3
                                             5        3        3       0.95  1.70  1.28 ± 0.22   6.60 × 10^-3
                                             6        3        3       0.24  0.47  0.35 ± 0.06   < 10^-10

This Table shows statistics on the conformational variance within and between clusters for each ligand type. The variance was estimated using the mean pairwise distance between points in the projection space (for the within-cluster spread) and the mean pairwise distance between cluster centroids (for the between-cluster spread).
^a Probability of observing a within-cluster conformational variance this large or greater, if the hypothesis that sites within a given cluster sample ligand conformation space at random is true. This was calculated according to the method described in the text.
^b Insufficient number of cluster centroids available for sampling.

Of the remaining clusters, the null hypothesis is rejected with p < 0.01 for all but two cases: ATP cluster 3 (p = 0.07) and ATP cluster 6 (p = 0.23). The latter contains only three members, of which two are highly similar, with one distant outlier; this result may not be particularly meaningful. There is no clear functional reason why cluster 3 should be so conformationally diverse: although its members catalyse a number of different functions, this is also the case for other clusters. Although the spread within FAD cluster 2 was not found to be statistically significant, it is clearly the largest among all the FAD clusters. As pointed out by Dym & Eisenberg,9 this spread can be understood by inspecting the structure of the binding sites within the cluster.

Analysis of torsion angles

The degree of conformational diversity illustrated in the previous sections invites two questions. First, from which parts of the molecules does this diversity originate? In other words, which torsion angles are most variable? Second, to what extent can some outlying or unusual conformations be discounted as the result of poor-quality structures?

To address these questions, we plotted the torsion angles around eight rotable bonds in the ATP molecule (Figure 9). This plot reinforces the notion that the conformational freedom retained by ATP when bound to protein is extensive: for all eight bonds, all favourable regions are populated. It is clear, however, that conformational variability is not uniform across the molecule. The preference for the glycosidic bond to adopt the anti orientation is clearly shown in wheel 1, while other bonds (notably 2, 4 and 6–8) have much more evenly distributed torsion angles.

[Figure 9: ATP structure with atom labels and eight numbered rotable bonds, each with a wheel plot of torsion angles. Bonds: 1, O4’–C1’–N9–C4; 2, O5’–C5’–C4’–C3’; 3, PA–O5’–C5’–C4’; 4, O3A–PA–O5’–C5’; 5, PB–O3A–PA–O5’; 6, O3B–PB–O3A–PA; 7, PG–O3B–PB–O3A; 8, O3G–PG–O3B–PB. Resolution scale (Å): < 1.5, 1.5–2, 2–2.5, 2.5–3, > 3.]

Figure 9. ATP torsion angles. For each rotable bond, a wheel plot shows the torsion angles for the 27 ATP cluster representatives. The colours of the spokes represent the resolution of the structures from which the coordinates were taken, according to the scale bar (values in Å). Favourable angular regions for each bond are shaded.
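The torsion angles in Figure 9 are ordinary four-atom dihedral angles. A minimal sketch of computing one from atomic coordinates with the IUPAC sign convention follows, together with an illustrative syn/anti labeller for the glycosidic angle (bond 1, O4’–C1’–N9–C4). The ±90° boundaries follow the IUPAC–IUB definitions (syn about 0°, anti about 180°); the intermediate class used in Figure 6 would need thresholds that are not given in this section. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]

def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180].

    Positive when, looking along p1 -> p2, the front bond p1-p0 must turn
    clockwise to eclipse the rear bond p2-p3 (IUPAC convention).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def glycosidic_conformation(chi: float) -> str:
    """Label a glycosidic torsion as syn (0 +/- 90 degrees) or anti (180 +/- 90 degrees)."""
    return "syn" if -90.0 < chi < 90.0 else "anti"
```

Used on the four atoms of bond 1 taken from a PDB entry, this reproduces the kind of value plotted in wheel 1; a resolution-aware version of Figure 9 would simply pair each angle with the structure's resolution.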

With respect to resolution, the colour-coding of the plot shows that, in many cases, angles which lie outside favoured regions do indeed come from lower-resolution structures. However, this is not universally true: bonds in the triphosphate tail of the molecule in particular are observed in sterically unfavourable conformations even in well-resolved structures. It should be noted that this simple treatment does not consider local measures of structure quality, for example temperature factors, and therefore some of these cases may represent ligands which retain some mobility within the crystal lattice.

Discussion

The main finding of the current study is that the ligands we have inspected adopt a wide range of conformations when bound by proteins. This has been shown by visual inspection of molecular superpositions, by calculation of molecular radii of gyration, and by analysis of matrices of RMSD values calculated between the coordinates of different instances of each ligand. This diversity of conformation encompasses not only the canonical, extended ligand arrangements but also several cases in which the molecule is bound in a folded configuration. In addition, torsion angles in many bound ligands, even those from high-resolution structures, fall well outside preferred, low-energy ranges. The latter observation warrants further study: are these outlying angles simply a result of poor data, or is the binding protein genuinely able to compensate for energetically unfavourable ligand conformations through favourable intermolecular interactions? A further corollary of this wide range of ligand conformations is that the success of methods which attempt to predict the specificity of binding pockets by geometric comparison with the coordinates of known sites and/or ligand molecules may be constrained by the currently limited amount of available structural data.
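The two global measures mentioned above, radius of gyration and pairwise RMSD, can be sketched in a few lines. This sketch assumes NumPy, an unweighted radius of gyration (whether atomic masses were used is not stated here), and atoms already placed in a common order, which is what the graph-matching step described in Methods provides. Optimal superposition is done with the Kabsch SVD method, which may differ from the routine actually used in GAMUT.

```python
import numpy as np

def radius_of_gyration(coords):
    """Unweighted radius of gyration of an (N, 3) coordinate array."""
    centred = coords - coords.mean(axis=0)
    return float(np.sqrt((centred ** 2).sum(axis=1).mean()))

def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) arrays with matched atom order, after optimal
    translation and rotation of `a` onto `b` (Kabsch algorithm)."""
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a0.T @ b0)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a0 @ rot.T - b0
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsd_matrix(conformers):
    """Symmetric matrix of pairwise RMSDs between instances of one ligand."""
    n = len(conformers)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = kabsch_rmsd(conformers[i], conformers[j])
    return m
```

A compact (folded) instance gives a smaller radius of gyration than an extended one with the same atoms, while the RMSD matrix captures differences in shape that a single scalar cannot.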

Comparison of ligand conformational diversity within groups of related binding sites with that seen among molecules bound to unrelated proteins shows that, contrary to expectations, conformation is not always conserved within homologous superfamilies. Two possible interpretations of this finding may be proposed. The first is that the structure of these proteins is such that mutations can occur in the ligand-binding regions, permitting incremental changes in the way they bind the partner molecule without disrupting the overall fold or function. This hypothesis is in keeping with previous propositions that the most commonly observed protein folds in nature (often referred to as the “superfolds”27) are intrinsically more stable, perhaps because they fold via relatively simple pathways. Although one of the clusters found here to exhibit high diversity in its ATP conformations (cluster 1, the P-loop hydrolase domains) is not a superfold, the stability principle may nonetheless apply in a restricted form to its binding-site region.

† http://www.pymol.org

An alternative rationale is that conformational variation, rather than being due to evolutionary change, simply reflects the fact that certain binding sites possess greater structural plasticity than others. We may hypothesise that, while in some cases the ligand must be rigidly restrained to ensure reliable recognition and/or to enable it to take part in chemical reactions once bound, other proteins may not require the ligand to be so tightly constrained. In particular, some proteins may recognise only the part of the ligand which contains its reactive groups, allowing the rest of the molecule to retain a degree of mobility. In this situation, we would expect to see considerable variation in the coordinates of the ligand as a whole when bound to different proteins, but might find that certain fragments of the molecule adopt more conserved local conformations. These two alternatives, which in many cases may not be mutually exclusive, could reasonably be tested using modelling experiments and molecular dynamics studies, respectively. A better understanding of how ligand conformation behaves as proteins evolve would certainly be relevant to rational drug design programmes, where predicting how a therapeutic molecule may interact not only with its target but also with related proteins is paramount.

This work raises numerous other questions which could be addressed in future studies. One such area is the connection between cofactor conformation and enzymatic function. The results presented here suggest that this connection is weak at best, but this conclusion is drawn from a very small data set and should therefore be validated more thoroughly. Methods of ligand structure comparison more sensitive than the superposition/RMSD approach taken here may uncover subtle relationships between conformation and function. A method which attempts to predict the function of an enzyme from the shape of its cofactor, using sophisticated spherical-harmonics-based shape matching, is currently under development (R. J. Morris, personal communication) and may go some way towards addressing this point.

It should be pointed out that conformational variability is just one aspect of the diversity inherent in molecular recognition. Previous studies have shown that certain conserved interaction patterns between proteins and ligands can be identified.28 However, such patterns are present in only a subset of the proteins which bind a given ligand. Moreover, even among sites in which conserved motifs can be identified, the rest of the binding region can be considerably more variable. Work is currently under way in our laboratory to develop methods for quantifying the extent of binding-site diversity.

One reason why this study was limited to the four molecules examined is that the automatic method for dataset generation used here does not guarantee that the binding sites obtained are in complex with their cognate ligands. This is not a serious problem for the compounds studied here, for which we can be reasonably confident that most of the interactions observed are biologically relevant. Extending the study to more diverse molecules, however, increases the risk that analogues, xenobiotic compounds and experimental artefacts will contaminate the results. A resource which rectifies this problem by cataloguing the cognate ligands associated with each protein domain should soon be available (I. Nobeli & M. Bashton, personal communication); applying the generic analytical methods presented here to a “clean” dataset containing a greater diversity of ligands would likely yield many interesting findings.

In conclusion, the work presented here has highlighted the conformational diversity of four common ligands, especially when bound to unrelated proteins, but also, to a lesser extent, within superfamilies. These results illustrate some of the challenges facing the development of reliable binding-site comparison methods.

Methods

All analyses reported here were performed using GAMUT, a newly developed object-oriented software library which provides a rich toolset for developing applications in structural bioinformatics. Molecular graphics were prepared using PyMOL†.

Generation of ligand datasets

All ligands of a given type present in the PDB were obtained by inspecting the three-letter residue identifiers. For example, the set of adenosine triphosphate molecules was retrieved by searching for the code “ATP”. Each molecule thus obtained was then validated using the following procedure. The atomic coordinates were first converted into a graph in which each atom is represented by a vertex labelled with its atomic number, and edges are added between vertices corresponding to atoms separated by less than the sum of their standard covalent radii (obtained from the CSD). The resulting graph describes the chemical structure of the ligand molecule, independent of its particular conformation. This graph was then matched against a reference graph for the appropriate compound, obtained from the Macromolecular Structure Database ligand chemistry component (MSDchem), using a subgraph isomorphism algorithm.29 Molecules which failed to match perfectly were discarded; the most common reason for failure was missing atoms due to low structure quality (e.g. the NAD molecule in entry 1BI9). A further advantage of this validation is that it yields a mapping between the atoms of the ligand molecule and those of the reference compound; inconsistencies in atom names are thus resolved, facilitating robust automatic superposition.
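A minimal sketch of this validation step follows, under stated assumptions: the covalent radii are approximate illustrative values rather than the CSD table; a tolerance of 0.4 Å (also an assumption) is added to each radius sum, because these radii alone would miss ordinary single bonds; and a plain backtracking search stands in for the algorithm of ref. 29. The matcher requires a perfect labelled match, mirroring the rule that imperfect matches are discarded, and returns the query-to-reference atom mapping that makes superposition possible.

```python
import math

# Illustrative single-bond covalent radii in angstroms. These are approximate
# literature values chosen for this sketch, NOT the CSD table used in the paper.
COVALENT_RADII = {1: 0.31, 6: 0.76, 7: 0.71, 8: 0.66, 15: 1.07}

def build_graph(atoms, tolerance=0.4):
    """atoms: list of (atomic_number, (x, y, z)).
    Returns (vertex labels, set of undirected edges as frozensets)."""
    labels = [z for z, _ in atoms]
    edges = set()
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (zi, pi), (zj, pj) = atoms[i], atoms[j]
            if math.dist(pi, pj) < COVALENT_RADII[zi] + COVALENT_RADII[zj] + tolerance:
                edges.add(frozenset((i, j)))
    return labels, edges

def _adjacency(edges, n):
    adj = [set() for _ in range(n)]
    for edge in edges:
        i, j = tuple(edge)
        adj[i].add(j)
        adj[j].add(i)
    return adj

def match_graphs(q_labels, q_edges, r_labels, r_edges):
    """Map query vertices onto reference vertices so that atomic numbers and
    bonds correspond exactly; return None if no perfect match exists."""
    n = len(q_labels)
    if n != len(r_labels) or len(q_edges) != len(r_edges):
        return None  # e.g. atoms missing from a poorly resolved ligand
    q_adj, r_adj = _adjacency(q_edges, n), _adjacency(r_edges, n)
    mapping, used = {}, set()

    def extend(k):
        if k == n:
            return True
        for r in range(n):
            if r in used or r_labels[r] != q_labels[k] or len(r_adj[r]) != len(q_adj[k]):
                continue
            # Bonds to already-mapped atoms must be preserved in both directions.
            if all((q in q_adj[k]) == (mapping[q] in r_adj[r]) for q in mapping):
                mapping[k] = r
                used.add(r)
                if extend(k + 1):
                    return True
                del mapping[k]
                used.discard(r)
        return False

    return dict(mapping) if extend(0) else None

# Hypothetical ethanol heavy atoms (C, C, O) and the same atoms in another order.
reference = build_graph([(6, (0.0, 0.0, 0.0)), (6, (1.52, 0.0, 0.0)), (8, (2.0, 1.35, 0.0))])
query = build_graph([(8, (2.0, 1.35, 0.0)), (6, (0.0, 0.0, 0.0)), (6, (1.52, 0.0, 0.0))])
print(match_graphs(*query, *reference))  # {0: 2, 1: 0, 2: 1}
```

Backtracking is exponential in the worst case, but for ligands of the size of ATP, NAD or FAD with element labels and degree checks pruning the search, it is usually fast enough for a sketch.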


The environment of each ligand molecule is then checked to ensure that only ligands which are non-covalently bound to protein are retained. This is done by checking whether the ligand is covalently bonded to any other molecule; ligands which are so bonded (e.g. the ATP molecule in entry 1PVK, which is covalently linked to two protein residues, or the NAD cofactor in entry 1DFH, which is covalently bonded to the substrate molecule) are discarded.
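The covalent-linkage filter can reuse the same distance criterion. In this hypothetical sketch, `other_atoms` holds every atom in the entry that does not belong to the ligand itself; the radii and the 0.4 Å tolerance are illustrative assumptions, as in the validation sketch, and an element missing from the table raises a `KeyError` rather than being guessed.

```python
import math

# Illustrative covalent radii in angstroms (an assumption, not the CSD values).
COVALENT_RADII = {1: 0.31, 6: 0.76, 7: 0.71, 8: 0.66, 15: 1.07, 16: 1.05}

def is_covalently_linked(ligand_atoms, other_atoms, tolerance=0.4):
    """True if any ligand atom is within bonding distance of any atom of
    another molecule. Atoms are (atomic_number, (x, y, z)) pairs."""
    for z1, p1 in ligand_atoms:
        for z2, p2 in other_atoms:
            cutoff = COVALENT_RADII[z1] + COVALENT_RADII[z2] + tolerance
            if math.dist(p1, p2) < cutoff:
                return True
    return False

# Hypothetical example: a sulphur 1.6 angstroms from a ligand oxygen counts as a
# covalent link; a nitrogen 3.0 angstroms away (e.g. an H-bond partner) does not.
lig = [(8, (0.0, 0.0, 0.0))]
print(is_covalently_linked(lig, [(16, (1.6, 0.0, 0.0))]))  # True
print(is_covalently_linked(lig, [(7, (3.0, 0.0, 0.0))]))   # False
```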

Ligands which pass both of these tests are then clustered on the basis of the relatedness of the proteins which bind them. More specifically, the grouping is made according to the protein “domains” which contact the ligand, since in large, multidomain proteins it is often the case that only a subset of the domains is involved in recognising the ligand. We wish to obtain two levels of clustering: the first groups together sites whose domains can be inferred to be related on the basis of sequence similarity; the second groups sites whose constituent domains, although dissimilar in sequence, belong to the same homologous superfamily according to the CATH database.30 For single-domain sites this is trivial; however, when more than one domain contacts the ligand, some kind of heuristic is required. The approach used here is to compute a score which expresses the dissimilarity of two binding sites in terms of the domains which constitute them. For the first round of clustering, this is computed as follows:

(1) For each site to be clustered, identify the set of residues which contact the ligand, defined using a simple distance threshold: any residue which has at least one atom within 4 Å of any atom of the ligand is considered to be part of the binding site.

(2) Annotate each amino acid residue in the binding site with the domain to which it belongs in version 2.6.1 of CATH. Residues which are not annotated as belonging to any domain are marked as unclassified.

(3) For each site, obtain the CATH S35 family to which each of its domains belongs, then compute the fraction of the site's residues contributed by each sequence family. For instance, imagine a site X composed of three domains D1, D2 and D3, which contribute 5, 3 and 2 residues, respectively, to the site. If D1 belongs to CATH S35 family S1, D2 to S2, and D3 is unclassified, the domain composition of the site at the S35 level can be written as {(S1, 0.5), (S2, 0.3)}, 0.2, where 0.2 represents the proportion of the site's residues which were not classified in CATH. In general:

$$X = \{(x_1, p_1), \ldots, (x_m, p_m)\},\ u \qquad \text{where} \qquad \sum_{i=1}^{m} p_i + u = 1$$

(4) Given the compositions of two sites X and Y expressed in this way, their dissimilarity is computed as follows: starting from 1, for each family which contributes to both sites, subtract the smaller of the two corresponding fractional contributions from the current dissimilarity value. Formally:

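$$d(X, Y) = 1 - \sum_{i=1}^{m} \sum_{j=1}^{n} s_{ij}, \qquad s_{ij} = \begin{cases} \min(p_i, q_j) & \text{if } x_i = y_j \\ 0 & \text{otherwise} \end{cases}$$

where the composition of Y is written analogously as {(y_1, q_1), …, (y_n, q_n)}.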
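Steps (1) to (4) translate almost directly into code. The sketch below assumes that residues are supplied as a mapping from a residue identifier to atom coordinates, and that a caller-supplied function `family_of` (hypothetical) returns the CATH S35 family of a residue's domain, or `None` when the residue is unclassified. It reproduces the worked example for site X given in step (3).

```python
import math
from collections import Counter

def site_residues(residues, ligand_coords, cutoff=4.0):
    """Step 1: residues with at least one atom within `cutoff` angstroms of the ligand.
    residues: dict mapping residue_id -> list of (x, y, z)."""
    return [rid for rid, atoms in residues.items()
            if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_coords)]

def composition(site, family_of):
    """Steps 2-3: fraction of the site's residues contributed by each family,
    plus the unclassified fraction u."""
    counts = Counter(family_of(r) for r in site)
    total = len(site)
    fractions = {fam: c / total for fam, c in counts.items() if fam is not None}
    return fractions, counts.get(None, 0) / total

def dissimilarity(x, y):
    """Step 4: d(X, Y) = 1 - sum of min(p_i, q_j) over families shared by X and Y."""
    return 1.0 - sum(min(p, y[fam]) for fam, p in x.items() if fam in y)

# Site X from the text: 5 residues from S1, 3 from S2, 2 unclassified.
site = [f"r{i}" for i in range(10)]
family_of = dict(zip(site, ["S1"] * 5 + ["S2"] * 3 + [None] * 2)).get
x, u = composition(site, family_of)
print(x, u)  # {'S1': 0.5, 'S2': 0.3} 0.2
print(round(dissimilarity(x, {"S1": 0.2, "S3": 0.6}), 6))  # 0.8
```

Because unclassified residues never count as shared, two sites made largely of unclassified residues stay dissimilar even when compared with themselves; in this example d(X, X) equals u.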

The resulting dissimilarity scores, which lie in the range [0, 1], are used to populate a distance matrix to which complete-linkage clustering is applied. Clusters are obtained by cutting the clustering dendrogram at a value of 0.4, which was chosen, after experimenting with a range of values, as giving the most reasonable results. From each cluster, a representative is chosen according to two criteria: first, structures which are thought to represent the true biological unit, as determined by PQS, are preferred; sites are then ordered according to the resolution of the structure.
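Complete-linkage clustering with a fixed cut can be written without a library. In the sketch below, clusters are merged while the largest pairwise dissimilarity between them is at most the 0.4 threshold; because complete-linkage merge heights never decrease, this is equivalent to cutting the dendrogram at that height. The representative is picked by PQS status first and then by lowest resolution value; the dictionary keys `in_pqs` and `resolution` are hypothetical field names.

```python
def complete_linkage(dist, threshold=0.4):
    """Agglomerative complete-linkage clustering of a square distance matrix,
    stopping once the closest pair of clusters is farther apart than threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest member-to-member distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best:
                    best, pair = d, (a, b)
        if best > threshold:
            break
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

def choose_representative(sites):
    """Prefer sites from a PQS biological unit, then the best (lowest) resolution."""
    return min(sites, key=lambda s: (not s["in_pqs"], s["resolution"]))

# Hypothetical dissimilarities between four binding sites.
d = [[0.0, 0.1, 0.9, 0.9],
     [0.1, 0.0, 0.9, 0.9],
     [0.9, 0.9, 0.0, 0.2],
     [0.9, 0.9, 0.2, 0.0]]
print(complete_linkage(d))  # [[0, 1], [2, 3]]
```

This naive loop is cubic in the number of sites; for large matrices a library routine such as SciPy's hierarchical clustering (complete method, distance criterion) would be the more practical choice.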

This procedure is then repeated to cluster the representatives of the first stage, this time determining the composition of each site in terms of the CATH H level rather than the S35 level; a threshold value of 0.4 is again used to obtain discrete clusters. The result is a set of clusters whose members represent sites composed of domains which are homologous but share less than 35% sequence identity.

Exhaustive generation of ligand conformations

To determine the extent of conformational space explored by bound ligands, an enumeration of all possible conformations of a given ligand was required. This set of coordinates was generated by a program which exhaustively samples the allowed torsion angles at each rotable bond and then checks each conformation to determine whether it is physically feasible, i.e. free of internal steric clashes. For each bond, the median value of each preferred region was taken as a rotamer. Steric clashes were detected by checking whether any pair of non-bonded atoms approaches closer than the sum of their van der Waals radii, using the values calculated by Bondi.31 Atoms in the same ring system were excluded from this check. Since the restrictive definition of rotable bonds used here does not include bonds such as C–OH, atoms found to clash in the conformations generated here might in fact be free to rotate away from one another in a more complete simulation of the molecule. For this reason, clashes involving such atoms (for example, a hydroxyl hydrogen) were ignored.
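The enumeration can be sketched as follows. Several assumptions here do not come from the paper: coordinates for one starting conformation and a bond graph are available; each rotable bond is described by four atom indices (a, b, c, d); atom pairs separated by fewer than four bonds are skipped in the clash test (the text says only “non-bonded”); `ignore` lists atoms such as hydroxyl hydrogens; and `ring_system` optionally maps atoms to a ring-system label so that same-ring pairs are skipped. The van der Waals radii are Bondi's values for H, C, N, O and P.

```python
import itertools
import math
from collections import deque

BONDI = {1: 1.20, 6: 1.70, 7: 1.55, 8: 1.52, 15: 1.80}  # angstroms

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees (IUPAC convention)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n2 = cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(cross(b1, b2), n2)))

def rotate(p, origin, axis, theta):
    """Rodrigues rotation of point p by theta radians about a unit axis through origin."""
    v = sub(p, origin)
    kxv, kv = cross(axis, v), dot(axis, v)
    c, s = math.cos(theta), math.sin(theta)
    return [o + vi * c + ki * s + ai * kv * (1 - c) for o, vi, ki, ai in zip(origin, v, kxv, axis)]

def bond_distances(adj):
    """Number of bonds separating every pair of atoms (breadth-first search)."""
    n = len(adj)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == math.inf:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

def set_torsion(coords, adj, quad, target):
    """Rotate the atoms on the c side of bond b-c so that dihedral a-b-c-d equals target."""
    a, b, c, d = quad
    moving, stack = {c}, [c]
    while stack:  # collect atoms reachable from c without crossing b
        for v in adj[stack.pop()]:
            if v != b and v not in moving:
                moving.add(v)
                stack.append(v)
    axis = sub(coords[c], coords[b])
    norm = math.sqrt(dot(axis, axis))
    axis = [x / norm for x in axis]
    theta = math.radians(target - dihedral(*(coords[i] for i in quad)))
    return [rotate(p, coords[c], axis, theta) if i in moving else p for i, p in enumerate(coords)]

def has_clash(coords, labels, topo, min_sep=4, ignore=frozenset(), ring_system=None):
    """True if two atoms at least min_sep bonds apart overlap their vdW spheres."""
    ring_system = ring_system or {}
    for i, j in itertools.combinations(range(len(coords)), 2):
        if topo[i][j] < min_sep or i in ignore or j in ignore:
            continue
        if ring_system.get(i) is not None and ring_system.get(i) == ring_system.get(j):
            continue
        if math.dist(coords[i], coords[j]) < BONDI[labels[i]] + BONDI[labels[j]]:
            return True
    return False

def enumerate_conformers(coords, labels, adj, torsions, rotamers, **clash_args):
    """Yield (angles, coordinates) for every clash-free combination of rotamers."""
    topo = bond_distances(adj)
    for angles in itertools.product(*rotamers):
        xyz = coords
        for quad, angle in zip(torsions, angles):
            xyz = set_torsion(xyz, adj, quad, angle)
        if not has_clash(xyz, labels, topo, **clash_args):
            yield angles, xyz
```

Because every combination of rotamers is visited, the cost grows as the product of the rotamer counts: ten bonds with three rotamers each already give 59,049 candidates to test.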

Acknowledgements

G.R.S. was funded by a BBSRC studentship and a CASE award from Inpharmatica, and acknowledges additional supervision from Dr Richard Jackson. The authors thank Dr Johnathan Barker, Dr Tim Massingham and Dr David Westhead for their advice regarding the statistical analysis.

References

1. Nicklaus, M. C., Wang, S., Driscoll, J. S. & Milne, G. W. (1995). Conformational changes of small molecules binding to proteins. Bioorg. Med. Chem. 3, 411–428.

2. Bostrom, J., Norrby, P. O. & Liljefors, T. (1998). Conformational energy penalties of protein-bound ligands. J. Comp. Aided Mol. Des. 12, 383–396.

3. Weber, P. C., Pantoliano, M. W. & Salemme, F. R. (1995). Crystallographic and thermodynamic comparison of structurally diverse molecules binding to streptavidin. Acta Crystallog. sect. D, 51, 590–596.


4. Moodie, S. L. & Thornton, J. M. (1993). A study into the effects of protein binding on nucleotide conformation. Nucl. Acids Res. 21, 1369–1380.

5. Allen, F. H. (2002). The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallog. sect. B, 58, 380–388.

6. Saenger, W. (1984). Principles of Nucleic Acid Structure, Springer, New York.

7. Perola, E. & Charifson, P. S. (2004). Conformational analysis of drug-like molecules bound to proteins: an extensive study of ligand reorganization upon binding. J. Med. Chem. 47, 2499–2510.

8. Carugo, O. & Argos, P. (1997). NADP-dependent enzymes. I: conserved stereochemistry of cofactor binding. Proteins: Struct. Funct. Genet. 28, 10–28.

9. Dym, O. & Eisenberg, D. (2001). Sequence-structure analysis of FAD-containing proteins. Protein Sci. 10, 1712–1728.

10. Kho, R., Baker, B. L., Newman, J. V., Jack, R. M., Sem, D. S., Villar, H. O. & Hansen, M. R. (2003). A path from primary protein sequence to ligand recognition. Proteins: Struct. Funct. Genet. 50, 589–599.

11. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C., Estreicher, A., Gasteiger, E. et al. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucl. Acids Res. 31, 365–370.

12. Hammes, G. G. (2002). Multiple conformational changes in enzyme catalysis. Biochemistry, 41, 8221–8228.

13. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. & Watson, J. D. (1994). Molecular Biology of the Cell (3rd edit.), Garland, New York.

14. Weber, S. (2005). Light-driven enzymatic catalysis of DNA repair: a review of recent biophysical studies on photolyase. Biochim. Biophys. Acta, 1707, 1–23.

15. Henrick, K. & Thornton, J. (1998). PQS: a protein quaternary structure file server. Trends Biochem. Sci. 23, 358–361.

16. Eriani, G., Delarue, M., Poch, O., Gangloff, J. & Moras, D. (1990). Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature, 347, 203–206.

17. Schmitt, E., Moulinier, L., Fujiwara, S., Imanaka, T., Thierry, J. C. & Moras, D. (1998). Crystal structure of aspartyl-tRNA synthetase from Pyrococcus kodakaraensis KOD: archaeon specificity and catalytic mechanism of adenylate formation. EMBO J. 17, 5227–5237.

18. Eriksson, M., Uhlin, U., Ramaswamy, S., Ekberg, M., Regnstrom, K., Sjoberg, B. M. & Eklund, H. (1997). Binding of allosteric effectors to ribonucleotide reductase protein R1: reduction of active-site cysteines promotes substrate binding. Structure, 5, 1077–1092.

19. van den Heuvel, R. H., Westphal, A. H., Heck, A. J., Walsh, M. A., Rovida, S., van Berkel, W. J. & Mattevi, A. (2004). Structural studies on flavin reductase PheA2 reveal binding of NAD in an unusual folded conformation and support novel mechanism of action. J. Biol. Chem. 279, 12860–12867.

20. Singh, S. K., Kurnasov, O. V., Chen, B., Robinson, H., Grishin, N. V., Osterman, A. L. & Zhang, H. (2002). Crystal structure of Haemophilus influenzae NadR protein. A bifunctional enzyme endowed with NMN adenyltransferase and ribosylnicotinimide kinase activities. J. Biol. Chem. 277, 33291–33299.

21. Bell, C. E., Yeates, T. O. & Eisenberg, D. (1997). Unusual conformation of nicotinamide adenine dinucleotide (NAD) bound to diphtheria toxin: a comparison with NAD bound to the oxidoreductase enzymes. Protein Sci. 6, 2084–2096.

22. Royston, P. (1982). Algorithm AS 181: the W test for normality. Appl. Stat. 31, 176–180.

23. Ihaka, R. & Gentleman, R. (1996). R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314.

24. Lau, S. K., Chass, G. A., Lovas, S., Penke, B. & Csizmadia, I. G. (2003). An exploratory ab initio conformational analysis of selected fragments of nicotinamide adenine dinucleotide (NAD+). Part II: adenosine. J. Mol. Struct. (Theochem), 666, 431–437.

25. Glasfeld, A., Leanz, G. F. & Benner, S. A. (1990). The stereospecificities of seven dehydrogenases from Acholeplasma laidlawii. The simplest historical model that explains dehydrogenase stereospecificity. J. Biol. Chem. 265, 11692–11699.

26. Kruskal, J. B. & Wish, M. (1977). Multidimensional Scaling, Sage Publications, Beverly Hills, CA.

27. Orengo, C. A., Jones, D. T. & Thornton, J. M. (1994). Protein superfamilies and domain superfolds. Nature, 372, 631–634.

28. Denessiouk, K., Rantanen, V. & Johnson, M. (2001). Adenine recognition: a motif present in ATP-, CoA-, NAD-, NADP-, and FAD-dependent proteins. Proteins: Struct. Funct. Genet. 44, 282–291.

29. Krissinel, E. B. & Henrick, K. (2004). Common subgraph isomorphism detection by backtracking search. Softw. Pract. Exper. 34, 591–607.

30. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. & Thornton, J. M. (1997). CATH – a hierarchic classification of protein domain structures. Structure, 5, 1093–1108.

31. Bondi, J. (1964). Van der Waals volumes and radii. J. Phys. Chem. 68, 441–451.

Edited by Michael J. E. Sternberg

(Received 11 August 2005; received in revised form 27 November 2005; accepted 2 December 2005). Available online 20 December 2005.