11
J. Mol. Biol. (1995) 245, 43-53 Molecular Recognition of Receptor Sites using a Genetic Algorithm with a Description of Desolvation Gareth Jones!, Peter Willett’ and Robert C. Glen** 1Dep7rtnwr~t oflt@m7tior7 Studies rind Krebs lnsfittife for Biomolecular Resenrclz University of Sheffield ‘Western Bank, Sheffield SlO 2TN, U.K. ‘Deprtment of Plzysicnl Sciences,WellcomeResenrh Laboratories, Beckenham, Kent BR3 385, U.K. *Currcsyondirrg niffhor Understanding the principles whereby macromolecular biological receptors can recognise small molecule substrates or inhibitors is the subject of a major effort. This is of paramount importance in rational drug design where the receptor structure is known (the “docking” problem). Current theoretical approaches utilise models of the steric and electrostatic interaction of bound ligands and recently conformational flexibility has been incorporated. We report results based on software using a genetic algorithm that uses an evolutionary strategy in exploring the full conformational flexibility of the ligand with partial flexibility of the protein, and which satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Results are reported on five test systems showing excellent agreement with experimental data. The design of the algorithm offers insight into the molecular recognition mechanism. E(etJruor&“‘s: protein ligand docking; genetic algorithm; desolvation Introduction Biological receptors exhibit highly selective recognition of small organic molecules. Evolution has equipped them with a complex three-dimen- sional “lock” into which only specific “keys” will fit. This has been exploited by medicinal chemists in the design of molecules to selectively augment or retard biochemical pathways and so exhibit a clinical effect. X-ray crystallography has revealed the structure of a significant number of these receptors or active sites. It would be advantageous in attempting the computer-aided design of therapeutic molecules to be able to predict and explain the binding mode of novel chemical entities (the “docking” problem) when the active site geometry is known. The process of automated ligand docking has been the subject of a major effort in the field of computer-aided molecular design (Blaney & Dixon, 1993). Early examples, for example the DOCK program (Kuntz et al., 1982), consider only orientational degrees of freedom, treating both ligand and receptor as rigid (DesJarlais et al., 1988; Shoichet et al., 1993; Lawrence & Davis, 1992). Abbreviations used: GA, genetic algorithm; NMR, nuclear magnetic resonance; DHFR, dihydrofolate reductase; NADPH, reduced nicotinamide adenine dinucleotide phosphate;r.m.s., root-mean square; 4g-neu5Ac2en,4-guanidino-2-deoxy-2,3-di-dehydro-o-N- acetylneuraminic acid. 0022-2836/95/010043-11 $08.00/O More recently algorithms that take into account the conformational degrees of freedom of the ligand have been reported. These include docking fragments of the ligand and recombining the molecule (DesJarlais et al., 1986), systematic search (Kuntz, 1992) and clique detection (Smellie et al., 1991). These algorithms are all inherently combina- torial. Another approach is simulated annealing (Goodsell & Olson, 1990). Unfortunately the algorithm is very time-consuming, even given the relatively simple molecules docked. Dixon (1993) has given a brief report on the use of a genetic algorithm (GA) for docking, however, experimental results are as yet unpublished. Leach (1994) has developed an algorithm that takes into account conformational flexibility of protein side-chains as well as ligand flexibility; These results highlight the need for appropriate methods to accurately estimate the strength of ligandsbinding. A solution to the docking problem requires a powerful algorithm to search conformational space and an understanding of the processesof molecular recognition, so that reliable predictions of binding modes can be achieved. Inspection of the X-ray crystallographic structures of proteins with associated higli affinity Iigands reveals that they appear to conform closely to the shape of the receptor cavity and to interact at a number of hydrogen-bonding sites. The docking process probably involves the exploration of an ensemble of binding modes with the most energetically stable being observed. These exhibit low temperature factors and little crystallographic 0 1995 Academic Press Limited

Molecular Recognition of Receptor Sites using a.pdf

Embed Size (px)

Citation preview

Page 1: Molecular Recognition of Receptor Sites using a.pdf

J. Mol. Biol. (1995) 245, 43-53

Molecular Recognition of Receptor Sites using a Genetic Algorithm with a Description of Desolvation

Gareth Jones!, Peter Willett’ and Robert C. Glen**

1Dep7rtnwr~t of lt@m7tior7 Studies rind Krebs lnsfittife for Biomolecular Resenrclz University of Sheffield ‘Western Bank, Sheffield SlO 2TN, U.K.

‘Deprtment of Plzysicnl Sciences, Wellcome Resenrh Laboratories, Beckenham, Kent BR3 385, U.K.

*Currcsyondirrg niffhor

Understanding the principles whereby macromolecular biological receptors can recognise small molecule substrates or inhibitors is the subject of a major effort. This is of paramount importance in rational drug design where the receptor structure is known (the “docking” problem). Current theoretical approaches utilise models of the steric and electrostatic interaction of bound ligands and recently conformational flexibility has been incorporated. We report results based on software using a genetic algorithm that uses an evolutionary strategy in exploring the full conformational flexibility of the ligand with partial flexibility of the protein, and which satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Results are reported on five test systems showing excellent agreement with experimental data. The design of the algorithm offers insight into the molecular recognition mechanism.

E(etJruor&“‘s: protein ligand docking; genetic algorithm; desolvation

Introduction

Biological receptors exhibit highly selective recognition of small organic molecules. Evolution has equipped them with a complex three-dimen- sional “lock” into which only specific “keys” will fit. This has been exploited by medicinal chemists in the design of molecules to selectively augment or retard biochemical pathways and so exhibit a clinical effect. X-ray crystallography has revealed the structure of a significant number of these receptors or active sites. It would be advantageous in attempting the computer-aided design of therapeutic molecules to be able to predict and explain the binding mode of novel chemical entities (the “docking” problem) when the active site geometry is known.

The process of automated ligand docking has been the subject of a major effort in the field of computer-aided molecular design (Blaney & Dixon, 1993). Early examples, for example the DOCK program (Kuntz et al., 1982), consider only orientational degrees of freedom, treating both ligand and receptor as rigid (DesJarlais et al., 1988; Shoichet et al., 1993; Lawrence & Davis, 1992).

Abbreviations used: GA, genetic algorithm; NMR, nuclear magnetic resonance; DHFR, dihydrofolate reductase; NADPH, reduced nicotinamide adenine dinucleotide phosphate; r.m.s., root-mean square; 4g-neu5Ac2en,4-guanidino-2-deoxy-2,3-di-dehydro-o-N- acetylneuraminic acid.

0022-2836/95/010043-11 $08.00/O

More recently algorithms that take into account the conformational degrees of freedom of the ligand have been reported. These include docking fragments of the ligand and recombining the molecule (DesJarlais et al., 1986), systematic search (Kuntz, 1992) and clique detection (Smellie et al., 1991). These algorithms are all inherently combina- torial. Another approach is simulated annealing (Goodsell & Olson, 1990). Unfortunately the algorithm is very time-consuming, even given the relatively simple molecules docked. Dixon (1993) has given a brief report on the use of a genetic algorithm (GA) for docking, however, experimental results are as yet unpublished. Leach (1994) has developed an algorithm that takes into account conformational flexibility of protein side-chains as well as ligand flexibility; These results highlight the need for appropriate methods to accurately estimate the strength of ligandsbinding.

A solution to the docking problem requires a powerful algorithm to search conformational space and an understanding of the processes of molecular recognition, so that reliable predictions of binding modes can be achieved.

Inspection of the X-ray crystallographic structures of proteins with associated higli affinity Iigands reveals that they appear to conform closely to the shape of the receptor cavity and to interact at a number of hydrogen-bonding sites. The docking process probably involves the exploration of an ensemble of binding modes with the most energetically stable being observed. These exhibit low temperature factors and little crystallographic

0 1995 Academic Press Limited

Page 2: Molecular Recognition of Receptor Sites using a.pdf

44 Molecular Recognition of Receptor Sites

disorder,The key requirements for high affinity may be the complementarity of the three-dimensional shape of the ligand and protein, and the ability of the ligand to displace water and form hydrogen bonds al- key hydrogen-bonding sites when associated with the active site. Thus sufficiently accurate simulation of these interactions may be enough to explain and predict the binding mode of the majority of high-affinity ligands. /

Account must be taken of the ligand and protein conformational variability Here, we have adopted a genetic algorithm (Davis, 1991; Goldberg, 1989; Holland, 1975) that uses an evolutionary strategy in exploring the full conformational flexibility of the ligand with partial flexibility of the protein. GAS are a class of computational procedures that enable the rapid identification of good, but not necessarily optimal, solutions to combinatorial optimisation problems. There have been many reports of the use of GAS for a range of combinatorial matching problems in chemistry and biology (Brown et ai., 1994; Fountain, 1992). In particular, GAS have shown success in the conformational analysis of both small molecules (Clark et nl., 1994; Payne & Glen, 1993; Judson ef nl., 1993) and of macromolecules (Blommers et nl., 1992; Dandekar & Argos, 1993).

Tlie GA to be described here performs automated docking with full acyclic ligand flexibility and partial protein flexibility in the region of the receptor or active-site. To enable the GA to explore the possible hydrogen-bonding motifs we have attempted to quantify the ability of common substructures of drug molecules to displace water from the receptor (protein) surface. This, in conjunction with a steric .term, allows the GA to find the most energetically favourable combination of interactions. The algor- ithm has been tested on a number of ligand/protein docking problems and five examples are reported here.

Results and Discussion

The GA described in Materials and Methods requires as input the location and the size of the receptor site and the ligand that is to be docked into this site. The output is the ligand and protein conformations associated with the fittest chromo- some in the population when the termination conditions of the algorithm were satisfied. To ensure that most of the high-affinity binding modes were explored the GA was run 50 times for each of the examples examined (probably less runs are required as the top 20 results in most cases were very similar). The result with the best fitness score was used as the solution (although other results, with poorer fitness scores, may have better fits to the crystal structures). On average, each run of the GA took approximately five to eight minutes of CPU time on a Silicon Graphics R4000 Iris computer. Due to the inherently parallel nature of the algorithm and the .recent progress in CPU technology, the computing time may be expected to be substantially reduced on more recent processors.

Figure 1. Docking of methotrexate into ciihycirofolate reciuctase.

The GA docking procedure has been applied to a diverse set of problems, illustrated by Figures 1 to 5. In these Figures, ligand crystal structures (where available) and important protein close contacts have been coloured by atom type. For reasons of clarity, not all close contacts with the protein are displayed. The GA solutions are shown in red.

Methotrexate and dihydrofolate reductase

Dihydrofolate reductase (DHFR) is an enzyme of vital importance to DNA synthesis and preferential inhibition of the enzyme has been exploited to produce a range of potent anti-bacterial drugs and anti-cancer drugs. The geometry of binding of the anti-cancer agent methotrexate with Escl~ickin coli DHFR has been elucidated by X-ray crystallography (Bolin et nl., 1982). This is the binary complex (no NADPH).

Bound methotrexate is protonated at N-l and the binding mode of the pteridine ring results in a number of close contacts: pteridine N-l and 2-amino . . . Asp27 carboxylate, 2-amino group . . . water, 4-amino group . . . Ile5 and he94 carbonyl groups, benzoylcarbonyl . . . Arg52 guani- dinium (and water), L-glutamate cr-carboxy- late . . . Arg57, L-glutamate y-carboxylate . . . Arg52 and various water molecules. Not all of these are shown, for reasons of clarity

The GA solution (red) is shown overlaid upon the experimental result (coloured by atom type) in Figure 1.

Key hydrogen-bond interactions are shown in yellow. The similarity between the predicted and experimental results is excellent, with a root-mean- square (r.m.s.) deviation for the atoms of 0.99 A. The pteridine ring is correctly oriented with the hydrogen-bond (pteridine N-l and 2- amino . . . Asp27) the main feature. The benzoylcar- bony1 group is correctly oriented, interacting with Arg52. The L-glutamate y-carboxylate group is rotated slightly from the X-ray solution (where it is solvated), but the GA twists the side-chain to interact with Arg52. The L-glutamate a-carboxylate group is correctly positioned. The absence of bound water

Page 3: Molecular Recognition of Receptor Sites using a.pdf

Molecular Recognition of Receptor Sites 45

Figure 3. Docking of o-galactose into an L-arabinose binding protein mutant.

from the GA model seems not to affect the ability of the GA to correctly predict the binding mode. Perhaps bound water is a consequence of binding rather than a contributory factor to the stabilisation of the bound ligand.

Folate and dihydrofolate reductase

Vertebrate DHFR catalyses the NADPH-linked reduction of folate to 7&dihydrofolate and on to 5,6,7&tetrahydrofolate. The structure of the human recombinant enzyme has been determined by X-ray diffraction (Davis et al., 1990). At physiological pH, folate appears to bind as the neutral species and displays a number of close contacts in the active site: pteridine ring 2-amino . . . Glu30, pteridine ring 2-amino . . . water . . . Thr136, pteridine ring N-3 . . . Glu30, the pteridine ring carbonyl O-4 . . . water . . . Glu30, the carbonyl group of the y-aminobenzoic acid . . . Asn64, the folate a-car- boxylate group of the glutamate . . . Arg70, the folate y-carboxylate group of the glutamate. . . water molecules (not all shown, for reasons of clarity).

The GA solution is shown in red with the X-ray data coloured by atom type.

The GA correctly predicts the binding mode of the pteridine ring with the pteridine ring distances 2-amino. . . Glu30 and the pteridine ring N-3

Glu30 similar to the X-ray result, with an r.m.s. heiiation for the atoms in the pteridine ring of 0.63 A. The benzene ring is rotated relative to the crystal structure due the interaction of the folate glutamate y-carboxylate group with Arg70. Solutions having the alternative interaction (the folate cr-carboxylate group of the glutamate . . . Arg70) were found by the GA, however, the illustrated solution had the best GA score with an r.m.s. deviation for the atoms in the complete structure of 2.48 A. The X-ray result has one carboxylic acid molecule in solvent while the GA solution has both interacting with the protein.

Folate binding to DHFR has been studied by NMR spectroscopy These studies have suggested that three forms exist in equilibrium (Cheung ef nl., 1993). The predominant form at low pH (form 1: enol, N-l protonated) appears to bind in a similar manner to methotrexate while the neutral form (form-2: keto

form with N-l unprotonated) binds with the pteridine ring rotated 180”. Also, a third protonated form (form 3: enol N-l protonated) coexists with form 2 (the protonation state of folate in the protein in forms 2 and 3 is relatively insensitive to buffer pH) and this also binds in a similar mode to methotrexate. The GA solution for form 2 has previously been described. Form 3 is predicted by the GA to bind with the pteridine ring rotated by 180”; this prediction is shown in yellow in the Figure. This is consistent with the NMR results.

In the X-ray experiment (crystallisation performed at pH 5.91, difference maps were used to determine the folate position. However, extra electron density near the pteridine and benzene rings could not be accounted for. It may be speculated that the X-ray experiment observes forms 2 and 3 in equilibrium and the two GA binding results in combination would then account for the extra density

Galactose and L-arabinose binding protein

The structure of the carbohydrate-binding protein, L-arabinose binding protein mutant (Metl08Leu) has been determined by X-ray crystallography (Vermer- sch et al., 1991). This is a mutant of a periplasmic transport receptor of E. coli with high affinity for the sugar L-arabinose. The ligand (D-galactose) position was determined from difference Fourier analysis. There are many close contacts with the protein: o-1 . . . Asp90, O-2 . . . water, o-3 . . . Glu14, o-4 . . . Asn232, LyslO . . . O-2, Asn232 . . . O-3, ArglSl . . . O-4,

Asn295 . . . O-3, Argl51 . . . O-5,

water . . . O-5 (not all of these are shown, for reasons of clarity).

The GA solution (Figure 3) is almost identical with the experimental result, with the r,m.s. deviation of the atoms being as low as 0.67 A with all of the experimentally observed close contacts being cor- rectly reproduced.

Again, the GA predicts the correct binding mode despite the absence of bound water in the model.

Ring flexibility of the ligand was not included in the GA run. This makes the problem somewhat easier, but such flexibility could be included, if required, by the incorporation of a corner-flapping

Page 4: Molecular Recognition of Receptor Sites using a.pdf

46 Molecular Recognition of Receptor Sites

Figure 4. Docking of 4-guanidino-2-deoxy-2,3-di-dehy- dro-D-N-acetylneurantic acid into influenza sialidase.

Figure 5. Docking of trimethoprim into DHFR (EC 1.5.1.3.) complex with NADPH.

algorithm (Payne &‘Glen, 1993). The conformation of the l&and used in the GA is consistent with the known stable conformations of sugar molecules in solution. The alternative epimer (L-galactose, results not shown in diagram for clarity) was also docked by the GA. The solution that was obtained was again very close to the experimental result, with an r.m.s. deviation of 0.66 A.

4-Guanidino-2-deoxy-2,3-di-dehydro-D-N- acetylneuraminic acid and influenza sialidase

Potent inhibitors of influenza virus replication have recently been reported based upon inhibition of influenza sialidase. The binding mode of the inhibitor 4-guanidino-2-deoxy-2,3-di-dehydro-D-N- acetylneuraminic acid (4g-neu5Ac2en) to sialidase was reported based upon X-ray crystallography analysis of the protein with bound ligand (von Itzstein et nl., 1993). The ligand binds with a number of reported close contacts: terminal guanidinyl nitrogen atoms . . . Glu227 (2.7 A), terminal guanidinyl nitrogen atoms. . . Glu119 (3.6 and 4.2 A), carboxylate . . . Arg371.

The GA solution (red) is overlaid on the crystal structure of sialidase (coloured by atom type) in Figure 4.

This solution has been compared with the reported intermolecular distances, difference map and diagrams (von Itzstein et nl., 1993). It appears from these that the GA correctly predicts the binding mode, closely mirroring the experimental result. The GA solution is shown in red with close intermolecu- lar contacts in yellow. In addition the GA identifies a strong contact with Aspl51.

Trimethoprim and dihydrofolate reductase

Complexes of DHFR and trimethoprim (5-(3,4,5- trimethoxyphenyl)methyl - 2,4 - pyrimidinediamine) have been solved for a number of different DHFR species. Trimethoprim has been seen to bind in at least two different forms, the E. coli mode (form 1; Matthews et al., 1985) or the avian mode (form 2; Matthews et al., 1985). The diaminopyrimidine stays

largely in the same position in both forms, however, the trimethoxybenzene has rotated due to altered torsional angles about the methylene ring-linking group.

The nature of the active site is such that the E. coli site is more restricted than the mammalian sites. This is reflected in the single binding mode observed for trimethoprim/E. coli DHFR while different modes of binding of trimethoprim are observed in mammalian DHFR types.

In mouse L1210 (Stammers, 1987) it is bound as form 2. With mouse S180 after co-crysfdlisnfion of trimethoprim with the DHFR, it binds as form 1 (Groome, 1991). With mouse S180 after diffikorz of trimethoprim into the DHFR, it binds in form 2 (A. J. Geddes, personal communication).

We have applied the GA to both the E. co/i and mammalian forms.

The ternary complex of DHFR (EC 1.5.1.3), NADPH and trimethoprim (Champness ef nl., 1986) was partially crystallographically refined (J. N. Champness, personal communication). The trimethoprim molecule interacts via the diaminopy- rimidine with Asp27 forming a pair of hydrogen bonds. The implication is that N-l of trimethoprim is protonated. One methoxy group is positioned in van der Waals contact with the side-chains of Leu28 and Phe31.

The genetic algorithm solution (red) is overlaid on the crystal structure (Figure 5). It is close to the crystallograpl$ally determined structure (r.m.s. deviation 1.1 A) having an almost identical confor- mation and being slightly displaced. Of particular note is the positioning of the trimethoxybenzene moiety very close to the crystallographically determined structure.

The ternary complex of mouse L1210 DHFR/ NADPH and trimethoprim (Stammers et al., 1987) was used as a representative of a marnmalian DHFR. In this crystal structure, the trimethoprim is bound in a conformation consistent with form 2.

As described, the GA was run 50 times and the solutions ranked according to their fitness scores. In Figure 6, the first six GA runs (green) corresponded closely to form 1 while it is not until the tenth GA run

Page 5: Molecular Recognition of Receptor Sites using a.pdf

Molecular Recognition of Receptor Sites 47

Figure 6. Docking of trimethoprim into mouse L1210 DHFR. The experimental solution is coloured by atom type. The highest scoring GA solutions (runs 1 to 6) are typified by the green molecule (close to the bacterial mode of binding, (for pictures of this site, see Groome et nl., 1991)) while the closest GA solution (10th best score) to the experimental solution is coloured in yellow.

(yellow) that solutions close to the observed form 2 are seen.

Whilst it might have been more satisfying if form 2 (the experimental result in this study) had appeared more favourable in terms of the GA score, these results are nevertheless a reflection of the relatively low affinity of trimethoprim for the mammalian receptor coupled with the possibility of multiple binding modes. The two binding modes observed are probably very close in relative affinity for this receptor. A simple change such as crystallisation conditions results in conformational changes in the observed binding conformation of trimethoprim. Indeed, the GA results imply that other modes of binding of the trimethoxybenzene portion of the molecule may be observed under different crystallisation conditions. This conclusion is corroborated by the crystallographic observations of trimethoprim binding to L1210 DHFR (J. N. Champness et nl., 1994).

Reproducibility

Since the genetic algorithm is stochastic in nature, there may be problems with reproducibility and convergence. In our experience with the examples quoted here, the docking experiments using galactose, methotrexate, trimethoprim (E. co/i) and 4gneu5Ac2en were highly reproducible. Folate and mammalian DHFR presented a much more difficult problem and required many GA runs (typically 20 to 30) to elucidate reliably the observed (crystallo- graphic) binding mode.

Conclusions

Here, we have described the use of a GA to dock flexible ligands into protein binding sites. The results that have been described in the previous section and that are illustrated in Figures 1 to 6 demonstrate the effectiveness of the algorithm in reproducing

crystallographic studies of the bound complexes, even in the case of highly flexible ligands with variable protonation states. The accuracy with which the binding modes of ligands are predicted suggests that the representation and fitness function used here provides a useful description of the most important features involved in binding, at least for these high-affinity ligands. If this is indeed so, our results would suggest that the mechanism of molecular recognition requires the ligand to exhibit shape complementarity with the receptor (as has been assumed by previous docking programs) and to displace water at key points so forming hydrogen bonds in preference to water. The absence of bound water from the GA model (all water molecules are removed from the protein before docking) seems not to affect the ability of the GA to correctly predict the binding mode. Perhaps the bound water (in these examples) is a consequence of binding rather than a major contributory factor to the stabilisation of the bound ligand.

Several improvements to the algorithm are possible (such as ring conformational search, the inclusion of metal ions and accounting for possible variations in dielectric and solvent competition across the active site) and subsequent developments will be published elsewhere. Also, the dependence of this implementation on mapping hydrogen bonds from ligand to protein may not reflect the situation found in predominantly hydrophobic ligands. We are currently evaluating a number of alternative intermolecular potential functions (we currently use a simple Lennard-Jones 6-12 potential) to describe better the hydrophobic interactions and the results will be reported elsewhere.

Materials and Methods

Genetic algorithms

A GA is a computer program that mimics the process of evolution by manipulating a collection of data-structures called chromosomes. Each of these encodes a possible solution (in terms of a possible ligand-receptor interaction) to the docking problem and may be ass_igned a fitness score based on the relative merit of that solution. A steady-state-with-no-duplicates GA (Davis, 1991) with a fixed population size of 500 was used. An overview of this GA is given in Figure 7.

Starting from an initial, randomly generated population of chromosomes the GA repeatedly applies two genetic operators, crossover and mutation resulting in chromo- somes that replace the least-fit members of the population. Crossover combines chromosomes while mutation intro- duces random perturbations. Both operators require “parent” chromosomes that are randomly selected from the existing population with a bias towards the fittest, thus introducing an evolutionary pressure into the algorithm. This selection is known as roulette-wheel-selection, as the procedure is analogous to spinning a roulette wheel with each member of the population having a slice of the wheel that is proportional to its fitness. This emphasis on the survival of the fittest ensures that, over time, the population should move towards the optimum solution, i.e. to the correct binding mode in the present application,

Page 6: Molecular Recognition of Receptor Sites using a.pdf

48 Molecular Recognition of Receptor Sites

1. A set of.reproduction operators (crossover, mutation etc) is chosen.

l&h operator is assigned a weight.

2. An initial population is randomly created and the fltnesses of its

members determined.

3. An operator is chosen using roulette wheel selection based on

operator weights.

4. The parents required by the operator are chosen using roulette wheel

selection based on scaled fitness. I

5. The operator Is applied and child chromosomes produced. Their

fitness Is evaluated.

6. If not already present hi the population. the children replace the least

fit members of the population.

7. If an acceptable solution has been found stop otherwise goto 3.

Figure 7. Steady state with no duplicates GA.

provided that the fitness function is an accurate predictor of binding affinity.

The GA utilises a novel representation of the docking process. Each chromosome encodes an internal confor- mation of the &and and protein active site, and includes a mapping from hydrogen-bonding sites in the ligand to hydrogen-bonding sites in the protein. On decoding a chromosome, a least-squares fitting process is employed to position the ligand within the active site of the protein in such a way that as many of the hydrogen bonds suggested by the mapping are formed. The fitness of a decoded chromosome is then a combination of the number and strength of the hydrogen bonds that have been formed in this way and of the van der Waals energy of the bound complex. The remainder of this section gives a detailed account of the various components of our program, these being: the routines that are used to initialise the protein site and the ligand that is to be docked into it; the chromosome representation that is used to describe ligand and protein conformations and the hydrogen bonds that are formed between them; the fitness function that is used to evaluate these chromosomes; and the genetic operators that are applied to these chromosomes. We also discuss the calculation of the hydrogen-bond energies that play a vital role in the fitness function.

lnitialisation of the protein and of the ligand

Known crystal structures of bound ligand complexes were obtained from the Brookhaven Protein Data Bank (Bernstein et nl., 1977; BoLin et nl., 1982; Davis et nl., 1990; Vermersch ef nl., 1991; von Itzstein et nl., 1993; Champness ef nl., 1986). Water molecules and ions were removed (including ordered water molecules) and hydrogen atoms added at appropriate geometry Groups within the protein (and ligand) were ionised if this was appropriate at physiological pH. The structures were partially optimised,

Table 1

Allowed donors and acceptors based on SYBYL atom types

SYBYL atom types Donor Acceptor N.3, N.2, 0.3 Y Y N.l, N.ar, 0.2, 0.~02, F, Br, Cl N i- N.am, N.pl3, N.4 Y N

N.B. Donors must have a hydrogen atom attached.

to relieve any bad steric contacts introduced by hydrogen addition for ten cycles of molecular mechanics using the SYBYL general purpose Tripos 5.2 force field (Clark cl al., 1989). No further optimisation was employed so as not to risk distorting the crystal structure. The l&and was extracted from the structure and then fully optimised in an extended conformation (in vnc~o) by molecular mechanics. The remaining structure then comprised the protein active site.

The analysis of the protein commenced with the use of a flood-fill procedure to determine solvent-accessible protein atoms (Ho & Marshall, 1990) within the active site. Accessible hydrogen-bond donor and acceptor atoms were identified using the SYBYL atom-type categorisation (Clark et nl., 1989) in Table 1.

Lone pairs were added to acceptors at a distance of 1.0 A (taking into account the hybridisation state of the atom). Any single bond that connected a terminal donor or acceptor to the protein was selected as rotatable. This choice of rotatable bonds within the protein allows donors and acceptors in the active site to rotate their lone pairs and hydrogen atoms into the best position to form hydrogen bonds with the ligand. This of course may be easily extended to include side-chain flexibility (at greater computational cost); however, at present only the potential hydrogen-bonding groups have partial flexibility.

Hydrogen-bond donors and acceptors in the &and were identified. Lone pairs were added as before. All acyclic single non-terminal bonds were marked as rotatable.

Prior to docking a random translation was applied to the ligand and random rotations were applied to all bonds marked rotatable in the ligand and protein.

The chromosome representation

Each chromosome in the GA comprised four strings, two of which were binary and two of which were integer in character. Conformational information was encoded by two gray-coded (Goldberg, 1989) binary strings: one for the protein and one for the ligand, where each byte in the string encoded an angle of rotation about a rotatable bond. Two integer strings encoded mappings, suggesting possible hydrogen bonds between the ligand and the protein active site. The first integer string encoded a mapping from lone pairs in the ligand to hydrogen atoms in the protein, such that if V was the integer value at position P on the string, then the chromosome mapped the Pth lone pair in the ligand to the Vth hydrogen atom in the protein active site. In a similar manner, the second integer string encoded a mapping from hydrogen atoms in the ligand to lone pairs in the protein. By associating hydrogen atoms with lone pairs these mappings suggested hydrogen bonds between the ligand and protein. On decoding the chromosome the GA used a least-squares routine to attempt to form as many of these hydrogen bonds as possible.

The fitness function

The fitness function was evaluated in six stages as follows. (1) A conformation of the liga!ld and protein active site was generated. (2) The ligand was placed within the active site of the protein using a least-squares fitting procedure. (3) A hydrogen-bonding energy, H-Bad-En- crgy, was obtained for the complex. (4) A van der Waals energy, Coqkx-VD W-Encrg~, was obtained for the energy of interaction between the ligand and the protein. (5) A van der Waals energy, lllterrlnl_VDW_Energy, was

Page 7: Molecular Recognition of Receptor Sites using a.pdf

Molecular Recognition of Receptor Sites 49

Table 2

Hydrogen-bonding energies Energy (kcal/moU

Donor Acceptor Donor Acceptor Complex Bond

N4 N4 N4 N4 N4 N4 N4 N4 N4 N4 N4 N4 NPW NPL3 NPW NFL3 NPW NPW NPL3 NFL3 NW3 NPW NPW NPL3 NSDA N3DA N3DA N3DA N3DA N3DA N3DA N3DA N3DA N3DA N3DA N3DA NAM NAM NAM NAM NAM NAM NAM NAM NAM NAM NAM NAM 03DA 03DA 03DA 03DA 03DA 03DA 03DA 03DA 03DA 03DA 03DA 03DA NZDA NZDA NZDA NZDA NZDA NZDA N2DA N2DA NZDA

BR N2DA 02 oco2 CL Nl N3A 03A F N2A N3DA 03DA BR NZDA 02 oco2 CL Nl N3A 03A F N2A N3DA 03DA BR NZDA 02 oco2 CL Nl N3A 03A F N2A N3DA 03DA BR N2DA 02 oco2 CL Nl N3A 03A F N2A N3DA 03DA BR N2DA 02 oco2 CL Nl N3A 03A F N2A N3DA 03DA BR N2DA 02 oco2 CL Nl N3A 03A F

- 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784 - 24.784

- 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 4.427 - 1.147 - 1.147 - 1.147 - 1.147 - I.147 - 1.147 - 1.147 - 1.147 - 1.147 - 1.147 - 1.147 - 1.147 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919 - 5.919

- 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 33.236 - 10.005 - 10.005 - 10.005 - 10.005 - 10.005 - 10.005 - 10.005 - 10.005 - 10.005

- 0.818 - 10.361 - 21.092 - 22.926

- 0.762 - 3.408

- 13.432 - 2.923 - 1.772 - 2.801 - 2.041

- 34.495 - 0.818

- 10.361 - 21.092 - 22.926

- 0.762 - 3.408

- 13.432 - 2.923 - 1,772 - 2.801 - 2.041

- 34.695 - 0.818

- 10.361 - 21.092 - 22.926

- 0.762 - 3.408

- 13.432 - 2.923 - 1.772 - 2.801 - 2.041

- 34.695 - 0.818

- 10.361 - 21.092 - 22.926

- 0.762 - 3.408

- 13.432 - 2.923 - 1.772 - 2.801 - 2.041

- 34.695 - 0.818

- 10.361 - 21.092 - 22.926

- 0.762 - 3.408

- 13.432 - 2.923 - 1.772 - 2.801 - 2.041

- 34.695 - 0.818

- 10.361 - 21.092 - 22.926

- 0.762 - 3.408

- 13.432 - 2.923 - 1.772

- 27.669 - 29.688 - 55.479 - 60.814 - 17.853 - 25.708 -31.132 - 28.219 - 26.332 - 18.263 - 16.068 - 64.062

- 2.592 - 14.048 - 23.172 - 32.058

- 2.817 - 6.136

- 16.699 - 5.322 - 4.213 - 5.562 - 5.897

-38.116 - 1.171 - 9.745

- 18.318 - 19.130

- 1.030 - 2.473

- 12.643 - 2.249 - 1.605 - 1.664 - 1.873

- 32.770 - 3.891

- 13.337 - 25.508 - 36.478

- 4.030 -8.166

- 18.818 - 7.232 - 5.439 - 7.552 - 7.116

- 39.411 - 32.630 - 44.360 - 54.626 - 63.925 - 33.203 - 36.045 - 47.223 - 36.369 - 33.097 - 36.506 - 35.804 - 68.563

- 8.557 - 17.452 - 27.081 - 34.627

- 8.887 - 10.380 - 19.890 - 10.120 - 10.068

- 3.969 3.555

- 11.505 - 15.006

5.791 0.582 5.182

-2.414 - 1.678

7.420 8.855

- 6.485 0.451

- 1.162 0.445

- 6.607 0.470

- 0.203 - 0.742

0.126 0.084

- 0.236 - 1.331 - 0.896 - 1.108 - 0.139

2.019 3.041

- 1.023 0.180 0.034 0.081

- 0.588 0.382

- 0.587 1.170 0.944 1.041

- 0.399 - 9.535

0.749 - 0.741 - 1.369 - 0.292

0.350 - 0.734 - 1.058 - 0.699 - 0.478 - 2.665 - 2.200 - 9.665 - 1.107 - 1.303 - 2.457 -2.112

0.009 - 2.371 - 2.429 - 2.534

0.364 1.012 2.114

- 3.598 - 0.022

1.131 1.645 0.906

- 0.193

Table 2 (confi~zued)

Hydrogen-bonding energies

Energy (kcal/mol) Donor Acceptor Donor Acceptor Complex Bond

NZDA N2A - 10.005 - 2.801 - 11.298 - 0.394 N2DA N3DA - 10.005 - 2.041 - 10.047 0.097 N2DA 03DA - 10.005 - 34.695 - 41.876 0.922

Dielectric constant = 1 .O; water dimer energy = - 1.902.

obtained for the internal energy of the ligand confor- mation. (6) A weighted sum was performed on the energy terms to give a final fitness score.

Each gray-coded byte in the binary strings was decoded to give an integer value between 0 and 255. This integer value was linearly re-scaled to give a real number between 0 and 27r, which was then used as an angle of rotation, in radians, for the appropriate rotatable bond. The random- ised three-dimensional co-ordinates were used as a starting configuration and the rotations were successively applied around the rotatable bonds to generate a new set of co-ordinates for the ligand and protein active site. The resulting conformations were then passed on to the least-squares fitting procedure.

For every donor-hydrogen atom and lone pair a virtual point was created colinear with the bond at 1.4 A from the donor or acceptor. The integer strings in each chromosome define mappings that suggest hydrogen bonds between the protein and ligand. Associated with each hydrogen and lone pair is a virtual point. Let N be the number of hydrogen atoms and lone pairs in the ligand, so that decoding a chromosome gave rise to N pairs containing a ligand virtual point and a corresponding protein virtual point. A Procrustes Rotation (Digby & Kempton, 1987) with a correction to remove inversion, yielded a geometric transformation that, when applied to all the ligand virtual points in the N pairs, minimised the least-square distance between each ligand virtual point and the corresponding protein virtual point. As not all hydrogen-bonding sites in the ligand form bonds to the protein, a second least-squares fit was applied to minimise the djstance between those pairs of points that were less than 5 A apart.

In order to determine the hydrogen-bonding energy of the complex each possible combination of donor hydrogen atom, and lone pair was examined in turn to see whether or not a bond had actually been formed in the binding mode resulting from the transformation described above. The geometrical arrangement of donor, acceptor and both virtual points was examined, and a weight (between 0 and 1) assigned to the potential bond. The weight was used to scale the full bond energy between the donor and acceptor (this full bond energy Epir, is described later), these bond energies being determined by the modelling experiments. This weighting scheme was designed to favour hydrogen bonds that are linear and whose virtual points overlap. Poor bond angles or long interaction distances gave rise to weights of zero. The scheme allows configurations that are not real hydrogen bonds to contribute to the fitness of a chromosome in the hope that they will evolve to become full hydrogen bonds (with higher weights and conse- quently higher fitness values) during the full course of the run. The geometry of a potential bond is shown in Figure 8.

The weight given to the bond was a function of ci, the separation of the virtual points, and 0, the bond angle between the donor hydrogen atoms and the acceptor lone pairs. Let rot be the weight of the hydrogen bond, such that zut = distmce-zut + nngkzot. If d was less than 0.2 A then

Page 8: Molecular Recognition of Receptor Sites using a.pdf

50 Molecular Recognition of Receptor Sites

Hhdroaen Virtual Point a -

d ,-Donor ’ , t_

P Lone Pair Virtual Point

6 Acceptor

Figure 8. The geometrical arrangement of the virtual points representing a lone pair and a hydrogen when a hydrogen bond is formed. ti is the distance between the virtual points and 0 is the bond angle between the donor hydrogen and the acceptor lone pair.

&sfn77ce_zut was 1 aid if d was greater than 3.5 d; then disnznnce-7ut was 0. Otherwise, d lay in the interval (0.2,3.5), in which case d was linearly re-scaled to the interval (1,O) and squared to give distance-wt. Theoretical studies and observation of crystal structures (Murray-Rust & Glusker, 1984) show that strong hydrogen bonds are obtained if a line drawn from the donor through the hydrogen atom and a second line drawn from the acceptor through the lone pair intersect at around 180”, i.e. 0 in Figure 8 should be close to 180”. Accordingly if 0 was greater than 160” then nrzgle-zut was set to 1 and if 0 was less than 10” then nngle-zut was set to 0. Otherwise,@ lay in the interval (160, lo), in which case it was linearly re-scaled to (1,O) and squared to give angle-rot.

The energy H-Borrd-Energy was a sum of individual bond energies. Each lone pair in the l&and was compared with every donor hydrogen atom in the protein active site and the weights of any bonds formed recorded. For each &and lone pair, the bond with the highest weight contributed to H-Band-Energy as did any other bond with weight greater than 0.4. An identical procedure was adopted in comparing each donor hydrogen atom in the ligand with every lone pair in the protein. This mechanism was developed to allow the GA to form many-centre hydrogen-bonding motifs, white preventing a large number of poor bonds making a significant contribution to the binding energy

Following the placement of the ligand into the active site of the protein a Lennard-Jones 6-12 potential (Hirschfelder ef nl., 1964) was used to determine the energy of interaction between the ligand and protein. The 6-12 potential was of the form:

where i and j are a pair of atoms, aii is the distance between i and j divided by the sum of their van der Waals radii, and kii is the arithmetic approximation to the geometric mean of ki and ki, where the k values correspond to the minimum of the potential well between the atom types i and j. The values for the radii and k values for each atom type were taken from the SYBYL general purpose Tripos 5.2 force field (Clark et al., 1989). In order to calculate the interaction efficiently a lookup table and a cut-off distance, of aii = 1.5, were employed. A correction was made to the 6-12 potentia1 for atoms involved in hydrogen bonding. The energy of interaction between a hydrogen atom and acceptor was set to zero. For the interaction between a

donor and acceptor atom the sum of the van der Waals radii was scaled by 0.7. The sum of energies for all pairwise interactions with the addition of any penalty term was then Complex-VD W-Energy.

The internal steric energy of the ligand conformation was calculated, using the 6-12 potential described above. The energy for the ligand, Interflu/-VDW-Energy, was expressed as a difference in energy between the van der Waals energy of the current ligand conformation and the energy of the original minimised conformation (gas-phase) of the input hgand structure.

The final fitness score was determined by a weighted sum of all the energy components. The weights were determined by empirical adjustment to best reproduce known bound ligand structures. Optimisation of the weighting scheme is an area of current investigation. Since higher energies implied poorer solutions, the weights were all negative. The fitness score was given by:

- H-Band-Energy - 0.0005 x Carrzples-VD W-Energy.

If Ihmal-VD W-Energ!/ was positive then 0.5 x Irz~erlzal-VDW-Elze~gy was subtracted from the fitness score. In order to prevent the GA from trying to minimise the internal energy of the ligand below that of the reference conformation, this term was used only when Ilzte~rzal-VDW-Elzergy was positive.

The genetic operators

The GA made use of two genetic operators: mutation and crossover. The crossover operator required two parents and produced two children, while the mutation operator required one parent and produced one child. Operators were chosen using roulette-wheel selection based on operator weights (Davis, 1991). The operator weights for crossover and mutation were 10 and 40, respectively Thus, the GA search procedure emphasised expIoration rather than exploitation. The parents for these operators were selected using the technique of roulette- wheel selection on linear normalised (rank-based) fitness values (Davis, 1991). The current population was sorted on increasing fitness scores. The normahsed fitness of the individual with the worst fitness score in the current population was set to 25,000. Normalised fitness values then increased by 12 per individual. This parameterization equates to a low selection pressure of 1.1, which represents the relative probability that the best individual will be chosen as a parent compared with the average individual. This low selection pressure reduced the likelihood of the GA converging to sub-optimal solutions.

The crossover operator performed two-point crossover on the integer strings and one-point crossover on the binary strings. The one-point crossover on binary strings is the traditional GA recombination operator. The parent chromosomes were copied to the children. One of the four chromosome strings was selected randomly and the appropriate crossover applied to it. The remaining strings were unchanged.

The mutation operator performed binary-string mu- tation on binary strings and integer-string mutation on integer strings. Each bit in the binary string had a probability of mutation equal 1.0 over the length of the binary string. If the binary string remained unchanged after one application of the mutation operator, the operator was repeatedly applied until the string was mutated. The integer-string mutation performed a single mutation on an integer value. A position was randomly chosen on the integer string and the value at that position was mutated

Page 9: Molecular Recognition of Receptor Sites using a.pdf

Molecular Recognition of Receptor Sites 51

N4 NPW MDA

03DA Pk?DA ER

c”Y C-F C-C- N.l

Cd

C-02

F Nl 02

0.0X2

C-C

td

03A WA oco2

Figure 9. Donor and acceptor types and associated fragments used in hydrogen-bonding energy determi- nation.

to a (different) new value that was randomly chosen from the set of allowed integer values. The operator proceeded as follows: the parent chromosome was copied to the child chromosome and one of the four strings was randomly selected and the appropriate mutation applied.

The GA automatically terminated following the application of 80,000 genetic operators. However, conver- gence (indicated by an improvement of less than 0.01 in the best fitness score over the previous 6500 operations) resulted in premature termination.

Calculation of hydrogen-bond energies

The hydrogen-bond energy between a donor and an acceptor is an important component of the fitness function since each hydrogen-bonding pair contributes to the overall energy of binding. Initially the donor (d) and the acceptor (a) are in solution but on coming together (da) water (w) is stripped off. Therefore to simulate the interaction energy Epair is composed of four terms:

S& hydrogen-bond donor and 12 hydrogen-bond acceptor types were defined based on SYBYL molecular mechanics atom types (Clark et nl., 1989), These were embedded in the model fragments shown in Figure 9.

total of 91 C&=o . .

simulation systems (e.g. . HN + (CH,), or 02A . . N-4) were required to

model the possible pairwise interactions, and the interaction energy calculated using a number of different approaches. These included gas-phase semi-empirical quantum-mechanics (MOPAC, AMI, PM3: Stewart, 1992), solvent based (AMSOL, SM3: Cramer et al., 1993), VAMP (Timothy Clark, personal communication) with a self-con- sistent reaction field (Rinaldi, 1993), and molecular mechanics (Clark et al., 1989) with MOPAC (AM1 and PM3) Coulson and Mull&en atom charges (Stewart, 1992). Ab iuitio calculations are in progress to determine 6-31G* potential derived charges.

Currently, gas-phase molecular mechanics with PM3

Table 3

Hydrogen-bonding energies for solvated N-4 (lysine) donor

Energy (kcal/mol)

Acceptor BR

Donor 1.695

Acceptor - 0.563

Complex Bond 1.357 - 0.164

NZDA 02 oco2 CL Nl N3A 03A F N2A N3DA

1.695 1.328 1.695 0.568 1.695 - 0.260 1.695 - 0.435 1.695 - 0.602 1.695 0.031 1.695 0.068 1.695 - 0.692 1.695 2.533 1.695 - 0.061

3.156 - 0.256 2.571 - 0.081 1.623 - 0.201 1.648 - 0.001 1.709 0.227 1.170 - 0.945 1.535 - 0.617 1.454 0.062 4.167 - 0.450 1.463 - 0.560

03DA 1.695 - 0.850 0.444 - 0.790

Dielectric constant = 78.54; water dimer energy = - 0.389.

Mulliken charges and a dielectric constant of 1.0 in the receptor cavity give results that are consistent with the observed binding modes; this may be surprising, however, recent NMR studies imply that the protein-ligand hydrogen-bond interactions in the receptor cavity may have energetics more consistent with gas-phase values (Beeson et al., 1993) than has previously been appreciated. Indeed, the local dielectric.may be less significant with high angular dependence of hydrogen-bonding energies. The values are listed in Table 2.

The crystal structures studied contained a number of highly solvated and exposed lysine residues at the lip of the receptor cavity In order to account for their solvated nature and, more importantly the highly flexible nature of the lysine side-chain (resulting in an entropically favoured solvated state), bonds formed between the ligand and the charged nitrogen donor in lysine used molecular mechanics results with a dielectric constant of 78.54 (the dielectric constant of water at 25°C). These energies are

A iI

H- N.plc 6 N.plc

A A

(2)

H. ,C N.plc

’ Q H. ,C. J-l N.plc N.plc

k A

(3)

Figure 10. The N.plc atom type The protonation on the nitrogen atom in (1) is shared with the second nitrogen atom. (2) The N.plc atoms for the fragment in (1). (3) The N.plc atoms in the arginine residue.

Page 10: Molecular Recognition of Receptor Sites using a.pdf

52 Molecular Recognition of Receptor Sites

shown in Table 3. All lysine groups are currently treated as being solvated, which is rather arbitrary It is intended to refine this description based on their solvent-exposed surface areas to determine more accurately which lysine residues are mobile and therefore which should be specially treated in this way

A new atom type N.plc, and associated donor type NPLC, was created to simulate the distribution of charge across two or three nitrogen atoms, as occurs in protonated-guanidine groups and arginine residues, such

as illustrated in Figure 10. The energy of a hydrogen bond A.. . NPLC was

detemlined as follows. Let EN4 be the energy of the bond A . N4, let ENIJL1 be the energy of the bond A . NPW and let II be the number of nitrogen atoms across which the charge is distributed. The bond energy is then:

f (Ep,d + (II - l)Ep&.

Acknowledgements

We thank J. N Champness, C. R. Beddell and A. J Geddes for useful discussions, the Science and Engineering Research Council and the Wellcome Foundation Limited for financial support and Tripos Associates for the provision of hardware and software. This paper is a contribution from the Krebs Institute for Biomolecular Research, which is a designated centre for biomolecular sciences of the Science and Engineering Research Council.

References Beeson, C., Nguyen, I’., Shipps, G. & Dix, T. A. (1993). A

comprehensive description of the free energy of an intramolecular hydrogen-bond as a function of solvation: NMR study J. Amr. Chem. Sot. 115,

’ 6803-6812. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, F.,

Bryce, M. D., Rogers, J. R., Kennard, O., Shikanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Bid. 112, 535-542.

Blaney J. M. & Dixon, J. S. (1993). A good ligand is hard to find: automated docking methods. Persyect. Drug Disc. DCS@I, 1, 301-319.

Blommers, M. J., Lucasius, C. 8.. Kateman, G. & Kaptein, R. (1992). Conformational analysis of a dinucleotide photodimer with the aid of a genetic algorithm. Bio,vohyrwrs, 32, 45-52.

Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C. & Kraut, J. (1982). Crystal structures of Escherichin co/i and hctohcilhs cnsei dihydrofolate reductase refined at 1.7 h; resolution. 1. Biol. Charr. 257, 13650-13662.

Brown, R. D., Jones, G., Willett, l? & Glen, R. C. (1994). Matching two-dimensional chemical graphs using genetic algorithms. J. Clw~n. hfom. Cornput. Sci. 34, 63-70.

Champness, J. N., Stammers, D. K., Beddell, C. R. (1986). Crystallographic investigation of the cooperative interaction between trimethoprim, reduced cofactor and dihydrofolate reductase. FEBS Letters, 199,61-67.

Champness, J. N., Achari, A., Ballantine, S. l?, Bryant, I? K., Delves, C. J. & Stammers, D. K. (1994). The structure of ~JlCCWJocytiS Cnriuii dihydrofolate reductase to 1.9 A resolutions. StJwftm, 2, 915-924.

Cheung, H. T. A., Birdsall, B., Frenkiel, T. A., Chau, D. D. & Feeney, J. (1993). r3C NMR determination of the

tautomeric and ionization states of folate in its complexes with Lnctobncilllrs cnsei dihydrofolate reductase. Biocherr~istry, 32, 6846-6854.

Clark, D. E., Jones, G., Willett, P., Kenny, l? W. &Glen, R. C. (1994). Pliarniacoplioric pattern matching in files of three-dimensional chemical structures: comparison of conformational-searching algorithms for flexible searching. /. C/JUJI. IJI~OIVJI. COJJJ~U~. Sci. 34, 197-206.

Clark, M., Cramer, R. D. & Van Opdenbosch, N. (1989). Validation of the general-purpose TRIPOS 5.2 force field. 1. Cornprrt. Clretr~. 10, 982-1012.

Cramer, C. J., Lynch, G. C., Hawkins, G. D. & Truhlar, 0. G. (1993). AMSOL 3.5~ User Mnr~rml, Quantum Chemical Program Exchange, Department of Chemistry Univer- sity of Indiana, Bloomington, IN.

Dandekar, T. & Argos, I? (1993). Folding the main chain of small proteins with the genetic algorithm. /. Mol. Biol. 236, 844-861.

Davis, J. F., Delcamp,T. J., Prendergast, N. J., Ashford, V. A., Freisheim, J. H. & Kraut, J. (1990). Crystal structures of recombinant human dihydrofolate reductase complexed with folate and 5-deazafolate. Biochwistry, 29, 9467-9479.

Davis L. D. (1991). Editor of Hmdbook of Gerretic Al~orithrrrs, Van Nostrand Reinhold, New York.

DesJarlais, R. L., Sheridan, R., Dixon, J. S., Kuntz, I. D. & Venkataraghavan, R. (1986). Docking flexible ligands to macromolecular receptors by molecular shape. 1. Med. C~CJJI. 29, 2149-2153.

DesJarlais, R. L., Sheridan, R. P., Seibel, G. L., Dixon J. S., Kuntz, I. D. & Venkataraghavan, R. (1988). Using shape complementarity as an initial screen in design- ing ligands for a receptor binding site of known three- dimensional structure. 1. Med. C/ICJ~I. 31, 722-729.

Digby, l? G. N. & Kempton, R. A. (1987). M~rltizwinte Amdysis Of Ecologicnl COJJllJJlfJ7itiCS. pp. 112-115, Chapman and Hall, London.

Dixon, J. S. (1993). Flexible docking of ligands to receptor sites using genetic algorithms. In Trends in QSAR nrxt Molec~rlnr Mocfelliryq 92 (Wermuth, C. G., ed.), pp. 412-413, ESCOM, Leiden.

Fountain, E. (1992). Application of genetic algorithms in the field of constitutional similarity j. ClJe~ll. IJIJOJVJI. COJII~II~. Sci. 32, 748-752.

Goldberg, D. E. (1989). Genetic A!goriths ill Sen~h, Oytiraisntiort rind Mnchille Lenmirzg, Addison-Wesley, Reading, MA.

Goodsell, D. S. & Olson, A. J. (1990). Automated docking of substrates to proteins by simulated annealing. ProteilJs: Shwct. FI~JTC~. Genet. 8, 195-202.

Groom, C. R., Thillett, J., North, A. C. T., Pictet, R. & Geddes, A. J. (1991). Trimethoprim binds in a bacterial mode to the wild-type and E30D mutant of mouse dihydofolate reductase. 1. Biol. Cher~r. 266, 19890- 19893.

Hirschfelder, J. O., Curtiss, C. F. & Bird, R. B. (1964). Molec~dnr T/IcoI’?/ ojGnses nJ?d Liquids, John Wiley, New York.

HO, C. M. W. & Marshall, G. R. (1990). Cavity search: an algorithm for the isolation and display of cavity like binding regions. I. CottJput. Aid. Mol. Des. 4, 337-354.

Holland, J. H. (1975). Ahptntiorl irr N~t~lrnl md Artificid Stjstems, University of Michigan Press, Ann Arbor.

Judson, R. S., Jaeger, E. I?, Treasurywala, A. M. & Peterson, M. L. (1993). Conformational searching methods for small molecules: II. Genetic algorithm approach. 1. COJJl/.JUt. CkJJJ. 14, 1407-1414.

Kuntz, I. D. (1992). Structure-based strategies for drug design and discovery Science, 257, 1078-1082.

Page 11: Molecular Recognition of Receptor Sites using a.pdf

Molecular Recognition of Receptor Sites 53

Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. & Ferrin, T. E. (1982). A geometric approach to macromolecule-&and interactions. J. Mol. Biol. 161, 269-288.

Lawrence, M. C. & Davis, I? C. (1992). CLIX-a search algorithm for finding novel ligands capable of binding proteins of known 3-dimensional structure. Proteins: Strwzt. Rrnct. Gem+. 12, 3141.

Leach, A. R. (1994). Ligand docking to proteins with discrete side-chain flexibility 1. Mol. Biol. 235,345-356.

Matthews, D. A., Bolin, J. T., Burridge, J. M., Filman, D. J., Volz, K. W., Kaufman, B. T., Beddell,C. R., Champness, J. N., Stammers, D. K. & Kraut, J. (1985). Refined crystal structures of Escherichin co/i and chicken liver dihydrofolate reductase containing bound trimetho- prim. 1. Biol. Chm. 260, 381-391.

Murray-Rust, I? & Glusker, J. (1984). Directional hydrogen bonding to sp2 and sp3-hybridised oxygen atoms and its relevance to ligand-macromolecular interactions. 1. Amer. Chern. Sot. 1018-1025.

Payne, A. W. R. &Glen, R. C. (1993). Molecular recognition using a binary genetic search algorithm. 1. Mol. Graph. 11, 74-91.

Rinaldi, D., Rivail, J. & Rguini, N. (1993). Fast geometry optimisation in self-consistent reaction field compu- tations on solvated molecules. 1. Conyut. Chetn. 13, 675-680.

Shoichet, B. K., Stroud, R. M., Santi, D. V, Kuntz, I. D. & Perry K. M. (1993). Structure-based discovery of

inhibitors of thymidylate synthase. Science, 259, 1445-1450.

Smellie, A. S., Crippen, G. M. & Richards, W. G. (1991). Fast drug-receptor mapping by site-directed distances: a novel method of predicting new pharmacological leads. J. Chem. Inform. Comprtt. Sci. 31,386-392.

Stammers, D. K., Champness, J. N., Beddell, C. R., Dann,. J. G., Eliopoulos, E., Geddes, A. J., Ogg, D. & North, A. C. T. (1987). The structure of mouse L1210 dihydrofolate reductase-drug complexes and the construction of a model of human enzyme. FEBS Letters, 218, 178-184.

Stewart, J. J. I? (1992). MOPAC User Mnnnnl Version 6.0, Quantum Chemical Program Exchange, Depart- ment of Chemistry, University of Indiana, Blooming- ton, IN.

Vermersch, I? S., Lemon, D. D., Tesmer, J. J. G. & Quiocho, F. A. (1991). Sugar-binding and crystallographic studies of an arabinose-binding protein mutant (met”“‘Leu) that exhibits enhanced affinity and altered specificity Biochnnistiy 30, 6861-6866.

von Itzstein, M., Wu, Wen-Yang, Kok G. B., Pegg, M. S., Dyson, J. C., Jin, B., Phan, T. V, Smythe, M. L., White, H. F., Oliver, S. W., Colman, I? M., Varghese, J. N., Ryan, D. M., Woods, J. M., Bethell, R. C., Hotham, V. J., Cameron, J. M. & Penn, C. R. (1993). Rational design of potent sialidase-based inhibitors of influenza virus replication. N&we (London), 363, 418-423.

Edited by F. Cohen

(Received 10 May 1994; accepted 30 August 1994)