Bioinformatics in drug designing and development process

Brijesh S. Yadav and Bhaskar Sharma

Division of Biochemistry

IVRI, Izatnagar

Access to huge amounts of diverse data, namely sequence data, chemical structure information,

biological activity patterns and global expression profiles of mRNA, proteins and other molecules, is

the defining characteristic of the post-genome drug discovery and development process. In the last

decade, the drug design and development process changed drastically by incorporating

knowledge of the three-dimensional structures of target proteins into the design process. Where three-

dimensional structures of relevant drug targets were not available from X-ray crystallography or

NMR, comparative models based on homologues, which define the topographies of the complementary

surfaces of ligands and their protein targets, began to be exploited in lead optimization.

Bioinformatics in drug designing:

The three-dimensional structure of a protein is a key factor in drug discovery at every stage in the design

process. Classically it has been exploited in lead optimization, a process that uses structure to guide

the chemical modification of a lead molecule to give an optimized fit in terms of shape, hydrogen

bonds and other non-covalent interactions with the target. Protein structure can also be used in target

identification and selection (the assessment of the ‘druggability’ or tractability of a target).

Traditionally, this has involved homology recognition assisted by knowledge of protein structure;

but now structural genomics programmes are seeking to define representative structures of all

protein families, allowing proposals of binding regions and molecular functions. More recently, X-

ray crystallography has been used to assist the identification of hits by virtual screening and more

directly in the screening of chemical fragments. The key roles of structural biology and

bioinformatics in lead optimization remain as important as ever. Here, we focus on their roles in

target identification and lead discovery.

Fig. 1. Drug discovery classically follows the path from target selection, through lead discovery to

lead development. Although structural biology has historically had a role in the final stages during

lead optimization, it is now having an effect at all stages. Homology recognition and structural

genomics can aid target selection, while structure-based screening assists the lead discovery and

development processes.

Target identification from sequence-structure homology recognition

Protein structures are a rich source of information about membership of families and superfamilies.

If the three-dimensional structure of a target protein is not known, homologous proteins of known

three-dimensional structure are identified, on the assumption that homologues are

most likely to exhibit similar structure and function. This is called ‘homology recognition’. An

example of this process was the recognition of HIV proteinase as a distant member of the

pepsin/renin superfamily and the subsequent modeling of its three-dimensional structure and the

design of inhibitors. In general, putative relatives are identified, the sequences aligned, and the three-

dimensional structures modeled. This is usually helpful in proposing binding sites and molecular

functions if key residues are conserved.

The target protein sequence is retrieved from NCBI or another protein sequence database. The simplest

method of template identification relies on serial pairwise sequence alignments aided by database

search techniques such as BLAST. When performing a BLAST search, a reliable first approach is to

identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to

make a reliable homology model. The template may have a function similar to that of the query

sequence, or it may belong to a homologous operon. However, a template with a poor E-value

should generally not be chosen.
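
The E-value filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed protocol: it assumes the search was saved in BLAST's tabular format (`-outfmt 6`, whose default columns put the subject id second, percent identity third, E-value eleventh and bit score twelfth). The cutoff of 1e-5, the names `BlastHit`, `parse_tabular_hits` and `select_templates`, and the three example hits are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    subject_id: str
    percent_identity: float
    evalue: float
    bit_score: float


def parse_tabular_hits(text: str) -> List[BlastHit]:
    """Parse BLAST tabular output (the 12 default columns of -outfmt 6)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Column 2 = subject id, 3 = % identity, 11 = E-value, 12 = bit score.
        hits.append(BlastHit(fields[1], float(fields[2]),
                             float(fields[10]), float(fields[11])))
    return hits


def select_templates(hits: List[BlastHit],
                     max_evalue: float = 1e-5) -> List[BlastHit]:
    """Keep hits at or below the E-value cutoff, lowest E-value first.

    The default cutoff is an assumption for this sketch, not a rule.
    """
    kept = [hit for hit in hits if hit.evalue <= max_evalue]
    return sorted(kept, key=lambda hit: hit.evalue)


# Hypothetical hits for illustration only; these are not real search results.
EXAMPLE_OUTPUT = "\n".join([
    "query1\ttemplB\t41.0\t180\t100\t6\t10\t189\t3\t182\t3e-12\t70.1",
    "query1\ttemplC\t25.0\t60\t40\t5\t20\t79\t1\t60\t0.5\t28.0",
    "query1\ttemplA\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-80\t250",
])

if __name__ == "__main__":
    for hit in select_templates(parse_tabular_hits(EXAMPLE_OUTPUT)):
        print(hit.subject_id, hit.evalue)
```

In this toy input the hit with an E-value of 0.5 is discarded as a poor template candidate, and the remaining hits are ordered from most to least significant. In practice, sequence coverage and the quality of the template structure would also be weighed.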

Target validation and the identification of ligand binding regions

Target validation is a crucial step in the drug-discovery process. Most drugs are inhibitors that block the

action of a particular target protein. The only way to be completely certain that a protein is instrumental

in a given disease, however, is to test the idea in humans. Obviously, such clinical trials cannot be

used for initial drug development, which means that a potential target must undergo a validation process:

its role in disease must be clearly defined before drugs are sought that act against it, or before it is used to

screen large numbers of compounds for drug activity. The role of bioinformatics in target validation is minimal.

Targets are mostly validated by wet-lab experiments such as siRNA knockdown, comparative proteomic analysis,

global expression profiling and gene knockout before actual drug design efforts begin. Information

about known and explored therapeutic protein and nucleic acid targets, the targeted diseases,

pathway information and the corresponding drugs is available in the Therapeutic Target Database (TTD), which can

help in validating targets.

The next step after the target has been validated is predicting its three-dimensional structure. Usually this is done

through homology modeling using a template whose structure is known. The modeling software typically generates more

than one candidate three-dimensional structure. The next step is to validate the structure. The PROCHECK server can be

used to validate the modeled protein structure. The quality of models is evaluated with respect to

energy and stereochemical geometry: the ProSA-web server is used to evaluate energy, and Verify3D to

evaluate the local compatibility of the model with its sequence. PROCHECK, Verify3D, ERRAT and other programs based on the

Ramachandran plot are used to validate protein models.
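
A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue: phi is the torsion defined by the atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows only how one such torsion angle could be computed from four atomic coordinates; it is not how PROCHECK or any other named server is implemented, and the function name `dihedral_degrees` and the test coordinates are hypothetical.

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def _sub(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vector, b: Vector) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vector, b: Vector) -> Vector:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vector, p1: Vector, p2: Vector, p3: Vector) -> float:
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(c / length for c in b1)  # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the first and last atoms lie on the same side (cis).
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applied to every residue of a model, the resulting (phi, psi) pairs are the points that a Ramachandran-based validator compares against the sterically favoured regions.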

Once the three-dimensional model has been built, the next step is to identify the sites on its surface

involved in productive intermolecular interactions, which might give clues about the functions and binding

sites of these proteins. Although sequence motif databases such as PROSITE identify residues likely to

be involved in function, three-dimensional descriptors of functional sites have an advantage because the

sites themselves are usually made from discontinuous regions of the protein sequence.

Functional and interaction sites are predicted computationally, for example by identifying steric strain or

other types of high-energy conformations that often occur at active sites, or by identifying clefts

that can accommodate ligands. Almost all protein functional sites arise through mutation and

selection, and hence they tend to be the most highly conserved regions of a protein. The most widely used

method based on evolutionary conservation of sequence is the ‘evolutionary trace’, in which residues

that are conserved are identified.
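
The idea of locating functional residues through conservation can be illustrated with a much simpler stand-in than the evolutionary trace itself: scoring each column of a multiple sequence alignment by its Shannon entropy. The sketch below is not the evolutionary trace algorithm, which additionally uses a phylogenetic tree. The toy alignment, the threshold of 0.9 and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

MAX_ENTROPY = math.log2(20)  # entropy of 20 equally likely amino acids


def column_conservation(alignment: List[str]) -> List[float]:
    """Score each alignment column from 0 (variable) to 1 (invariant)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        # Gap characters are ignored when counting residues.
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no information
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        scores.append(1.0 - entropy / MAX_ENTROPY)
    return scores


def conserved_columns(alignment: List[str], threshold: float = 0.9) -> List[int]:
    """Return 0-based indices of columns scoring at or above the threshold."""
    return [i for i, score in enumerate(column_conservation(alignment))
            if score >= threshold]


# Toy alignment, invented for illustration only.
ALIGNMENT = ["MKDSG", "MRDTG", "MKDAG", "MQDSG"]

if __name__ == "__main__":
    print(conserved_columns(ALIGNMENT))
```

Columns that stay invariant across the family are the natural first candidates for functional or binding-site residues, which can then be mapped onto the three-dimensional model.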

Lead optimization through docking

The classification of targets into families has allowed the design of focused compound libraries for

particular families. Several approaches are now concentrating on screening very small molecules, or

‘fragments’ from which a lead can be designed using knowledge, derived from biophysical assays,

of how the fragment binds in the active site of the target.

In parallel, in silico approaches for identifying potential drug candidates have been developed.

Ligand docking aims to find the optimum binding position and orientation for a compound in the

active site of the protein. The best docking programmes correctly dock about 70–80% of ligands

when tested on large sets of protein–ligand complexes. However, difficulties arise in trying to

predict the affinities of the different compounds for the protein active site. Nevertheless, virtual

screening has proved helpful in docking and ranking a large number of compounds so that the

highest-ranking compounds can be selected for acquisition or synthesis and experimentally tested

for activity against the target protein.
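
Two small calculations sit behind the statements above: judging whether a ligand was docked correctly, and ranking compounds for follow-up. The sketch below assumes that a docked pose is compared with the crystallographic pose by root-mean-square deviation (RMSD) over matched atoms, and that a lower (more negative) docking score means a better predicted fit, as is the convention in many docking programmes. The compound names, scores and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two poses whose atoms are listed in the same order."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared = sum((a - b) ** 2
                  for p, q in zip(docked, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(docked))


def rank_compounds(scores: Dict[str, float], top_n: int) -> List[str]:
    """Return the top_n compound names, best (lowest) score first."""
    return sorted(scores, key=scores.get)[:top_n]


# Hypothetical docking scores, invented for illustration only.
EXAMPLE_SCORES = {"cmpd_1": -7.2, "cmpd_2": -9.8, "cmpd_3": -5.1, "cmpd_4": -8.4}

if __name__ == "__main__":
    print(rank_compounds(EXAMPLE_SCORES, top_n=2))
    print(pose_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

A pose within a small RMSD of the experimental one (2 Å is a commonly used cutoff, stated here as an assumption) is usually counted as correctly docked. The ranked list is only a shortlist for experimental testing, since scores are poor predictors of absolute affinity.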

Fig. 2. Diagrammatic representation of the core bioinformatics methods of drug development

Conclusions:

Bioinformatics offers a means to get to a structure through sequence, while structure-aided drug

design offers a means to get to a drug through structure. Drug design with the help of computers may

be used at any of the following stages of drug discovery: 1. hit identification using virtual screening

(structure- or ligand-based design), 2. hit-to-lead optimization of affinity and selectivity (structure-

based design, QSAR, etc.), 3. lead optimization of other pharmaceutical properties while

maintaining affinity. For structure-based drug design, several post-screening analyses focusing on

protein-ligand interactions have been developed for improving enrichment and effectively mining

potential candidates. Bioinformatics tools also organize the accumulated information into databases,

arranged in groups and subgroups. These databases save researchers time, money and effort

and provide ready knowledge of the molecules they describe.
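
The "enrichment" mentioned above is commonly quantified as an enrichment factor: the fraction of known actives found in the top-ranked portion of a screened library, divided by the fraction of actives in the whole library. The sketch below is a minimal illustration of that calculation; the function name, the rounding rule for the size of the top portion and the example ranking are assumptions.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool],
                      top_fraction: float) -> float:
    """Enrichment factor for a library listed in rank order (best first).

    ranked_is_active[i] is True when the compound at rank i is a known active.
    """
    total = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if total == 0 or total_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in the interval (0, 1]")
    # Size of the top-ranked subset; rounded, with at least one compound.
    n_top = max(1, round(top_fraction * total))
    actives_in_top = sum(ranked_is_active[:n_top])
    return (actives_in_top / n_top) / (total_actives / total)


# Hypothetical ranked library of ten compounds, three of them active.
EXAMPLE_RANKING = [True, True, False, False, True,
                   False, False, False, False, False]

if __name__ == "__main__":
    print(enrichment_factor(EXAMPLE_RANKING, top_fraction=0.2))
```

A value above 1 means that the ranking concentrates actives near the top better than random selection would. A post-screening analysis improves enrichment when it raises this value for the same library.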
