Bioinformatics in drug designing and development process
Brijesh S. Yadav and Bhaskar Sharma
Division of Biochemistry
IVRI, Izatnagar
Access to huge amounts of diverse data, viz. sequence data, chemical structure information,
biological activity patterns and global expression profiles of mRNA, protein and other molecules, is
the defining characteristic of the post-genome drug discovery and development process. In the last
decade, the drug design and development process changed drastically as knowledge of the
three-dimensional structures of target proteins was incorporated into design. Where
three-dimensional structures of relevant drug targets were not available from X-ray crystallography or
NMR, comparative models based on homologues, which define the topographies of the complementary
surfaces of ligands and their protein targets, began to be exploited in lead optimization.
Bioinformatics in drug designing:
The three-dimensional structure of a protein is a key factor at every stage of the drug design
process. Classically it has been exploited in lead optimization, a process that uses structure to guide
the chemical modification of a lead molecule to give an optimized fit in terms of shape, hydrogen
bonds and other non-covalent interactions with the target. Protein structure can also be used in target
identification and selection (the assessment of the ‘druggability’ or tractability of a target).
Traditionally, this has involved homology recognition assisted by knowledge of protein structure,
but structural genomics programmes now seek to define representative structures of all
protein families, allowing binding regions and molecular functions to be proposed. More recently,
X-ray crystallography has been used to assist the identification of hits by virtual screening and, more
directly, in the screening of chemical fragments. The key roles of structural biology and
bioinformatics in lead optimization remain as important as ever. Here, we focus on their roles in
target identification and lead discovery.
Fig. 1. Drug discovery classically follows the path from target selection, through lead discovery to
lead development. Although structural biology has historically had a role in the final stages during
lead optimization, it is now having an effect at all stages. Homology recognition and structural
genomics can aid target selection, while structure-based screening assists the lead discovery and
development processes.
Target identification from sequence-structure homology recognition
Protein structures are a rich source of information about membership of families and superfamilies.
If the three-dimensional structure of a target protein is not known, homologues whose
three-dimensional structures are known are identified, on the assumption that they are
most likely to exhibit similar structure and function. This is called ‘homology recognition’. An
example of this process was the recognition of HIV proteinase as a distant member of the
pepsin/renin superfamily, followed by the modeling of its three-dimensional structure and the
design of inhibitors. In general, putative relatives are identified, the sequences are aligned, and the
three-dimensional structures are modeled. This is usually helpful in proposing binding sites and
molecular functions if key residues are conserved.
The target protein sequence is retrieved from NCBI or another protein sequence database. The simplest
method of template identification relies on serial pairwise sequence alignments aided by database
search techniques such as BLAST. When performing a BLAST search, a reliable first approach is to
identify hits with a sufficiently low E-value, which are considered close enough in evolution to
yield a reliable homology model. The template may have a function similar to that of the query
sequence, or it may belong to a homologous operon. However, a template with a poor (high) E-value
should generally not be chosen.
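The E-value filter described above can be sketched in a few lines of Python. This is only an illustration, assuming the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6` in BLAST+). The function name `select_templates`, the cutoff of 1e-5 and the example identifiers are assumptions made for the sketch and are not taken from the article.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with the default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def select_templates(blast_tabular, max_evalue=1e-5):
    """Return hits with E-value <= max_evalue, lowest E-value first.

    max_evalue is an illustrative cutoff (an assumption), not a value
    recommended by the article.
    """
    hits = []
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Hypothetical example rows; the identifiers and numbers are made up.
example = (
    "query1\ttemplateA\t62.5\t200\t75\t0\t1\t200\t5\t204\t3e-80\t250\n"
    "query1\ttemplateB\t28.0\t150\t108\t2\t10\t159\t20\t169\t0.5\t30.1\n"
    "query1\ttemplateC\t41.0\t190\t112\t1\t3\t192\t8\t197\t2e-30\t120\n"
)

for hit in select_templates(example):
    print(hit["sseqid"], hit["evalue"], hit["pident"])
```

In this toy input the hit with E-value 0.5 is discarded, which mirrors the advice that a template with a poor E-value should generally not be chosen. A suitable cutoff depends on the database size and the protein family.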
Target validation and the identification of ligand binding regions
Target validation is a crucial step in the drug-discovery process. Most drugs are inhibitors that block the
action of a particular target protein. The only way to be completely certain that a protein is instrumental
in a given disease is to test the idea in humans, which is not possible at the outset. Such clinical trials cannot be
used for initial drug development, which means that a potential target must undergo a validation process:
its role in disease must be clearly defined before drugs are sought that act against it, or before it is used to
screen large numbers of compounds for drug activity. The role of bioinformatics in target validation is minimal.
Targets are mostly validated by wet-lab experiments such as siRNA knockdown, comparative proteomic analysis,
global expression profiling and gene knockout before actual drug design efforts begin. Information about
known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information
and the corresponding drugs is available in the Therapeutic Target Database, which can help in validating
targets.
The next step after the target has been validated is to predict its three-dimensional structure. This is usually done
through homology modeling using a template whose structure is known. The modeling program will generate more
than one three-dimensional structure, so the next step is to validate the models. The quality of a model is
evaluated with respect to energy and stereochemical geometry. The PROCHECK server can be used to check
stereochemistry, the ProSA-web server to evaluate energy, and Verify3D to evaluate the local compatibility
of the model with its sequence. PROCHECK, Verify3D, ERRAT and other programs based on the
Ramachandran plot are used to validate protein models.
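A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. The sketch below shows how one such dihedral angle can be computed from four atom positions; it is not the PROCHECK implementation, and the function name `dihedral_degrees` and the toy coordinates are assumptions made for illustration. For phi the four atoms are C(i-1), N(i), CA(i), C(i); for psi they are N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees, in [-180, 180]) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from a real structure).
print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Computing phi and psi for every residue of a model and plotting them against each other gives the Ramachandran plot; residues that fall in sparsely populated regions of the plot are candidates for closer inspection.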
Once the three-dimensional model has been built, the next step is to identify the surface sites
involved in productive intermolecular interactions, which may give clues about the functions and binding
sites of these proteins. Sequence motif databases such as PROSITE identify residues likely to
be involved in function, but three-dimensional descriptors of functional sites have an advantage because the
sites themselves are usually made from discontinuous regions of the protein sequence.
Functional and interaction sites can be predicted computationally, for example by identifying steric strain or
other types of high-energy conformations that often occur at active sites, or by identifying clefts
that can accommodate ligands. Almost all protein functional sites arise through mutation and
selection, and hence they are expected to be the most highly conserved regions of a protein. The most
widely used method based on evolutionary conservation of sequence is the ‘evolutionary trace’, in which
conserved residues are identified.
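The idea of locating conserved residues can be illustrated with a simple per-column score over a multiple sequence alignment. The sketch below uses Shannon entropy, where a score of 0 means the column is fully conserved. It is a deliberately simplified stand-in and not the evolutionary trace method itself, which additionally uses a phylogenetic tree to group the sequences. The function names and the example alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropies(aligned_seqs):
    """Shannon entropy (bits) of each column of an alignment.

    All sequences must already be aligned to the same length.
    Gap characters are counted as an ordinary symbol.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def conserved_positions(aligned_seqs, max_entropy=0.0):
    """Indices (0-based) of non-gap columns with entropy <= max_entropy."""
    return [i for i, h in enumerate(column_entropies(aligned_seqs))
            if h <= max_entropy and aligned_seqs[0][i] != "-"]


# Hypothetical three-sequence alignment, made up for illustration.
alignment = ["MKHDSG", "MRHDTG", "MKHESG"]
print(conserved_positions(alignment))
```

Mapping the conserved positions onto the three-dimensional model then shows whether they cluster on the surface, which is the kind of evidence used to propose a functional site.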
Lead optimization through docking
The classification of targets into families has allowed the design of focused compound libraries for
particular families. Several approaches now concentrate on screening very small molecules, or
‘fragments’, from which a lead can be designed using knowledge, derived from biophysical assays,
of how the fragment binds in the active site of the target.
In parallel, in silico approaches for identifying potential drug candidates have been developed.
Ligand docking aims to find the optimum binding position and orientation for a compound in the
active site of the protein. The best docking programmes correctly dock about 70–80% of ligands
when tested on large sets of protein–ligand complexes. However, difficulties arise in trying to
predict the affinities of the different compounds for the protein active site. Nevertheless, virtual
screening has proved helpful in docking and ranking large numbers of compounds so that the
highest-ranking compounds can be selected for acquisition or synthesis and tested experimentally
for activity against the target protein.
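The ranking step of a virtual screen can be sketched as follows. The example assumes that each compound already has a single docking score and that a lower (more negative) score means better predicted binding; the sign convention differs between docking programs, so this is an assumption to check for the tool in use. The compound identifiers and scores are made up.

```python
def top_ranked(scores, n):
    """Return the n best (compound, score) pairs.

    scores maps a compound identifier to its docking score.
    Assumption: lower scores indicate better predicted binding.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:n]


# Hypothetical docking scores, made up for illustration.
docking_scores = {
    "cmpd_001": -7.2,
    "cmpd_002": -9.1,
    "cmpd_003": -5.4,
    "cmpd_004": -8.3,
}

for name, score in top_ranked(docking_scores, 2):
    print(name, score)
```

Because docking scores predict affinity only roughly, as noted above, the ranked list is best treated as a shortlist for experimental testing and not as a prediction of activity.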
Fig. 2. Diagrammatic representation of the core bioinformatics methods of drug development
Conclusions:
Bioinformatics offers a means to get to a structure through sequence, while structure-aided drug
design offers a means to get to a drug through structure. Computer-aided drug design may
be used at any of the following stages of drug discovery: 1. hit identification using virtual screening
(structure- or ligand-based design); 2. hit-to-lead optimization of affinity and selectivity
(structure-based design, QSAR, etc.); 3. lead optimization of other pharmaceutical properties while
maintaining affinity. For structure-based drug design, several post-screening analyses focusing on
protein-ligand interactions have been developed to improve enrichment and to mine
potential candidates effectively. Bioinformatics tools accumulate information in the form of databases,
organized into groups and subgroups. These databases save researchers time, money and effort,
and provide knowledge about the molecules they describe.