13.11.2013. Bioinformatics - Proteomics
Bioinformatics − Proteomics Lecture 9
Prof. László Poppe
BME Department of Organic Chemistry and Technology
Bioinformatics – Proteomics
Lecture and practice
Bioinformatics 2, 2009. 04. 17.
Map of bioinformatics: a medicinal chemistry point of view
Drug-receptor interaction modeling
Flow chart of the theoretical modeling of drug-receptor interactions
and its relationship to the experiments.
Theoretical studies are shown in rectangles; ellipses represent
experimental data. The gray area covers the modeling of small
molecules; the white areas are related to bioinformatics.
Advanced quantum mechanical methods (e.g. QM/MM) can also be applied.
Pharmacophore model construction: D1 dopamine receptor
Figure labels: allowed pharmacophore regions; forbidden pharmacophore regions.
Pharmacophore model: a spatial alignment of ligands that yields a 3D image
complementary to the active site.
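The allowed and forbidden regions can be read as a simple geometric test on a candidate ligand. The following is only a minimal sketch, assuming that each region is modeled as a sphere; the coordinates, radii and function names are hypothetical and do not come from the D1 dopamine receptor model.

```python
import math

# Hypothetical pharmacophore: spheres given as (center_xyz, radius), in angstroms.
ALLOWED = [((0.0, 0.0, 0.0), 1.5), ((5.0, 0.0, 0.0), 1.5)]
FORBIDDEN = [((2.5, 3.0, 0.0), 1.0)]


def in_sphere(point, sphere):
    """True if the point lies inside or on the sphere."""
    center, radius = sphere
    return math.dist(point, center) <= radius


def fits_pharmacophore(atoms, allowed=ALLOWED, forbidden=FORBIDDEN):
    """True if every allowed region holds at least one atom and
    no atom enters a forbidden region."""
    covers_allowed = all(any(in_sphere(a, s) for a in atoms) for s in allowed)
    avoids_forbidden = not any(in_sphere(a, s) for a in atoms for s in forbidden)
    return covers_allowed and avoids_forbidden


# Two illustrative ligands: the second one places an atom in the forbidden region.
good_ligand = [(0.2, 0.1, 0.0), (4.8, -0.3, 0.0)]
bad_ligand = good_ligand + [(2.5, 2.6, 0.0)]
```

A real pharmacophore model would also carry feature types (hydrogen-bond donor, aromatic ring, etc.); the sketch keeps only the geometric part.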
Docking molecules to proteins
Predicting / testing the binding of a small molecule
(ligand, substrate, coenzyme, etc.) inside or on the
surface of a protein (receptor).
Predicting / testing the binding of two proteins to each
other.
Predicting / analyzing the binding of a protein to DNA.
Ways to assess the fit:
1. Simple geometric fit
2. Evaluation of the fit with a complex energy function, electrostatic complementarity, etc.
Classification by model:
1. Both molecules are rigid
2. One molecule (usually the ligand) is flexible and the other (usually the protein) is rigid
3. Both are flexible (the search is very time-consuming)
Classification by algorithm:
1. Molecular dynamics
2. Monte Carlo methods (generating random positions)
3. Simulated annealing: simulation of the slow cooling of a high-temperature system, which helps the search reach the energy minimum
4. Other methods
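The Monte Carlo and simulated annealing searches listed for docking can be illustrated with a toy example. This is a sketch under strong assumptions: the "ligand" is a rigid point in 2D, and `toy_score` is a hypothetical stand-in for a real docking energy function; the cooling schedule and step size are illustrative values only.

```python
import math
import random


def toy_score(x, y):
    """Hypothetical stand-in for a docking energy function: lower is better.
    Its minimum (score 0) is at the 'binding site' (2.0, -1.0)."""
    return (x - 2.0) ** 2 + (y + 1.0) ** 2


def simulated_annealing(score, start, t_start=5.0, t_end=1e-3, cooling=0.99,
                        step=0.5, seed=0):
    """Search for a low-score position while slowly lowering the temperature."""
    rng = random.Random(seed)
    pos, energy = start, score(*start)
    best_pos, best_energy = pos, energy
    t = t_start
    while t > t_end:
        # Monte Carlo move: random displacement of the rigid ligand
        trial = (pos[0] + rng.uniform(-step, step),
                 pos[1] + rng.uniform(-step, step))
        trial_energy = score(*trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            pos, energy = trial, trial_energy
            if energy < best_energy:
                best_pos, best_energy = pos, energy
        t *= cooling  # slow cooling
    return best_pos, best_energy


best_pos, best_energy = simulated_annealing(toy_score, start=(-4.0, 4.0))
```

At high temperature the search accepts many uphill moves and can escape local minima; as the temperature drops it settles into a minimum. A flexible ligand would add torsion angles to the search space.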
QM/MM methods
Figure label: MM region
The protein is treated in several regions.
The important part of the active site and the substrate
(or product, reactive intermediate, transition
state) form the QM region, which is treated with a
more accurate calculation.
The QM region is suitable for analyzing
electronic / quantum interactions (semi-empirical
and HF / DFT methods).
The rest of the protein is treated with a classical
MM force field.
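The partitioning into regions can be sketched as a simple selection step. The example below assumes, purely for illustration, that atoms within a distance cutoff of the substrate go into the QM region; in practice the QM region is chosen by chemical insight (catalytic residues, cofactors), and the cutoff, atom names and coordinates here are hypothetical.

```python
import math


def partition_qm_mm(atoms, substrate_center, cutoff=5.0):
    """Split atoms into QM and MM lists by distance from the substrate.

    atoms: list of (name, (x, y, z)); cutoff in angstroms (assumed value).
    """
    qm, mm = [], []
    for name, xyz in atoms:
        if math.dist(xyz, substrate_center) <= cutoff:
            qm.append(name)   # treated with semi-empirical or HF / DFT methods
        else:
            mm.append(name)   # treated with the classical force field
    return qm, mm


# Hypothetical atoms around a substrate placed at the origin
atoms = [
    ("GLU:CD", (2.0, 1.0, 0.0)),
    ("HIS:NE2", (0.0, 3.5, 1.0)),
    ("LEU:CA", (12.0, 4.0, -3.0)),
]
qm_atoms, mm_atoms = partition_qm_mm(atoms, substrate_center=(0.0, 0.0, 0.0))
```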
QM/MM methods: the boundary of the classical and quantum regions
Splitting of a glutamate side chain into quantum
and classical regions.
The terminal CH2CO2 group is treated with
quantum mechanics, and a molecular mechanics
force field is applied to the main-chain atoms.
Choosing where to cut is the most difficult
question (the cut is usually made along a C(sp3)-C(sp3) bond).
There are two main approaches to managing the boundary.
One is the "link atom approach" [MJ Field, PA Bash, M Karplus: J Comput Chem,
1990, 11, 700], in which the QM region is "closed" by an appropriate virtual link atom.
The other is the "frozen orbital approach" [G Monard, M Loos, V Thery, K Baka, J-L
Rivail: Int J Quant Chem, 1996, 58, 153], in which the continuity of the electron density at
the boundary is ensured by "frozen" orbitals between the quantum and classical atoms
(local self-consistent field, LSCF).
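The geometric part of the link atom approach can be sketched as follows. This is an illustrative assumption rather than the procedure of the cited paper: the link atom is taken to be a hydrogen placed on the line of the cut bond at a fixed distance from the QM boundary atom; the bond length and the coordinates are hypothetical values.

```python
import math


def place_link_atom(qm_atom, mm_atom, bond_length=1.09):
    """Return coordinates of a link atom on the line from the QM boundary
    atom towards the MM boundary atom.

    bond_length: assumed C-H distance in angstroms.
    """
    vec = [m - q for q, m in zip(qm_atom, mm_atom)]
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0:
        raise ValueError("boundary atoms coincide")
    return tuple(q + bond_length * c / norm for q, c in zip(qm_atom, vec))


# Hypothetical coordinates of a cut C(sp3)-C(sp3) bond, 1.53 angstroms long
c_qm = (0.0, 0.0, 0.0)
c_mm = (1.53, 0.0, 0.0)
link = place_link_atom(c_qm, c_mm)
```

The link atom saturates the valence of the QM boundary atom so that the quantum calculation sees a closed-shell fragment.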
Application of QM/MM methods: mechanism of triose-phosphate isomerase (TIM)
QM/MM methods were used to treat the important parts of the active site together with the
substrate, reactive intermediates, transition states and product, which helped to discover and
correctly interpret the mechanism of the triose-phosphate isomerase (TIM) enzyme reaction
[PA Bash, MJ Field, RC Davenport, GA Petsko, D Ringe, M Karplus: Biochemistry 1991, 30,
5826–5832; JR Knowles: Phil Trans Roy Soc Lond B 1991, 332, 115–121].
Binding free-energy calculations
Two independent experiments can determine the binding free energies of both ligands to the receptor:
Ligand 1 + Receptor → Ligand 1/Receptor   (ΔG1)
Ligand 2 + Receptor → Ligand 2/Receptor   (ΔG2)
The relative binding free energy of the two ligands can be determined by using the following cyclic scheme:
Ligand 1 + Receptor → Ligand 1/Receptor   (ΔG1)
⇓ ΔG3 (in solution)   ⇓ ΔG4 (bound to receptor)
Ligand 2 + Receptor → Ligand 2/Receptor   (ΔG2)
where ΔG3 and ΔG4 are the formal free-energy differences of the chemical
transformation Ligand 1 -> Ligand 2 in solution and bound to the receptor, respectively.
As free energy is a state function, the sum around the closed cycle is zero:
ΔGcycle = ΔG1 + ΔG4 - ΔG2 - ΔG3 = 0
therefore
ΔΔGbinding = ΔG1 - ΔG2 = ΔG3 - ΔG4
Using the relative ΔΔGbinding values eliminates the need to determine the absolute ligand-receptor
binding free energies ΔG1 and ΔG2, which are computationally very demanding.
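The relation ΔΔGbinding = ΔG1 - ΔG2 = ΔG3 - ΔG4 can be turned into a short calculation. The sketch below uses hypothetical ΔG3 and ΔG4 values in kcal/mol; the conversion to a ratio of binding constants is an addition based on the standard relation ΔG = -RT ln K and is not part of the scheme above.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def relative_binding_free_energy(dg3_solution, dg4_bound):
    """ΔΔG_binding = ΔG1 - ΔG2 = ΔG3 - ΔG4 (same energy unit as the inputs)."""
    return dg3_solution - dg4_bound


def binding_constant_ratio(ddg_binding, temperature=298.15):
    """K2/K1 from ΔΔG_binding = ΔG1 - ΔG2, using ΔG = -RT ln K."""
    return math.exp(ddg_binding / (R_KCAL * temperature))


# Hypothetical transformation free energies (kcal/mol), for illustration only
dg3, dg4 = -3.0, -5.0
ddg = relative_binding_free_energy(dg3, dg4)
ratio = binding_constant_ratio(ddg)
```

A positive ΔΔGbinding means that ΔG2 is more negative than ΔG1, so Ligand 2 binds more strongly and the ratio K2/K1 is greater than 1.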
Identification of target proteins
Direct identification of target proteins has been possible only for about a decade.
Historically, there are only a few drugs whose target protein became known at
the same time as the drug itself. The reason is that the development of new drugs
has traditionally been based largely on modifying known drugs by intuitive use of
molecular similarities. The changes were immediately tested experimentally in vitro and
in vivo. Thus, the effectiveness of a drug was judged even without knowledge of the
target protein. As a consequence, the drugs currently on the market act on
members of a set of only about 500 target proteins [Drews, J.: Die verspielte Zukunft,
1998, Basel: Birkhauser Verlag].
Identification of protein targets is the bottleneck of today's medical and
pharmaceutical science.
Target protein identification - genomics
The figure shows a portion of a DNA chip.
This DNA chip shows the difference in the proteins produced by yeast cells
in two different states. One state (green), in the presence of glucose,
represents the "healthy" condition of the cells; the other state (red), in
the absence of glucose, represents the "hungry" condition of the cells.
The bright green spots indicate proteins that are expressed at a high
level in the "healthy" state of the cells. The red spots are proteins that
are formed mainly in the "hungry" state. When a protein is produced in both
states, the spot is yellow (additive mixture of green and red colors).
The dark spots are proteins that are not expressed at a high level.
Therefore, the color of a spot shows in which state
of the cell the protein in question is formed more abundantly.
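The color rules described for the DNA chip can be written as a small classifier. This is a minimal sketch: the intensity scale (0 to 1) and the two thresholds are illustrative assumptions, not values from the lecture or from the cited experiment.

```python
def classify_spot(green, red, min_signal=0.2, ratio=2.0):
    """Classify one spot from its green and red intensities (0..1 scale).

    min_signal and ratio are illustrative thresholds, not from the lecture.
    """
    if green < min_signal and red < min_signal:
        return "dark: low expression in both states"
    if green >= ratio * red:
        return "green: mainly expressed with glucose"
    if red >= ratio * green:
        return "red: mainly expressed without glucose"
    return "yellow: expressed in both states"


# Hypothetical (green, red) intensity pairs for four spots
spots = {"A1": (0.9, 0.1), "A2": (0.1, 0.9), "A3": (0.8, 0.7), "A4": (0.05, 0.05)}
calls = {name: classify_spot(g, r) for name, (g, r) in spots.items()}
```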
Today, new methods of molecular biology, developed only a few years ago, provide
fundamentally new opportunities for the identification of target proteins. This development can
be exemplified by DNA chip technology [DeRisi, J. L, Iyer, V. R., Brown, P. O. Science, 1997,
278 (5338) 680-686]. The overall picture of course also includes a number of additional methods
and options that are under development.
It is very important to know exactly what kind of study results in such an image. It is also of
great importance to know how much information can be assigned to each colored spot
of the image. We can make the following general statements:
1. The coordinates of the colored spots are used to identify the protein. For simplicity, it
can be assumed that different spots represent different proteins (although this does not always
apply, as multiple spots may be used e.g. for calibration purposes). The exact positions of the
spots are set before the DNA chip is manufactured. DNA chip design requires
identification of a number of proteins and optimization of their layout on the surface of the
chip. The exact location depends on the boundary conditions and the nature of the
experiment, but has no significant importance in terms of interpreting the results.
2. Only partial / basic information can be assigned to the individual spots. In the best case, a
spot corresponds to the full sequence of the gene or protein. In many cases, however,
only a short but necessarily relevant part of the sequence is available.
Genomics vs. proteomics
The methods of genomics test the expressed genes that are translated into proteins, but not the
actual proteins. The methods of proteomics investigate the proteins that are actually formed.
The previous figure shows a DNA chip that provides information on the expressed genes and therefore gives
only indirect data on the actual protein products. The advantage of the genomic approach is that genes are
experimentally more accessible and easier to handle than proteins. As a result, genomic
methods are currently more widespread than proteomic methods. As experimental techniques
develop, an increasing spread of proteomics can be predicted.
The disadvantages of genomic approaches should be realized, however. The first is that the expression
level of a gene does not necessarily correspond to a similarly high protein concentration in the cell,
although the latter is what matters when one is interested in the relationship between the extent of a
disease and the actual protein expression level.
Perhaps even more important is that a significant portion of proteins are modified after translation
(post-translational modifications). Such alterations include glycosylation (complex sugar units binding to
the protein surface) and phosphorylation (phosphate units binding to the protein). Through post-translational
modification, proteins with the same primary amino acid sequence can give rise to a number of
different versions. Genomics is not able to track these modifications, which may be crucial in many
cases.
Enzyme databases
Enzyme nomenclature and classification databases:
EXPASY – ENZYME: http://www.expasy.ch/enzyme/
BRENDA: http://www.brenda-enzymes.org/
Enzyme databases:
Databases of nomenclature and various other data on enzymes
Detailed records for every enzyme class to which the Enzyme Commission (EC) has assigned
an identifier (in the format EC 0.11.22.33, i.e. four numbers separated by periods)
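The four-part EC identifier format can be checked and split with a few lines of code. This is a sketch under assumptions: only complete, four-number identifiers are accepted, and the table of top-level classes is supplied here for illustration (class 7 was added to the EC system after this lecture was given). The function names are hypothetical.

```python
import re

# Top-level EC classes (class 7 was added to the EC system in 2018)
EC_CLASSES = {1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
              4: "Lyases", 5: "Isomerases", 6: "Ligases", 7: "Translocases"}

# Optional "EC" prefix followed by four dot-separated numbers
EC_PATTERN = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec(text):
    """Return the EC number as a tuple of four integers."""
    match = EC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a complete EC number: {text!r}")
    return tuple(int(part) for part in match.groups())


def ec_class_name(text):
    """Return the name of the top-level class of an EC number."""
    return EC_CLASSES.get(parse_ec(text)[0], "unknown class")


# Triose-phosphate isomerase (TIM) is EC 5.3.1.1
tim = parse_ec("EC 5.3.1.1")
tim_class = ec_class_name("EC 5.3.1.1")
```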
ENZYME database – search and content
ENZYME: http://www.expasy.ch/enzyme/
Search in the ENZYME database:
By EC number
By enzyme class
By description (official name) or alternative name(s)
By chemical compound
By cofactor
By text in comment lines
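The search options of the ENZYME database map naturally onto fields of a record. The sketch below searches a small in-memory sample instead of the live database. It assumes a flat-file layout with two-letter line codes (ID for the EC number, DE for the official name, AN for alternative names, CA for the catalyzed reaction, CF for cofactors) and records terminated by `//`; the sample records are hand-written and abbreviated for illustration and should be checked against the current ENZYME user manual before use.

```python
# Hand-written records imitating a flat-file layout with two-letter line
# codes; content is abbreviated and illustrative, not a database copy.
SAMPLE = """\
ID   5.3.1.1
DE   Triose-phosphate isomerase.
AN   Phosphotriose isomerase.
CA   D-glyceraldehyde 3-phosphate = glycerone phosphate.
//
ID   1.1.1.1
DE   Alcohol dehydrogenase.
AN   Aldehyde reductase.
CF   Zn(2+) or Fe cation.
//
"""


def parse_records(text):
    """Yield one dict per record: line code -> list of line contents."""
    record = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if record:
                yield record
            record = {}
        elif len(line) > 5:
            record.setdefault(line[:2], []).append(line[5:].strip())


def search(records, code, term):
    """Return EC numbers of records whose `code` lines contain `term`."""
    term = term.lower()
    return [r["ID"][0] for r in records
            if any(term in value.lower() for value in r.get(code, []))]


records = list(parse_records(SAMPLE))
by_name = search(records, "DE", "isomerase")
by_cofactor = search(records, "CF", "zn")
```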
BRENDA database – search and content
BRENDA: http://www.brenda-enzymes.org/
Search in the BRENDA database (detailed search possibilities):
By nomenclature
By reaction & specificity
By functional parameters
By isolation and preparation
By organism-related information
By stability
By enzyme structure
By disease and related information
By application and engineering aspects
KEGG databases - PATHWAY
KEGG: http://www.genome.jp/kegg/