
13.11.2013. Bioinformatics - Proteomics

Bioinformatics − Proteomics Lecture 9

Prof. László Poppe

BME Department of Organic Chemistry and Technology

Bioinformatics – Proteomics

Lecture and practice


Map of bioinformatics: a medicinal chemistry point of view


Drug-receptor interaction modeling

Flow chart of the theoretical modeling of drug-receptor interactions and its relationship with experiments.

Theoretical studies are shown in rectangles; ellipses represent experimental data. The gray area covers the modeling of small molecules; the white areas are related to bioinformatics.

Advanced quantum mechanical methods (e.g. QM/MM) can be applied.


Pharmacophore model construction: D1 dopamine receptor

Allowed pharmacophore regions

Forbidden pharmacophore regions

Pharmacophore model: a spatial alignment of ligands. It models a 3D image that is complementary to the active site.


Docking molecules to proteins

Predicting / testing the binding of a small molecule (ligand, substrate, coenzyme, etc.) inside or on the surface of a protein (receptor).

Predicting / testing the binding of two proteins to each other.

Predicting / analyzing the binding of a protein to DNA.

Ways to assess the fit:

1. Consideration of simple geometric fit

2. Evaluation of the fit with a complex energy function, electrostatic complementarity, etc.

According to the model:

1. Both molecules are rigid

2. One molecule (usually the ligand) is flexible, and the other (usually the protein) is rigid

3. Both are flexible (the search is very time-consuming)

According to the algorithm:

1. Molecular dynamics

2. Monte Carlo methods (generate random positions)

3. Simulated annealing: simulation of the slow cooling of a high-temperature system, which helps to reach the energy minimum

4. Other methods
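The Monte Carlo and simulated annealing search strategies listed for docking can be illustrated with a minimal sketch. It assumes a rigid ligand reduced to a single position and a toy energy function (`toy_energy`, with a hypothetical binding-site location); real docking programs use far richer scoring functions and also sample orientations and conformations.

```python
import math
import random


def toy_energy(pos, site=(1.0, -2.0, 0.5)):
    """Hypothetical score: squared distance of the ligand centre from an assumed site."""
    return sum((p - s) ** 2 for p, s in zip(pos, site))


def anneal(energy, start, t_start=5.0, t_end=0.01, cooling=0.95,
           steps_per_t=50, step=0.5, seed=0):
    """Simulated annealing over ligand positions with Metropolis acceptance."""
    rng = random.Random(seed)
    current, e_cur = list(start), energy(start)
    best, e_best = list(current), e_cur
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            # Monte Carlo move: random displacement of the ligand position.
            trial = [c + rng.uniform(-step, step) for c in current]
            e_trial = energy(trial)
            # Metropolis criterion: always accept downhill moves,
            # accept uphill moves with probability exp(-dE / T).
            if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
                current, e_cur = trial, e_trial
                if e_cur < e_best:
                    best, e_best = list(current), e_cur
        t *= cooling  # slow cooling of the "high-temperature" system
    return best, e_best


best_pos, best_e = anneal(toy_energy, (5.0, 5.0, 5.0))
print(best_pos, best_e)
```

At high temperature the search accepts many uphill moves and explores broadly; as the temperature falls it settles into a low-energy position, which is the behavior the slide attributes to simulated annealing.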


QM/MM methods

MM region

The protein is treated in several regions.

The important part of the active site and the substrate (or product, reactive intermediate, transition state) form the QM region, which is calculated more accurately.

The QM region is suited to analyzing electronic / quantum interactions (semi-empirical and HF / DFT methods).

The rest of the protein is treated with a classical MM force field.
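In the commonly used additive formulation (an assumption here, since the slides do not state which coupling scheme is meant), this partitioning corresponds to a total energy of the form:

```latex
E_{\mathrm{total}} = E_{\mathrm{QM}}(\text{QM region})
                   + E_{\mathrm{MM}}(\text{MM region})
                   + E_{\mathrm{QM/MM}}(\text{QM region},\ \text{MM region})
```

The last term collects the coupling between the two regions, typically electrostatic and van der Waals interactions plus the bonded terms across the boundary.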


QM/MM methods: the boundary of classical and quantum regions

Splitting of a glutamate side chain into quantum and classical regions.

The terminal CH2CO2 group is treated with quantum mechanics, and a molecular mechanics force field is applied to the main-chain atoms.

Deciding where to cut is the most difficult question (the cut is usually made along a C(sp3)-C(sp3) bond).

There are two main approaches to managing the boundary.

One method is the "link atom approach" [MJ Field, PA Bash, M Karplus: J Comput Chem, 1990, 11, 700], in which the QM region is "closed" by an appropriate virtual (link) atom.

Another method is the "frozen orbital approach" [G Monard, M Loos, V Thery, K Baka, J-L Rivail: Int J Quant Chem, 1996, 58, 153]. The continuity of the electron density at the boundary is ensured by "frozen" orbitals between the quantum and classical atoms (local self-consistent field, LSCF).
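The geometric part of the link atom idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited paper: the link atom is taken to be a hydrogen placed on the axis of the cut bond, at a fixed distance (1.09 Å, a typical C-H bond length) from the QM boundary atom. The function name and the coordinates are hypothetical.

```python
import math


def place_link_atom(qm_atom, mm_atom, bond_length=1.09):
    """Return coordinates of a link atom on the QM->MM bond axis.

    qm_atom, mm_atom: (x, y, z) of the two atoms of the cut bond, in angstroms.
    bond_length: assumed distance of the link atom from the QM atom.
    """
    vec = [m - q for q, m in zip(qm_atom, mm_atom)]
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("QM and MM boundary atoms coincide")
    # Move from the QM atom towards the MM atom by bond_length.
    return tuple(q + bond_length * v / norm for q, v in zip(qm_atom, vec))


# Hypothetical C(sp3)-C(sp3) bond of 1.53 angstroms along the x axis.
print(place_link_atom((0.0, 0.0, 0.0), (1.53, 0.0, 0.0)))
```

The QM calculation then sees a saturated fragment instead of a dangling bond, which is what "closing" the QM region means in practice.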


Application of QM/MM methods: mechanism of triose-phosphate isomerase (TIM)

The important parts of the active site, together with the substrate, reactive intermediates, transition states and product, were treated by QM/MM methods to elucidate and correctly interpret the mechanism of the triose-phosphate isomerase (TIM) reaction [PA Bash, MJ Field, RC Davenport, GA Petsko, D Ringe, M Karplus: Biochemistry 1991, 30, 5826–5832; JR Knowles: Phil Trans Roy Soc Lond B 1991, 332, 115–121].



Binding free-energy calculations

Two independent experiments would determine the binding free energies of the two ligands to the receptor:

Ligand 1 + Receptor → Ligand 1/Receptor (ΔG1)

Ligand 2 + Receptor → Ligand 2/Receptor (ΔG2)

The relative ligand binding free energy can be determined by using the following cyclic scheme:

Ligand 1 + Receptor   → (ΔG1) →   Ligand 1/Receptor
        ⇓ ΔG3                          ⇓ ΔG4
Ligand 2 + Receptor   → (ΔG2) →   Ligand 2/Receptor

where ΔG3 and ΔG4 are the formal free-energy differences of the chemical transformation Ligand 1 → Ligand 2 in solution and bound to the receptor, respectively. As the free-energy change around the closed cycle is zero,

ΔΔGcycle = ΔG1 + ΔG4 − ΔG2 − ΔG3 = 0

therefore

ΔΔGbinding = ΔG1 − ΔG2 = ΔG3 − ΔG4

Using the relative ΔΔGbinding values eliminates the need to determine the absolute ligand-receptor binding free energies ΔG1 and ΔG2, which are computationally quite demanding.
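The relation ΔΔGbinding = ΔG1 − ΔG2 = ΔG3 − ΔG4 reduces to simple arithmetic once the two transformation free energies are available. The sketch below uses made-up numbers (hypothetical values in kcal/mol, not results from any study) purely to show the bookkeeping and the sign convention.

```python
def relative_binding_free_energy(dg3_solution, dg4_bound):
    """ddG_binding = dG1 - dG2 = dG3 - dG4.

    dg3_solution: free energy of Ligand 1 -> Ligand 2 in solution.
    dg4_bound:    free energy of Ligand 1 -> Ligand 2 bound to the receptor.
    """
    return dg3_solution - dg4_bound


# Hypothetical transformation free energies (kcal/mol).
dg3, dg4 = 12.0, 10.5
ddg = relative_binding_free_energy(dg3, dg4)
print(ddg)  # 1.5

# A positive value means dG1 > dG2, i.e. Ligand 2 binds more strongly.
# If dG1 happened to be known (hypothetical -8.0), dG2 follows directly:
dg1 = -8.0
dg2 = dg1 - ddg
print(dg2)  # -9.5
```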


Identification of target proteins

Direct identification of target proteins has been possible for only about a decade.

Historically, the target protein became known at the same time as the drug itself for only a few drugs. The reason is that the development of new drugs has traditionally been based largely on modifying known drugs through the intuitive use of molecular similarity. The modifications were immediately tested experimentally in vitro and in vivo. Thus, the effectiveness of a drug was judged even without knowledge of the target protein. As a consequence, the drugs currently on the market act on members of a set of only approx. 500 target proteins [Drews, J.: Die verspielte Zukunft, 1998, Basel: Birkhäuser Verlag].

The identification of protein targets is the bottleneck of today's medical and pharmaceutical science.



Target protein identification - genomics

The figure shows a portion of a DNA chip.

This DNA chip shows the difference in the proteins produced (measured indirectly, through gene expression) by yeast cells in two different states. One state (green), in the presence of glucose, represents the "healthy" condition of the cells; the second state (red), in the absence of glucose, represents the "hungry" condition of the cells.

The bright green spots indicate proteins that are expressed at a high level in the "healthy" state of the cells. The red spots are proteins that are formed mainly in the "hungry" state. When a protein is produced in both states, the spot is yellow (additive mixture of green and red).

The dark spots are proteins that are not expressed at a high level in either state.

Therefore, the color of a spot shows in which state of the cell the protein in question is formed more abundantly.
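The color reading of the spots can be expressed as a small rule on the two channel intensities. The sketch below is an illustration only: the function name, the background threshold (`min_signal`) and the two-fold cutoff (`fold`) are assumptions, not values from the lecture or from the cited study.

```python
import math


def classify_spot(green, red, min_signal=100.0, fold=2.0):
    """Classify a two-channel spot as 'green', 'red', 'yellow' or 'dark'.

    green: intensity in the glucose-present ("healthy") channel.
    red:   intensity in the glucose-absent ("hungry") channel.
    """
    if green + red < min_signal:
        return "dark"  # not expressed at a high level in either state
    # log2 ratio of the channels; +1 avoids division by zero.
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    threshold = math.log2(fold)
    if log_ratio >= threshold:
        return "red"     # formed mainly in the "hungry" state
    if log_ratio <= -threshold:
        return "green"   # formed mainly in the "healthy" state
    return "yellow"      # produced in both states


for g, r in [(1000, 50), (50, 1000), (800, 900), (10, 20)]:
    print(g, r, classify_spot(g, r))
```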

Today, new methods of molecular biology, developed only a few years ago, provide fundamentally new opportunities for the identification of target proteins. This development is exemplified by DNA chip technology [DeRisi, J. L., Iyer, V. R., Brown, P. O.: Science, 1997, 278 (5338), 680-686]. The overall picture of course also includes a number of additional methods / options that are still under development.


Target protein identification - genomics

It is very important to know exactly which studies can produce such an image. It is also of great importance to know how much information can be assigned to each colored spot of the image. The following general statements can be made:

1. The coordinates of a colored spot are used to identify the protein. For simplicity, it can be assumed that different spots represent different proteins (although this does not always apply, as multiple spots may be used e.g. for calibration purposes). The exact positions of the spots are set before the DNA chip is manufactured. Designing a DNA chip requires identifying a number of proteins and optimizing their layout on the surface of the chip. The exact location depends on the boundary conditions and the nature of the experiment, but has no significant importance for interpreting the results.

2. Only partial / basic information can be assigned to the individual spots. In the best case, a spot corresponds to the full sequence of the gene or protein. In many cases, however, only a short but necessarily relevant part of the sequence is available.



Genomics vs. proteomics

Genomic methods examine the expressed genes that are translated into proteins, not the proteins themselves. Proteomic methods investigate the proteins actually formed.

The previous figure shows a DNA chip, which provides information on the expressed genes and therefore gives only indirect data on the actual protein products. The advantage of the genomic approach is that genes are experimentally more accessible and easier to handle than proteins. As a result, genomic methods are currently more widespread than proteomic methods. As experimental techniques develop, proteomics can be expected to spread further.

The disadvantages of genomic approaches should, however, also be recognized. The first is that a high expression level of a gene does not necessarily correspond to a high concentration of the protein in the cell, and it is the actual protein level that matters when relating the extent of a disease to protein expression.

Perhaps even more important is that a significant portion of proteins are modified after translation (post-translational modifications). Such modifications include glycosylation (binding of complex sugar units to the protein surface) and phosphorylation (binding of phosphate units to the protein). Through post-translational modifications, proteins with the same primary amino acid sequence can give rise to a number of different versions. Genomics cannot track these modifications, which may be crucial in many cases.


Enzyme databases

Enzyme nomenclature and classification databases:

ExPASy – ENZYME: http://www.expasy.ch/enzyme/

BRENDA: http://www.brenda-enzymes.org/


Enzyme databases

Enzyme databases:

Databases of various data and the nomenclature of enzymes

Detailed records for every enzyme class to which the EC (Enzyme Commission) has assigned an identifier (in the format EC 0.11.22.33)
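The four-level EC identifier is easy to handle programmatically. The following sketch parses a complete EC number and looks up the name of its top-level class; the function names are illustrative, and incomplete identifiers such as "1.1.1.-" are deliberately not handled.

```python
import re

# Top-level EC classes (first digit of the EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

_EC_RE = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec(text):
    """Parse 'EC 5.3.1.1' or '5.3.1.1' into a tuple of four integers."""
    match = _EC_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a complete EC number: {text!r}")
    return tuple(int(part) for part in match.groups())


def ec_class_name(text):
    """Return the name of the top-level class of an EC number."""
    return EC_CLASSES.get(parse_ec(text)[0], "unknown class")


# Triose-phosphate isomerase (TIM) is EC 5.3.1.1.
print(parse_ec("EC 5.3.1.1"), ec_class_name("EC 5.3.1.1"))
```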


ENZYME database – search and content

ENZYME: http://www.expasy.ch/enzyme/

Search in ENZYME database:

By EC number

By Enzyme class

By description (official name) or alternative name(s)

By chemical compound

By cofactor

By text in comment lines
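Searches of this kind can also be run offline on a downloaded copy of the database. The sketch below assumes the line-code layout of the ENZYME flat file (a two-letter code such as ID, DE, AN, CA or CC, followed by the text, with records ended by `//`); this should be checked against the current ENZYME user manual. The two records in `SAMPLE` are hand-written for illustration and are not copied from the database.

```python
SAMPLE = """\
ID   5.3.1.1
DE   triose-phosphate isomerase
AN   triosephosphate isomerase
CA   D-glyceraldehyde 3-phosphate = dihydroxyacetone phosphate
//
ID   1.1.1.1
DE   alcohol dehydrogenase
AN   aldehyde reductase
CA   a primary alcohol + NAD(+) = an aldehyde + NADH + H(+)
//
"""


def parse_records(text):
    """Split flat-file text into records: dicts mapping line code -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):        # record terminator
            if current:
                records.append(current)
            current = {}
        elif len(line) > 5:
            code, value = line[:2], line[5:].strip()
            current.setdefault(code, []).append(value)
    if current:
        records.append(current)
    return records


def search(records, code, query):
    """Return EC numbers of records whose lines with the given code contain the query."""
    needle = query.lower()
    return [rec["ID"][0] for rec in records
            if "ID" in rec and any(needle in v.lower() for v in rec.get(code, []))]


records = parse_records(SAMPLE)
print(search(records, "DE", "isomerase"))  # search by description
print(search(records, "CA", "NADH"))       # search by chemical compound
```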




ENZYME database


BRENDA database – search and content


Search in BRENDA database (detailed search possibilities):

By nomenclature

By reaction & specificity

By functional parameters

By isolation and preparation

By organism-related information

By stability

By enzyme structure

By disease and related information

By application and engineering aspects



BRENDA database



KEGG databases - PATHWAY

KEGG: http://www.genome.jp/kegg/
