Bioinformatics Proteomics Lecture 8

2010. 03. Bioinformatics - Proteomics


Prof. László Poppe

BME Department of Organic

Chemistry and Technology

Bioinformatics – Proteomics

Lecture and practice


[Figure: share of sequences accessible to homology modeling (HM), fold recognition (FR), or having an unknown fold (UF), shown for whole genomes, SwissProt and PDB; redundancy increases from whole genomes to PDB]

Structure prediction – origin of sequence


Predictability of structure depends on the origin of the sequence:

Proteins from whole genomes:

~ 10% HM, ~ 40% FR, ~ 50% completely novel structures

Sequences of the SWISS-PROT database:

~ 33% HM, ~ 33% FR, ~ 33% completely novel structures

PDB is strongly redundant:

~60% of the structures would be predictable using already known structures; only <10% belong to a novel structure family

Structural genomics:

the main goal is to select those proteins from the genome whose structures, once determined by X-ray crystallography or NMR, make all the other proteins in the genome predictable



Homology modeling:

>25% sequence identity with known structure

Fold recognition:

If the fold can be identified, less than 25% identity may be sufficient for modeling.

Ab initio predictions:

Work only for small peptides (<30 amino acids). Knowledge-based methods perform better than purely theoretical methods.

Methods for structure prediction
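The thresholds above can be turned into a simple decision rule. The following is only a sketch: the function names are hypothetical, the alignment is assumed to be given (gaps written as "-"), and the 25% and 30-residue limits are the rules of thumb from this slide, not sharp boundaries.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the aligned (non-gap) positions of two aligned sequences."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def suggest_method(identity: float, fold_identified: bool, length: int) -> str:
    """Pick a prediction strategy from the rules of thumb on the slide."""
    if identity > 25.0:
        return "homology modeling"
    if fold_identified:
        return "fold recognition"
    if length < 30:
        return "ab initio"
    return "no reliable method"


if __name__ == "__main__":
    identity = percent_identity("ACDE-F", "ACDQGF")
    print(identity, suggest_method(identity, fold_identified=False, length=6))
```

In practice the identity would come from an alignment program; here it is computed from a hand-written toy alignment.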


[Figure: examples of fold classes: mostly α-helix, mostly β-strand, mixed α-helix / β-strand, low secondary structure]

[Figure: overlay of triose-phosphate isomerases (TIM) and similar proteins]

Fold of a protein family. This so-called TIM-barrel fold is quite frequent.

Threading of proteins (fold)

The fold of a protein family may be similar

even at low sequence identity


Distant homologs (<25% sequence identity) may have the same fold.

Two important aspects are related to fold recognition:

"Fold libraries": contain all the known folds in some form.

Comparison method: a method that enables fitting a sequence onto a structure (e.g. PSI-BLAST).

Proteins - fold recognition

In the comparative method, the sequence is compared to all the structures of the "fold library" to find the fold that the sequence can adopt. A general method is called threading: the protein sequence of unknown structure is "threaded" onto known structures, and then the fit of sequence and structure is evaluated by a scoring method.

Earlier methods were based on the environment of the individual amino acid side chains (polarity of the environment, burial, secondary structure, etc.). Recent methods use pair potentials (potential functions derived from the amino acid-amino acid contacts found in known protein structures).
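The threading idea can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the template is a list of hypothetical Cα coordinates, threading is gapless (the sequence only slides along the template), and the pair potential is a toy hydrophobic/polar rule rather than a statistical potential derived from real structures.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")  # toy classification, not a real potential


def contacts(coords, cutoff=8.0, min_separation=3):
    """Template position pairs (i, j) closer than `cutoff` Å and at least
    `min_separation` positions apart along the chain."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def pair_score(a, b):
    """Toy pair potential: hydrophobic-hydrophobic contacts are favorable,
    mixed contacts unfavorable, polar-polar contacts neutral."""
    ha, hb = a in HYDROPHOBIC, b in HYDROPHOBIC
    if ha and hb:
        return -1.0
    if ha != hb:
        return 0.5
    return 0.0


def threading_energy(sequence, template_contacts, offset=0):
    """Pseudo-energy when sequence[offset + i] occupies template position i."""
    return sum(pair_score(sequence[offset + i], sequence[offset + j])
               for i, j in template_contacts)


def best_offset(sequence, n_positions, template_contacts):
    """Gapless threading: try every placement and keep the lowest energy."""
    offsets = range(len(sequence) - n_positions + 1)
    return min(offsets,
               key=lambda k: threading_energy(sequence, template_contacts, k))


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    tc = contacts(template)
    print(tc, best_offset("GLSSLG", len(template), tc))
```

Real threading programs also allow gaps and repeat the evaluation over every fold in the library; the sketch shows only the scoring of one sequence against one template.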


Protein fold recognition

Methods for fold recognition of proteins:

Scoring functions based on pairwise pseudo-energy terms:

the potential function also depends on the separation of the amino acid pair within the sequence; the predicted secondary structure can also be taken into account (Jones group / Sippl group, ProFIT);

in addition to pair potentials, the conserved hydrophobic cores of known structures can be used (Bryant group)

Purely sequence-based method: hidden Markov models (Karplus group)

Combination of multiple methods with a consensus result (Nishikawa, Koretke groups)

CASP3: results are mixed. Predictions are good for several proteins, poor in the majority of cases, and rather poor in several instances.

Threading methods are not yet satisfactory (no method is capable of predicting more than 40% of the real folds).


There are automatic homology modeling / fold recognition methods (SwissModel, 3D-Jigsaw, CPHModels, EsyPred3D, Robetta), but on their own these give much poorer results than procedures performed iteratively with human intervention.

Further perspectives

Large-scale prediction projects using automated methods have started in connection with the structural genomics programs.

The model databases (ModBase, SwissModel) are continuously expanding.

As experimental structures are determined, the corresponding models can be checked against them.

Perspectives of fold recognition


ModBase: http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi

Structure model databases - ModBase








SwissMod: http://swissmodel.expasy.org/repository/

Structure model databases - SwissMod




Molecular dynamics in proteomics: simulation of the movement of a protein

The potential energy surface

The Born-Oppenheimer approximation: because the mass of the nuclei is much larger than that of the electrons, the electrons move much faster than the nuclei. When modeling molecular motions, the molecule is therefore treated as a classical mechanical system (charged mass points connected by springs).

Instead of solving the Schrödinger equation, the potential energy E(R), which depends on the positions of the nuclei, can be replaced by an empirical energy function.

Motion of the nuclei can be described by Newton's equation of motion:

$-\dfrac{dE}{dR} = m\,\dfrac{d^{2}R}{dt^{2}}$

Conformational energy – Molecular dynamics


Molecular mechanics of proteins – force-field

The force field is the mathematical form of the empirical energy function. A typical force field:

The potential energy of a chemical system with a defined 3D structure, V(R)total, can be split into internal, V(R)internal, and external, V(R)external, potential energy terms according to equations (1)–(3).
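A typical force field of this kind is often written as follows. This is a sketch of a common CHARMM/AMBER-style functional form; the exact terms and symbols of equations (1)–(3) on the original slide are assumed here, with b, θ, χ and r_ij matching the notation used in the text.

```latex
\begin{align}
V(R)_{\mathrm{total}} &= V(R)_{\mathrm{internal}} + V(R)_{\mathrm{external}} \tag{1}\\
V(R)_{\mathrm{internal}} &= \sum_{\mathrm{bonds}} K_b\,(b - b_0)^2
  + \sum_{\mathrm{angles}} K_\theta\,(\theta - \theta_0)^2
  + \sum_{\mathrm{dihedrals}} K_\chi\,\bigl[1 + \cos(n\chi - \delta)\bigr] \tag{2}\\
V(R)_{\mathrm{external}} &= \sum_{i<j} \left(
  \varepsilon_{ij}\left[\left(\frac{R_{\min,ij}}{r_{ij}}\right)^{12}
  - 2\left(\frac{R_{\min,ij}}{r_{ij}}\right)^{6}\right]
  + \frac{q_i\,q_j}{4\pi\varepsilon_0\,\varepsilon\,r_{ij}} \right) \tag{3}
\end{align}
```

Here b_0 and θ_0 are equilibrium values, K_b, K_θ and K_χ are force constants, n and δ are the multiplicity and phase of the torsion, and the external sum runs over non-bonded atom pairs with a Lennard-Jones and a Coulomb term.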


The internal terms are related to covalently bonded atoms, the respective external terms are

related to non-covalent and non-bonding interactions between the atoms. The external terms are

also treated as non-bonding and intermolecular parts of the equation system.

Hypothetical molecules to demonstrate energy terms described by the

equations (1)–(3). The molecule A consists of atoms 1-4, and

molecule B is represented by atom 5.

Internal terms within molecule A: bonds (b), between atoms 1-2, 2-3

and 3-4; bond angles (θ), between atoms 1–2–3 and 2–3–4; dihedral

or torsional angles (χ), between atoms 1–2–3–4.

Molecule B takes part in intermolecular interactions involved in the

external terms. It interacts with each of the four atoms of molecule A,

which requires the knowledge of the correct inter-atomic distances

(rij).
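The internal coordinates named above (b, θ, χ) and the inter-atomic distances r_ij can be computed directly from Cartesian coordinates. The sketch below uses hypothetical coordinates for atoms 1-5 and plain Python; the dihedral follows the common atan2 formulation, in which a cis arrangement gives 0° and a trans arrangement gives 180°.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def bond_length(p1, p2):
    """Distance between two atoms (bond length b, or non-bonded r_ij)."""
    return norm(sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle θ at atom p2, in degrees."""
    u, v = sub(p1, p2), sub(p3, p2)
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def dihedral(p1, p2, p3, p4):
    """Torsion angle χ around the p2-p3 bond, in degrees (-180..180)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    x = dot(n1, n2)
    y = norm(b2) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates in Å: molecule A = atoms 1-4, molecule B = atom 5.
    atoms = {1: (1.0, 1.0, 0.0), 2: (0.0, 0.0, 0.0), 3: (1.5, 0.0, 0.0),
             4: (2.5, 1.0, 0.5), 5: (1.0, 4.0, 1.0)}
    print("bonds:", [bond_length(atoms[i], atoms[i + 1]) for i in (1, 2, 3)])
    print("angles:", bond_angle(atoms[1], atoms[2], atoms[3]),
          bond_angle(atoms[2], atoms[3], atoms[4]))
    print("dihedral:", dihedral(atoms[1], atoms[2], atoms[3], atoms[4]))
    print("r_i5:", [bond_length(atoms[i], atoms[5]) for i in (1, 2, 3, 4)])
```

These geometric quantities are the inputs to the energy terms; the force constants and equilibrium values come from the force field parameters.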



Force field parameterization

The internal parameters are derived from experimental data (e.g. crystal

structure, lattice dynamics, X-ray data, density, heat of vaporization,

spectroscopic data, etc.). The results of high-level ab initio quantum

chemical calculations can also be used. Due to the empirical data, the force

field implicitly includes the relativistic and quantum mechanical effects as

well.

Best known force fields for proteins: AMBER, CHARMM, GROMOS,

CVFF, ECEPP



The pair interaction problem

The number of pair interactions increases with the square of the system size, which greatly increases the time needed to evaluate the energy function.

Solution: apply a cutoff (threshold distance); interactions between pairs farther apart than the cutoff (10-20 Å) are disregarded (the problems resulting from this neglect can be handled well).
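The effect of a cutoff on the number of pair terms can be checked on a toy system. The sketch below places hypothetical "atoms" on a cubic grid and counts pairs with and without a cutoff; the function names and the grid are assumptions for illustration. Note that this naive loop still visits every pair; production programs avoid that with neighbor lists or cell lists.

```python
import math
from itertools import combinations


def cubic_grid(n_per_side, spacing):
    """Hypothetical test system: n^3 points on a cubic grid (spacing in Å)."""
    r = range(n_per_side)
    return [(i * spacing, j * spacing, k * spacing) for i in r for j in r for k in r]


def count_pairs(coords, cutoff=None):
    """Number of atom pairs, optionally only those within `cutoff` Å."""
    n = 0
    for a, b in combinations(coords, 2):
        if cutoff is None or math.dist(a, b) <= cutoff:
            n += 1
    return n


if __name__ == "__main__":
    atoms = cubic_grid(4, 5.0)                  # 64 atoms
    print("all pairs:", count_pairs(atoms))     # N(N-1)/2 pairs
    print("within 12 Å:", count_pairs(atoms, 12.0))
```

Without a cutoff the count is N(N-1)/2, so doubling the number of atoms roughly quadruples the work; with a cutoff the number of pairs per atom stays bounded by the local density.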

Modeling of the solvent

Explicit: water molecules are placed around the molecule under study [several layers, or a box filled with water (periodic boundary conditions)]

Implicit: the potential is modified (e.g. distance-dependent dielectric constant, two different dielectric constants, etc.).
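As an illustration of the implicit approach, a distance-dependent dielectric can be sketched as follows. The form ε(r) = r (r in Å) is one simple, commonly used choice and is assumed here; the conversion constant of about 332 kcal·Å/(mol·e²) is an approximate value, and the function names are hypothetical.

```python
COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2); approximate value, assumed for this sketch


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Coulomb energy (kcal/mol) of charges q1, q2 (in e) at distance r (Å)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def coulomb_energy_distance_dependent(q1, q2, r):
    """Same term with the assumed distance-dependent dielectric eps(r) = r,
    which makes the interaction fall off as 1/r^2 instead of 1/r."""
    return coulomb_energy(q1, q2, r, dielectric=r)


if __name__ == "__main__":
    for r in (2.0, 5.0, 10.0):
        print(r, coulomb_energy(1.0, -1.0, r),
              coulomb_energy_distance_dependent(1.0, -1.0, r))
```

The screening grows with distance, which crudely mimics the damping of long-range electrostatics by the solvent without adding any water molecules.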



Molecular mechanics – Minimization of energy

The steepest descent method:

We move along the negative gradient of the energy surface, i.e. in the direction of steepest decrease. Converges well when far from the energy minimum, but poorly when close to it.

Conjugate gradient method:

The step direction is corrected using the gradients of the previous steps. This method converges quickly for large systems and also near the minima.

In case of molecular mechanics methods, the

aim is to find minima on the potential surface

(during molecular dynamics conformations

fluctuate around such minima). The method

is suitable for optimizing structures.
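A minimal steepest descent sketch on a hypothetical two-variable energy surface is shown below. The quadratic energy function, the fixed step size and the stopping criterion are assumptions chosen for illustration; real minimizers work on thousands of coordinates and usually adapt the step size.

```python
def energy(p):
    """Hypothetical energy surface with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2


def gradient(p):
    """Analytical gradient of the energy function above."""
    x, y = p
    return (2.0 * (x - 1.0), 20.0 * (y + 2.0))


def steepest_descent(start, step=0.04, tol=1e-8, max_iter=10_000):
    """Move against the gradient until its norm drops below `tol`."""
    p = tuple(start)
    for iteration in range(max_iter):
        g = gradient(p)
        if sum(c * c for c in g) ** 0.5 < tol:
            return p, iteration
        # Step downhill: new position = old position - step * gradient.
        p = tuple(pi - step * gi for pi, gi in zip(p, g))
    return p, max_iter


if __name__ == "__main__":
    minimum, n_iter = steepest_descent((5.0, 5.0))
    print(minimum, energy(minimum), n_iter)
```

On this surface the steep y direction converges within a few steps while the shallow x direction needs many more, which is the slow behavior near the minimum mentioned above.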


The aim of molecular dynamics studies is to examine the three-dimensional motions of the structure by simulating conformational movements.

The time step (dt) of a molecular dynamics simulation is 1-2 femtoseconds, short enough to follow the fastest molecular vibrations.

Procedure:

Initial velocities are assigned to the atoms (with magnitudes and a distribution appropriate to the desired temperature), and then the movement of the system is simulated based on Newton's equations of motion (the forces are calculated from the energy function; accelerations, velocities and position changes are derived from the forces; several integration methods exist).

The simulation can be carried out at a constant temperature (system attached to thermostat) or

at constant pressure.
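The integration loop can be sketched with the velocity Verlet scheme, one of the commonly used integration methods. The example is a toy under stated assumptions: a single particle on a harmonic spring in reduced units, with the initial velocity set by hand rather than drawn from a temperature-dependent distribution, and without a thermostat.

```python
def force(x, k=1.0):
    """F = -dE/dx for the harmonic energy E = 0.5 * k * x^2."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equation of motion; returns the (x, v) trajectory."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = force(x)                          # forces at the new position
        v = v_half + 0.5 * dt * f / mass      # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, n_steps=628)   # about one period
    energies = [total_energy(x, v) for x, v in traj]
    print(traj[-1], min(energies), max(energies))
```

The total energy stays nearly constant over the run, which is a standard sanity check for an MD integrator and its time step.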

Molecular dynamics of proteins


Typical motions of proteins


Applicability

Structure refinement: refinement of models

Structure modifications: e.g. the structure of a mutant protein can be predicted (the structure containing the mutated side chain can be relaxed by molecular dynamics)

Docking: finding the position and mode of ligand or substrate binding within a protein

Study of important functional motions of proteins: e.g. relative motions of domains in enzyme catalysis, etc. (less reliable)

Simulation of folding/unfolding of proteins: stability / mechanism studies (even less reliable)

Limitations of molecular dynamics

The duration of simulations is limited by the performance of existing computers. For proteins, the longest simulations at present are on the order of microseconds (on multi-processor supercomputers). This limit is expected to increase with newer methods.

The classical mechanical model is not suitable for simulating chemical events (e.g. ionization, proton transfer, etc.), more subtle interactions, or chemical reactions (e.g. transition states). [QM/MM extensions]

Molecular mechanics of proteins


CHARMM (classical, pioneer of MD simulations)

COSMOS (classical and hybrid QM/MM, QM atomic charges)

Desmond (classical, parallel up to ~ 1000 CPU)

GROMACS (classical)

GROMOS (classical)

GULP (classical)

MDynaMix (classical, parallel )

MOLDY (classical, parallel )

Materials Studio (many force fields, serial / parallel, (QM+MD), (DFT), etc.)

MOSCITO (classical)

NAMD (classical, parallel up to ~ 1000 CPU)

TINKER (classical)

YASARA (classical)

ORAC (classical)

XMD (classical)

Programs for molecular dynamics

Some of the programs are free to use, others are commercial products

http://en.wikipedia.org/wiki/CHARMM

http://www.cosmos-software.de/ce_intro.html

http://www.deshawresearch.com/resources.html

http://en.wikipedia.org/wiki/GROMACS

http://en.wikipedia.org/wiki/GROMOS

http://en.wikipedia.org/w/index.php?title=General_Utility_Lattice_Program&action=edit&redlink=1

http://en.wikipedia.org/wiki/MDynaMix

http://www.ccp5.ac.uk/moldy/moldy.html

http://accelrys.com/products/materials-studio/

http://en.wikipedia.org/wiki/MOSCITO

http://en.wikipedia.org/wiki/NAMD

http://en.wikipedia.org/wiki/TINKER

http://www.yasara.org/

http://www.chim.unifi.it/orac/

http://en.wikipedia.org/wiki/XMD


Molecular docking to proteins

Predicting / testing the binding of a small molecule (ligand, substrate, coenzyme, etc.) inside or on the surface of a protein (receptor).

Predicting / testing the binding of two proteins to each other.

Predicting / analyzing the binding of a protein to DNA.


Ways to assess the fit:

1. Consideration of simple geometric fit

2. Evaluation of the fit with a complex energy function, electrostatic complementarity, etc.

According to the model:

1. Both molecules are rigid

2. One molecule (usually the ligand) is flexible, and the other (usually the protein) is rigid

3. Both are flexible (the search is very time-consuming)

According to the algorithm:

1. Molecular dynamics

2. Monte Carlo methods (generate random positions)

3. Simulated annealing: simulation of the slow cooling of a high-temperature system, which helps to reach the energy minimum

4. Other methods

Programs: Argus Lab, DOCK, AutoDOCK, FTDOCK, GRAMM (on the web)
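Monte Carlo moves and simulated annealing can be combined in a few lines. The sketch below is a toy under stated assumptions: the "ligand position" is a single coordinate x, the energy function is a hypothetical double well (a shallow local minimum and a deeper global one), and the geometric cooling schedule and all parameter values are arbitrary choices. It is not the algorithm of any of the programs listed above.

```python
import math
import random


def energy(x):
    """Hypothetical 1D binding energy: local minimum near x = 1.35,
    deeper global minimum near x = -1.47."""
    return x ** 4 - 4.0 * x ** 2 + x


def simulated_annealing(start, t_start=5.0, t_end=0.01, n_steps=20_000,
                        max_move=0.5, seed=0):
    """Metropolis Monte Carlo with a slowly decreasing temperature.
    Returns the best position found and its energy."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)   # geometric cooling factor
    temperature = t_start
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)  # random trial move
        e_trial = energy(trial)
        delta = e_trial - e
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
        temperature *= cooling
    return best_x, best_e


if __name__ == "__main__":
    print(simulated_annealing(start=2.0))
```

At high temperature the search crosses the barrier between the two wells easily; as the temperature drops, uphill moves become rare and the search settles into a minimum.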



Success rate

Small molecule - protein docking: usually good results. In more complicated cases (e.g. large protein, large and flexible substrate) good results can only be achieved in combination with some knowledge of experimental results

Protein - protein docking: ambiguous, poor results

Protein - DNA docking: ambiguous, poor results

