2010. 03. Bioinformatics - Proteomics
Bioinformatics − Proteomics Lecture 8
Prof. László Poppe
BME Department of Organic
Chemistry and Technology
Bioinformatics – Proteomics
Lecture and practice
[Figure: share of homology modeling (HM), fold recognition (FR) and unknown fold (UF) for sequences from whole genomes, SwissProt and the PDB, in order of increasing redundancy]
Structure prediction – origin of sequence
The predictability of a structure depends on the origin of the sequence:
Proteins from whole genomes:
~10% HM, ~40% FR, ~50% completely novel structures
Sequences of the SWISS-PROT database:
~33% HM, ~33% FR, ~33% completely novel structures
The PDB is strongly redundant:
~60% of the structures would be predictable from already known structures; only
<10% belong to a novel structure family
Structural genomics:
the main goal is to select those proteins from the genome whose structures (once
determined by X-ray crystallography or NMR) make all the other proteins in the genome
predictable
Structure prediction – origin of sequence
Homology modeling:
>25% sequence identity with a known structure
Fold recognition:
If the fold can be identified, less than 25% identity may be sufficient for modeling.
Ab initio predictions:
Work for small peptides (<30 amino acids). Knowledge-based methods perform
better than purely theoretical methods.
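The thresholds above can be turned into a small decision helper. This is only a sketch: the identity definition (matches over aligned, gap-free columns) and the order of the checks in the hypothetical function suggest_method are assumptions, not part of the lecture.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: other definitions (e.g. dividing by the shorter sequence
    length) are also in use and give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def suggest_method(identity_percent: float, query_length: int) -> str:
    """Map the rules of thumb from the slide to a method name (hypothetical helper)."""
    if identity_percent > 25.0:
        return "homology modeling"
    if query_length < 30:
        return "ab initio"
    return "fold recognition"


if __name__ == "__main__":
    identity = percent_identity("ACDE", "ACDF")
    print(identity, suggest_method(identity, query_length=4))
```

In practice the choice also depends on alignment length and template quality, which this sketch ignores.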
Methods for structure prediction
[Figure panels: mostly α-helix; mostly β-strand; mixed α-helix / β-strand; low secondary structure]
[Figure: overlay of triose-phosphate isomerases (TIM) and similar proteins]
Fold of a protein family. This so-called TIM-barrel fold is quite frequent.
Threading of proteins (fold)
The fold of a protein family may be similar
even at low sequence identity
Distant homologs (<25% sequence identity) may have the same fold.
Two important aspects of fold recognition:
"Fold libraries": contain all the known folds in some form.
Comparison method: a method for fitting a sequence onto a structure (e.g. PSI-BLAST).
Proteins - fold recognition
In the comparative method, the sequence is compared to all the structures of the "fold library"
to find the fold that the sequence can adopt. A general method is called threading: the
protein sequence of unknown structure is "threaded" onto known structures, and the fit of
sequence and structure is then evaluated with a scoring function.
Earlier methods were based on the environment of the individual amino acid side chains (polarity
of the environment, burial, secondary structure, etc.). More recent methods use pair potentials
(potential functions derived from amino acid-amino acid contacts found in known protein structures).
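The threading idea can be illustrated with a toy example. In this sketch each fold in a hypothetical library is reduced to a list of residue-residue contacts, the sequence is placed on the fold without gaps, and a two-class (hydrophobic / polar) pair potential with made-up values scores the fit. Real threading programs use statistically derived potentials and optimize the sequence-structure alignment.

```python
# Residues treated as hydrophobic in this toy model (an assumption).
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a: str, b: str) -> float:
    """Toy contact energy; the values are hypothetical, not a published potential."""
    a_h, b_h = a in HYDROPHOBIC, b in HYDROPHOBIC
    if a_h and b_h:
        return -1.0   # hydrophobic-hydrophobic contact is rewarded
    if a_h != b_h:
        return 0.5    # hydrophobic-polar contact is penalized
    return 0.0        # polar-polar contact is neutral


def threading_score(sequence: str, contacts: list[tuple[int, int]]) -> float:
    """Sum pair energies over the contacts of one fold (gapless threading)."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


def best_fold(sequence: str, fold_library: dict[str, list[tuple[int, int]]]):
    """Return the name of the lowest-scoring fold and all scores."""
    scores = {name: threading_score(sequence, contacts)
              for name, contacts in fold_library.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    library = {
        "fold_a": [(0, 2), (2, 4), (0, 4)],   # brings positions 0, 2, 4 together
        "fold_b": [(0, 1), (2, 3), (4, 5)],   # pairs neighbouring positions
    }
    print(best_fold("LSLSLS", library))
```

For the sequence LSLSLS the first fold buries the three leucines against each other and therefore gets the lower (better) score.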
Protein fold recognition
Methods for fold recognition of proteins:
Scoring functions based on pairwise pseudo-energy terms:
the potential function depends on the distance of the amino
acid pairs within the sequence as well; the predicted
secondary structure can also be taken into account (Jones
group / Sippl group, ProFIT);
in addition to pair-potentials the conserved hydrophobic
nuclei within known structures can be used (Bryant group)
A purely sequence-based method using hidden Markov
models (Karplus group)
Combination of multiple methods and consensus result
(Nishikawa, Koretke groups)
CASP3: results are mixed. Predictions are good
for several proteins, poor in the majority of
cases, and rather poor in several instances.
Threading methods are not satisfactory yet (no
method is capable of predicting >40% of the real
folds)
There are automatic homology modeling / fold recognition methods (SwissModel, 3D-Jigsaw,
CPHmodels, ESyPred3D, Robetta), but on their own these give much poorer
results than procedures performed iteratively with human intervention.
Further perspectives
Large-scale prediction projects using automated methods have started in connection with the
structural genomics programs.
The model databases (ModBase, SwissModel) are continuously expanding.
As experimental structures are determined, the corresponding models can be checked
against them.
Perspectives of fold recognition
ModBase: http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi
Structure model databases - ModBase
SWISS-MODEL Repository: http://swissmodel.expasy.org/repository/
Structure model databases - SWISS-MODEL Repository
Molecular dynamics in proteomics: simulation of the movement of a protein
The potential energy surface
The Born-Oppenheimer approximation: because the nuclei are much heavier than the electrons,
the electrons move much faster than the nuclei. For modeling molecular motions, the molecule
is treated as a classical mechanical system (charged mass points connected by springs).
Instead of solving the Schrödinger equation, the potential energy E(R), which depends on the
positions of the nuclei, can be approximated by an empirical energy function.
Motion of the nuclei can be described by Newton's equation of motion:
$-\frac{dE}{dR} = m \frac{d^2 R}{dt^2}$
Conformational energy – Molecular dynamics
Molecular mechanics of proteins – force-field
The force field is the mathematical form of the empirical energy function. A typical force field:
The potential energy of a chemical system with a defined 3D structure, V(R)total, can be split into
internal V(R)internal and external V(R)external potential energy terms according to the equations
above.
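The force-field equations themselves are not reproduced in this text. As an assumption, a widely used functional form that is consistent with the symbols used here (b, θ, χ, rij) is sketched below; the numbering (1)-(3) and the parameter names (K_b, b_0, K_θ, θ_0, K_χ, n, δ, ε_ij, R_min,ij, q_i, ε_D) are illustrative and may differ from the original slide.

```latex
\begin{align}
V(R)_{\text{total}} &= V(R)_{\text{internal}} + V(R)_{\text{external}} \tag{1}\\
V(R)_{\text{internal}} &= \sum_{\text{bonds}} K_b (b - b_0)^2
  + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} K_\chi \left[ 1 + \cos(n\chi - \delta) \right] \tag{2}\\
V(R)_{\text{external}} &= \sum_{\substack{\text{non-bonded}\\ \text{pairs } i<j}}
  \left\{ \varepsilon_{ij} \left[ \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{12}
  - 2 \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{6} \right]
  + \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_D \, r_{ij}} \right\} \tag{3}
\end{align}
```

The first two sums in (2) are harmonic springs for bond lengths and bond angles, the third is a periodic torsion term; (3) combines a Lennard-Jones term with a Coulomb term.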
The internal terms relate to covalently bonded atoms; the external terms relate to
non-covalent (non-bonded) interactions between atoms. The external terms are
also referred to as the non-bonded or intermolecular part of the energy function.
Hypothetical molecules to demonstrate energy terms described by the
equations (1)–(3). The molecule A consists of atoms 1-4, and
molecule B is represented by atom 5.
Internal terms within molecule A: bonds (b), between atoms 1-2, 2-3
and 3-4; bond angles (θ), between atoms 1–2–3 and 2–3–4; dihedral
or torsional angles (χ), between atoms 1–2–3–4.
Molecule B takes part in intermolecular interactions involved in the
external terms. It interacts with each of the four atoms of molecule A,
which requires the knowledge of the correct inter-atomic distances
(rij).
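The bookkeeping for this five-atom example can be sketched in code. All parameters and coordinates below are hypothetical (not taken from any real force field), and the functional form (harmonic bonds and angles, a cosine torsion, Lennard-Jones plus Coulomb) is an assumed typical choice.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def distance(p, q): return norm(sub(p, q))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def angle(p, q, r):
    """Bond angle p-q-r in radians."""
    u, v = sub(p, q), sub(r, q)
    cosine = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp against rounding


def dihedral(p, q, r, s):
    """Signed dihedral angle p-q-r-s in radians."""
    b1, b2, b3 = sub(q, p), sub(r, q), sub(s, r)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))


# Hypothetical parameters, for illustration only.
K_B, B0 = 300.0, 1.0                 # bond force constant, equilibrium length
K_THETA, THETA0 = 50.0, math.pi / 2  # angle force constant, equilibrium angle
K_CHI, N, DELTA = 1.0, 3, 0.0        # torsion barrier, multiplicity, phase
EPS, R_MIN = 0.1, 3.5                # Lennard-Jones well depth and minimum distance
COULOMB = 332.06                     # kcal*Angstrom/(mol*e^2), unit conversion factor


def internal_energy(a1, a2, a3, a4):
    """Bonds 1-2, 2-3, 3-4; angles 1-2-3, 2-3-4; dihedral 1-2-3-4."""
    bonds = [distance(a1, a2), distance(a2, a3), distance(a3, a4)]
    angles = [angle(a1, a2, a3), angle(a2, a3, a4)]
    chi = dihedral(a1, a2, a3, a4)
    e_bond = sum(K_B * (b - B0) ** 2 for b in bonds)
    e_angle = sum(K_THETA * (t - THETA0) ** 2 for t in angles)
    e_dihedral = K_CHI * (1.0 + math.cos(N * chi - DELTA))
    return e_bond + e_angle + e_dihedral


def lennard_jones(r):
    return EPS * ((R_MIN / r) ** 12 - 2.0 * (R_MIN / r) ** 6)


def external_energy(molecule_a, charges_a, atom5, charge5):
    """Atom 5 (molecule B) interacts with each of the four atoms of molecule A."""
    total = 0.0
    for position, charge in zip(molecule_a, charges_a):
        r = distance(position, atom5)
        total += lennard_jones(r) + COULOMB * charge * charge5 / r
    return total


if __name__ == "__main__":
    mol_a = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0)]
    print(internal_energy(*mol_a))
    print(external_energy(mol_a, [0.1, -0.1, -0.1, 0.1], (0.5, 0.0, 3.5), 0.2))
```

The internal energy needs only the connectivity of molecule A, while the external energy needs every distance rij between atom 5 and atoms 1-4.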
Molecular mechanics of proteins – force-field
Force field parameterization
The internal parameters are derived from experimental data (e.g. crystal
structures, lattice dynamics, X-ray data, density, heat of vaporization,
spectroscopic data, etc.). The results of high-level ab initio quantum
chemical calculations can also be used. Because it is fitted to empirical data,
the force field implicitly includes relativistic and quantum mechanical
effects as well.
Best known force fields for proteins: AMBER, CHARMM, GROMOS,
CVFF, ECEPP
Molecular mechanics of proteins – force-field
The pair interaction problem
The number of interacting pairs increases with the square of the system size; this greatly
increases the time needed to evaluate the energy function.
Solution: apply a cutoff (limiting distance); interactions between pairs more distant than the cutoff
(10-20 Å) are disregarded (the problems resulting from this neglect can be treated well).
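The quadratic growth and the effect of a cutoff can be seen in a few lines. This is a naive sketch: the function names are hypothetical, and the filter below still computes every distance once; production codes avoid that with neighbour or cell lists.

```python
import itertools
import math


def count_pairs(n_atoms: int) -> int:
    """Number of distinct atom pairs, N(N-1)/2."""
    return n_atoms * (n_atoms - 1) // 2


def pairs_within_cutoff(coords, cutoff):
    """Index pairs (i, j), i < j, whose distance does not exceed the cutoff."""
    return [
        (i, j)
        for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    ]


if __name__ == "__main__":
    # Doubling the number of atoms roughly quadruples the number of pairs.
    print(count_pairs(1000), count_pairs(2000))
    # Three atoms on a line (coordinates in Å): only the close pair survives a 10 Å cutoff.
    atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(pairs_within_cutoff(atoms, cutoff=10.0))
```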
Modeling of the solvent
Explicit: water molecules are placed around the molecule under study [several layers, or a box filled
with water (periodic boundary conditions)]
Implicit: the potential is modified (e.g. distance-dependent dielectric constant, two different
dielectric constants, etc.).
Molecular mechanics of proteins – force-field
Molecular mechanics – Minimization of energy
The steepest descent method:
We move in the direction of the negative gradient of the energy surface. Converges well when far
from the energy minimum, but poorly when close to it.
Conjugate gradient method:
The step direction is corrected using the derivatives from the previous steps. This method converges
quickly for large systems and also near the minima.
In case of molecular mechanics methods, the
aim is to find minima on the potential surface
(during molecular dynamics conformations
fluctuate around such minima). The method
is suitable for optimizing structures.
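A minimal sketch of steepest descent on a stand-in energy surface follows. The quadratic function, the fixed step size and the stopping tolerance are all assumptions chosen for illustration; molecular mechanics programs minimize the force-field energy over all atomic coordinates and usually adapt the step length.

```python
def energy(p):
    """Toy two-variable energy surface with its minimum at (0, 0)."""
    x, y = p
    return x * x + 10.0 * y * y


def gradient(p):
    x, y = p
    return (2.0 * x, 20.0 * y)


def steepest_descent(start, step=0.05, tol=1e-8, max_iter=10000):
    """Repeatedly step against the gradient until it is nearly zero."""
    p = start
    for iteration in range(max_iter):
        g = gradient(p)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < tol:
            return p, iteration
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p, max_iter


if __name__ == "__main__":
    minimum, n_iter = steepest_descent((3.0, -2.0))
    print(minimum, energy(minimum), n_iter)
```

The early steps reduce the energy quickly, while the final approach to the minimum takes many small steps, which is the behaviour that conjugate gradient methods improve on.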
The aim of molecular dynamics studies is to examine the three-dimensional motions of
the structure by simulating conformational movements.
The time step (dt) of a molecular dynamics simulation is 1-2 femtoseconds, set by the time scale of
molecular vibrations.
Procedure:
Initial velocities are assigned to the atoms (with a magnitude and distribution
appropriate to the desired temperature), and then the movement of the system is simulated
based on Newton's equations of motion (the forces are calculated from the energy function;
accelerations, velocities and position changes are derived from the forces; several
integration schemes exist).
The simulation can be carried out at a constant temperature (system attached to thermostat) or
at constant pressure.
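The procedure above can be sketched with the velocity Verlet scheme, one of several common integrators, which is chosen here as an assumption. The example uses a single particle on a harmonic spring in arbitrary reduced units (not femtoseconds) and no thermostat.

```python
import math


def force(x, k=1.0):
    """Force from the energy function E(x) = k x^2 / 2, i.e. F = -dE/dx."""
    return -k * x


def total_energy(x, v, m=1.0, k=1.0):
    return 0.5 * m * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, dt, n_steps, m=1.0, k=1.0):
    """Integrate Newton's equation of motion; returns the list of (x, v) states."""
    a = force(x, k) / m
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # new position from current velocity and force
        a_new = force(x, k) / m              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, dt=0.01, n_steps=1000)
    x_end, v_end = traj[-1]
    # The exact solution for these initial conditions is x(t) = cos(t).
    print(x_end, math.cos(10.0), total_energy(x_end, v_end))
```

With a small enough time step the total energy stays close to its initial value, which is a simple sanity check for any integrator.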
Molecular dynamics of proteins
Applicability
Structure refinement: refinement of models
Structure modifications: e.g. the structure of a mutant protein can be predicted (the structure
containing the mutated side chain can be relaxed by molecular dynamics)
Docking: finding the position and mode of ligand or substrate binding within a protein
Study of important functional motions of proteins: e.g. relative motions of domains in
enzyme catalysis, etc. (less reliable)
Simulation of folding/unfolding of proteins: stability / mechanism studies (even less
reliable)
Limitations of molecular dynamics
The duration of simulations is limited by the performance of existing
computers. For proteins, the longest simulations at present are on the order of
microseconds (on multi-processor supercomputers). This limit is expected to rise with
newer methods.
The classical mechanical model is not suitable for simulating chemical events (e.g. ionization,
proton transfer, etc.), more subtle interactions or chemical reactions (e.g. transition
states). [QM/MM extensions]
Molecular mechanics of proteins
CHARMM (classical, pioneer of MD simulations)
COSMOS (classical and hybrid QM/MM, QM atomic charges)
Desmond (classical, parallel up to ~ 1000 CPU)
GROMACS (classical)
GROMOS (classical)
GULP (classical)
MDynaMix (classical, parallel)
MOLDY (classical, parallel)
Materials Studio (many force fields, serial / parallel, (QM+MD), (DFT), etc.)
MOSCITO (classical)
NAMD (classical, parallel up to ~ 1000 CPU)
TINKER (classical)
YASARA (classical)
ORAC (classical)
XMD (classical)
Programs for molecular dynamics
Some of these programs are free to use; others are commercial products.
Molecular docking to proteins
Predicting / testing the binding of a small molecule
(ligand, substrate, coenzyme, etc.) inside or on the
surface of a protein (receptor).
Predicting / testing the binding of two proteins to
each other
Predicting / analyzing the binding of a protein to
DNA
Ways to assess the fit:
1. Simple geometric fit
2. Evaluation of the fit by a complex energy function, electrostatic complementarity, etc.
According to the model:
1. Both molecules are rigid
2. One molecule (usually the ligand) is flexible, and the other (usually the protein) is rigid
3. Both are flexible (the search is very time-consuming)
According to the algorithm:
1. Molecular dynamics
2. Monte Carlo methods (generate random positions)
3. Simulated annealing: simulation of the slow cooling of a high-temperature system, which helps
to reach the energy minimum
4. Other methods
Programs: ArgusLab, DOCK, AutoDock, FTDOCK, GRAMM (on the web)
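The Monte Carlo and simulated annealing ideas listed above can be sketched on a one-dimensional stand-in for a docking score. The energy function, cooling schedule and step size are assumptions for illustration; a docking program would instead perturb the ligand position, orientation and torsions and score each pose.

```python
import math
import random


def energy(x):
    """Toy double-well 'score' with a local and a global minimum (hypothetical)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(start, t_start=2.0, t_end=1e-3, cooling=0.99, step=0.5, seed=0):
    """Metropolis Monte Carlo with a slowly decreasing temperature."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    temperature = t_start
    while temperature > t_end:
        candidate = x + rng.uniform(-step, step)   # random trial move
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
        temperature *= cooling                     # slow cooling
    return best_x, best_e


if __name__ == "__main__":
    print(simulated_annealing(start=1.0))
```

At high temperature uphill moves are accepted often, so the search can leave a local minimum; as the temperature drops the walk settles into a low-energy region.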
Molecular docking to proteins
Success rates
Small molecule - protein docking: usually good results. In more
complicated cases (e.g. large protein, large and flexible substrate) good results can
only be achieved in combination with some knowledge of experimental results
Protein - protein docking: ambiguous, poor results
Protein - DNA docking: ambiguous, poor results
Molecular docking to proteins