CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE
Virtual Screening in the Post-Genomic Era
Dr. Didier ROGNAN
Bioinformatic Group
UMR CNRS 7081
Illkirch, France
Virtual screening: Definition
Searching electronic databases (2D, 3D) for molecules fitting:
a pharmacophore
an active site
Walters et al., Drug Discovery Today 1998, 3, 160-178.
Schneider et al., Drug Discovery Today 2002, 7, 64-70.
Importance of virtual screening
Scientific reasons
1. Increasing number of interesting macromolecular targets (500 → 10,000)
2. Increasing number of protein 3-D structures (X-ray, NMR)
3. Better knowledge of protein-ligand interactions
4. Development of chem- and bio-informatic methods
5. Increasing computing facilities
Economic reasons
1. High cost of high-throughput screening (HTS): 0.2 €/molecule
2. Increase the ratio (# of active molecules (hits)) / (# of tested molecules)
Applications
1. Identifying the very first ligands of orphan targets
2. Identifying/optimizing new chemical scaffolds
Protein-based virtual screening
Target (3D!) + Database (3-D)
1. Orientation ("docking") → Target-Ligand Complex
2. Evaluation ("scoring") → Hit list

Hit list:
Mol #     ΔGbind
11121     -44.51
222       -42.21
3563      -41.50
6578      -40.31
25639     -40.28
...
100000     22.54
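The scoring step produces one predicted binding score per docked molecule, and the hit list is simply the database sorted by that score. A minimal sketch, assuming lower (more negative) scores mean tighter predicted binding; the function name `hit_list` is hypothetical and the values are the ones shown on the slide:

```python
from typing import Dict, List, Tuple


def hit_list(scores: Dict[int, float], top_n: int) -> List[Tuple[int, float]]:
    """Return the top_n (molecule id, score) pairs, most negative score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:top_n]


# Molecule numbers and scores taken from the slide.
scores = {
    11121: -44.51,
    222: -42.21,
    3563: -41.50,
    6578: -40.31,
    25639: -40.28,
    100000: 22.54,
}

top3 = hit_list(scores, 3)
for mol_id, score in top3:
    print(mol_id, score)
```

In practice only the top few percent of the ranked database would be passed on to experimental testing.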
Docking
Goal
Quickly find (1-2 min/molecule):
- the orientation of the ligand in the active site
- the protein-bound conformation
Methods
Orientation
- Surface complementarity
- Complementarity of intermolecular interactions
Conformational freedom
- Incremental construction
- Conformational sampling (MC, GA, SA)
Abagyan et al. Curr. Opin. Struct. Biol. 2001, 5, 375-382
Docking: Orientation
Surface-based orientation (e.g. DOCK)
1. 3D structure
2. Molecular surface (active site)
3. Filling the surface with overlapping spheres
4. Matching sphere centers with atoms
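The last step, matching sphere centers with ligand atoms, amounts to finding sets of atoms whose internal distances agree with the distances between a set of sphere centers. The sketch below is a brute-force illustration of that idea, not DOCK's actual matching algorithm; the function name, the tolerance `tol`, and the coordinates are all assumptions made for the example.

```python
import itertools
import math


def compatible_matches(atoms, spheres, k=3, tol=0.5):
    """Return all assignments of k atoms to k sphere centers whose
    pairwise internal distances agree within tol (same length unit)."""
    matches = []
    for atom_idx in itertools.combinations(range(len(atoms)), k):
        for sphere_idx in itertools.permutations(range(len(spheres)), k):
            ok = all(
                abs(
                    math.dist(atoms[atom_idx[i]], atoms[atom_idx[j]])
                    - math.dist(spheres[sphere_idx[i]], spheres[sphere_idx[j]])
                ) <= tol
                for i, j in itertools.combinations(range(k), 2)
            )
            if ok:
                matches.append(tuple(zip(atom_idx, sphere_idx)))
    return matches


# Hypothetical coordinates: three ligand atoms and four sphere centers.
atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
spheres = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0), (50.0, 50.0, 50.0)]

matches = compatible_matches(atoms, spheres)
print(matches)
```

Each match gives enough atom/sphere pairs to compute a rigid-body transformation that places the ligand in the site; the exhaustive enumeration here scales poorly and only serves to show the distance-compatibility test.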
Docking: Orientation
Interaction-based orientation (e.g. FlexX)
- Statistical rules for locating ligand atoms
- Overall placement of a base fragment by triangulation
http://cartan.gmd.de/flexx
Docking: Ligand flexibility
- by preselecting several conformers per molecule
- by incremental construction
[Figure: Ligand → fragment decomposition → selecting the "best" base fragment → reading preferred torsion values → adding the 1st peripheral fragment → adding the 2nd peripheral fragment → termination]
Docking: Ligand flexibility
- by a genetic algorithm (e.g. Gold)
http://www.ccdc.cam.ac.uk/prods/gold/
Chromosome = Ligand (orientation, conformation)
Gene: x, y, z coordinates, torsion angles, orientation, …
Cycle: Initial population → Selection of parents → Genetic operators (crossing over, mutation) → Selection of children → New population → Convergence test → New generation
Parameters: population size, crossing over rate, mutation rate, # of evolutions
Survival rate example (Parent: Score): A: 2.5, B: 5.0, C: 1.5, D: 1.0
Bit-string chromosomes shown in the figure: 100110010010010011, 100110011010011010, 100110010, 100101010
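The genetic-algorithm cycle on this slide (selection of parents, crossing over, mutation, new population) can be sketched on bit-string chromosomes. This is a toy illustration, not Gold's implementation: the fitness function `count_ones` is a hypothetical stand-in for a docking score, fitness is assumed non-negative so it can serve as a selection weight, and keeping the best parent unchanged (elitism) is an added assumption.

```python
import random


def crossover(p1, p2, point):
    """Single-point crossing over: swap the tails of two chromosomes."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chrom, rate, rng):
    """Flip each bit independently with probability rate."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chrom
    )


def select_parent(population, fitness, rng):
    """Roulette-wheel selection: survival chance proportional to fitness."""
    weights = [fitness(c) + 1e-9 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def evolve(population, fitness, generations=20,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = list(population)
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism (assumption)
        while len(children) < len(population):
            a = select_parent(population, fitness, rng)
            b = select_parent(population, fitness, rng)
            if rng.random() < crossover_rate:
                a, b = crossover(a, b, rng.randrange(1, len(a)))
            children.append(mutate(a, mutation_rate, rng))
            if len(children) < len(population):
                children.append(mutate(b, mutation_rate, rng))
        population = children
    return max(population, key=fitness)


def count_ones(chrom):
    """Toy fitness: number of set bits (stand-in for a docking score)."""
    return chrom.count("1")


start = ["100110010", "010010011", "000000000", "111000000"]
best = evolve(start, count_ones)
print(best, count_ones(best))
```

In a docking program each gene would encode a translation, rotation, or torsion angle rather than a single bit, and the fitness would be the scoring function evaluated on the decoded pose.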
Docking Accuracy
Analysing 100 high-resolution PDB complexes (Paul, N. and Rognan, D., Proteins, in press)
[Figure: Accuracy of the best possible pose (n = 30). x-axis: rmsd, Å (0-14); y-axis: % of complexes (0-100); curves: Dock, FlexX, Gold, ConsDock]
Finding a reliable pose out of a set of 30-50 solutions is feasible!
Docking Accuracy
Analysing 100 high-resolution PDB complexes (Paul, N. and Rognan, D., Proteins, in press)
[Figure: Accuracy of the top-ranked pose. x-axis: rmsd, Å (0-14); y-axis: % of complexes (0-100); curves: Dock, FlexX, Gold, ConsDock]
Ranking the most reliable solution at the top of the list is still an issue!
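The accuracy curves plot, for each rmsd threshold, the percentage of complexes whose docked pose lies within that distance of the X-ray pose. A minimal sketch of both quantities follows; it assumes poses are already in the protein's coordinate frame with atoms in the same order (no superposition, no symmetry correction), and the 2 Å default cutoff is a common convention rather than a value stated on the slide.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation (same unit as the coordinates)
    between two poses given as equally ordered lists of (x, y, z)."""
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Percentage of complexes docked within cutoff of the reference."""
    return 100.0 * sum(r <= cutoff for r in rmsds) / len(rmsds)


# Hypothetical two-atom pose shifted by 1 Å along z.
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
xray = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(docked, xray))

# Hypothetical rmsd values for four complexes.
print(success_rate([0.5, 1.9, 2.0, 3.0]))
```

Evaluating `success_rate` over a range of cutoffs reproduces one curve of the plot; doing so for the best of n poses versus the top-ranked pose separates sampling errors from ranking errors.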
Source of Docking Errors
- Nature of the active site (flat vs. cavity)
- Missed influence of water
- Ligand flexibility
- Inaccuracy of the scoring function
- Unusual binding mode/interactions
- Inadequate set of protein coordinates
- Wrong atom typing
Scale shown on the slide: Impossible, Difficult, Easy
Scoring
Methods (# of molecules that can be handled):
- Thermodynamic methods: FEP, TI (2)
- Force fields (10-100)
- QSAR, 3D-QSAR (100-1,000)
- Empirical scoring functions (>100,000)
[Figure: error (kJ/mol) and accuracy versus # of molecules. x-axis ticks: 2, 1000, 100,000; y-axis ticks: 2, 10]
Scoring
First-principles methods: sum of physically meaningful terms
Regression-based free energy approximations: sum of regression-weighted terms
Potentials of mean force: distance-dependent, atom-pair-weighted Helmholtz free energies
Gohlke et al. Curr. Opin. Struct. Biol. 2001, 11, 231-235
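Empirical Scoring function
Fresno: Rognan et al. (1999) J. Med. Chem., 42, 4650-4658.

$$
\Delta G_{binding} = \Delta G_{0}
 + \Delta G_{hb} \sum_{hb} g_1(\Delta r)\, g_2(\Delta\alpha)
 + \Delta G_{lipo} \sum_{l,L} f(r_{lL})
 + \Delta G_{bp} \Big[ \sum_{P,l} f(r_{Pl}) + \sum_{p,L} f(r_{pL}) \Big]
 + \Delta G_{rot} H_{rot}
 + \Delta G_{desolv}\, \Delta G_{reac}
$$

Terms, in order: constant, H-bond term, lipophilic term, buried-polar repulsive term, rotational term, desolvation term.

$$
g_1(\Delta r) =
\begin{cases}
1 & \text{if } \Delta r \le 0.25\ \text{Å} \\
1 - (\Delta r - 0.25)/0.4 & \text{if } 0.25\ \text{Å} < \Delta r \le 0.65\ \text{Å} \\
0 & \text{if } \Delta r > 0.65\ \text{Å}
\end{cases}
$$

$$
g_2(\Delta\alpha) =
\begin{cases}
1 & \text{if } \Delta\alpha \le 30^\circ \\
1 - (\Delta\alpha - 30)/50 & \text{if } 30^\circ < \Delta\alpha \le 80^\circ \\
0 & \text{if } \Delta\alpha > 80^\circ
\end{cases}
$$

$$
f(r) =
\begin{cases}
1 & \text{if } r \le R_1 \\
1 - (r - R_1)/3.0 & \text{if } R_1 < r \le R_2 \\
0 & \text{if } r > R_2
\end{cases}
$$

$$
H_{rot} = 1 + \left(1 - \frac{1}{N_{rot}}\right) \sum_{r} \frac{P_p(r) + P'_p(r)}{2}
$$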
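The three piecewise-linear functions of this slide (g1 for the H-bond distance deviation, g2 for the H-bond angle deviation, f for contact distances) can be written directly in code. The sketch below follows the thresholds given on the slide (0.25 and 0.65 Å, 30° and 80°, slope 1/3.0 between R1 and R2); the function names and the helper `hbond_sum` are hypothetical, and the geometry values in the example are invented for illustration.

```python
def g1(delta_r):
    """Distance penalty: delta_r is the deviation from the ideal H-bond length, in Å."""
    if delta_r <= 0.25:
        return 1.0
    if delta_r <= 0.65:
        return 1.0 - (delta_r - 0.25) / 0.4
    return 0.0


def g2(delta_alpha):
    """Angle penalty: delta_alpha is the deviation from the ideal H-bond angle, in degrees."""
    if delta_alpha <= 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0


def f(r, r1, r2):
    """Contact function: full weight up to r1, linear decay up to r2, zero beyond."""
    if r <= r1:
        return 1.0
    if r <= r2:
        return 1.0 - (r - r1) / 3.0
    return 0.0


def hbond_sum(hbonds):
    """Sum of g1 * g2 over (delta_r, delta_alpha) pairs, i.e. the geometric
    part of the H-bond term before multiplication by its regression weight."""
    return sum(g1(dr) * g2(da) for dr, da in hbonds)


# Hypothetical H-bond geometries: one ideal, one distorted, one broken.
print(hbond_sum([(0.10, 10.0), (0.45, 55.0), (0.90, 20.0)]))
```

An ideal hydrogen bond contributes 1, a distorted one a fraction, and one outside either threshold contributes nothing, which is what makes the term smooth with respect to small pose changes.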
Scoring Accuracy
Current accuracy: 5-10 kJ/mol (1-2 pK units)
Weak point of all docking programs
Entropic contributions are difficult to handle!
Workaround: use of consensus scoring functions
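One common way to build a consensus from several scoring functions is a vote: a molecule is kept only if it ranks in the top fraction of the list for at least a minimum number of functions. The sketch below illustrates that scheme under stated assumptions; the function name, the `top_fraction` and `min_votes` parameters, and the example scores are hypothetical, and lower scores are assumed to be better.

```python
def consensus_hits(score_tables, top_fraction=0.1, min_votes=2):
    """Return the molecules ranked in the top fraction by at least
    min_votes of the scoring functions (lower score = better)."""
    votes = {}
    for scores in score_tables:
        n_top = max(1, int(len(scores) * top_fraction))
        ranked = sorted(scores, key=scores.get)
        for mol in ranked[:n_top]:
            votes[mol] = votes.get(mol, 0) + 1
    return sorted(mol for mol, count in votes.items() if count >= min_votes)


# Hypothetical scores of four molecules from three scoring functions.
function_1 = {"a": -40.0, "b": -35.0, "c": -10.0, "d": -5.0}
function_2 = {"a": -8.0, "b": -2.0, "c": -9.0, "d": -1.0}
function_3 = {"a": -1.0, "b": -7.0, "c": -3.0, "d": -6.0}

hits = consensus_hits([function_1, function_2, function_3], top_fraction=0.5)
print(hits)
```

Voting on ranks avoids having to put scores from different functions on a common energy scale, at the price of discarding the magnitude of each score.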
Library set-up
Full database (Isis/Base): 2-D structures, 2-D fingerprints, line notation, e.g.
C[1](=C(C(=CS@1(=O)=O)SC[9]:C:C:C(:C:C:@9)Br)C[16]:C:C:C:C:C@16)N
→ Filtering: chemical reactivity, pharmacokinetics, drug-likeness
→ Filtered database
→ 2D → 3D conversion: hydrogens, ionisation
→ 3-D database
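The filtering step removes molecules before any 3-D work is done. As one illustration of a drug-likeness filter, the sketch below applies Lipinski's rule of five to precomputed descriptors; the slide does not say which filter was used, so the choice of rule, the `Molecule` class, and the descriptor values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Molecule:
    name: str
    mol_weight: float   # g/mol
    logp: float         # octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def passes_rule_of_five(mol: Molecule, max_violations: int = 1) -> bool:
    """Lipinski's rule of five: tolerate at most one violation by default."""
    violations = sum([
        mol.mol_weight > 500,
        mol.logp > 5,
        mol.h_donors > 5,
        mol.h_acceptors > 10,
    ])
    return violations <= max_violations


def filter_library(library: List[Molecule]) -> List[Molecule]:
    """Keep only the molecules that pass the drug-likeness filter."""
    return [mol for mol in library if passes_rule_of_five(mol)]


# Hypothetical descriptor values for three molecules.
library = [
    Molecule("small_polar", 320.4, 2.1, 2, 5),
    Molecule("greasy_giant", 812.0, 7.3, 1, 4),
    Molecule("borderline", 510.0, 4.8, 3, 9),
]

filtered = filter_library(library)
print([mol.name for mol in filtered])
```

Reactivity and pharmacokinetic filters would be chained in the same way, each one shrinking the list before the more expensive 2D to 3D conversion.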
Applications
High-resolution X-ray structures (enzymes)

Target        Ligands      Database size  Hit rate  Reference
CD4-gp120     inhibitors   150,000         9.7 %    Li et al., PNAS (1997)
gp41          inhibitors   20,000         12.5 %    Debnath et al., J. Med. Chem. (1999)
FT            inhibitors   219,000        19.0 %    Perola et al., J. Med. Chem. (2000)
kinesin       inhibitors   20,000         12.5 %    Hopkins et al., Biochemistry (2000)
HIV1 Tar-Tat  inhibitors   153,000        25.0 %    Filikov et al., JCAMD (2000)
Bcl-2         inhibitors   207,000        20.0 %    Enyedi et al., J. Med. Chem. (2001)
HCA-II        inhibitors   90,000         61.0 %    Grüneberg et al., Angew. (2001)
RAR           agonists     250,000         6.6 %    Shapira et al., BMC Struct. Biol. (2001)
TPI           inhibitors   108,000        20.0 %    Joubert et al., Proteins (2001)
ER            antagonists  1,500,000      72.0 %    Shapira et al., IBM Sys. J. (2001)

FT: farnesyltransferase, HCA: human carbonic anhydrase, RAR: retinoic acid receptor, ER: estrogen receptor, TPI: triosephosphate isomerase, PEP: phosphoenolpyruvate
Conclusions
What is possible?
- Discriminating true hits from random ligands
- Enriching a reduced library by a factor of 20
- Retrieving about 50% of all true hits
- Prioritizing ligands for synthesis and experimental screening
- Using virtual screening for lead finding
What remains to improve?
- Predicting the exact orientation
- Predicting the absolute binding free energy
- Discriminating true hits from "similar inactives"
- Catching all hits
- Using virtual screening for lead optimization
- Throughput (100K mols/day → 1M/day?)
- Pre- and post-processing of vHTS
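The enrichment factor and the fraction of true hits retrieved, both quoted in the conclusions, follow from four counts: the size of the full library, the number of true hits it contains, the size of the selected subset, and the number of true hits in that subset. A minimal sketch with hypothetical counts chosen to reproduce an enrichment of 20 and a recovery of 50%:

```python
def enrichment_factor(n_total, hits_total, n_selected, hits_selected):
    """Hit rate in the selected subset divided by the hit rate in the full library."""
    return (hits_selected * n_total) / (n_selected * hits_total)


def recovery(hits_total, hits_selected):
    """Percentage of all true hits present in the selected subset."""
    return 100.0 * hits_selected / hits_total


# Hypothetical screen: 100,000 molecules with 100 true hits; the top 2,500
# ranked molecules contain 50 of them.
ef = enrichment_factor(n_total=100_000, hits_total=100, n_selected=2_500, hits_selected=50)
rec = recovery(hits_total=100, hits_selected=50)
print(ef, rec)
```

With these counts, testing 2.5% of the library finds half of the actives, which is the kind of trade-off the two lists above describe: good for lead finding, not sufficient for catching all hits.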
Virtual screening at the genomic scale
GPCRs of the human genome
Workflow: Primary sequence → 3-D model (GPCR-Gen) → vHTS → virtual hits → validation → true hits → optimisation
e-Libraries: "Bioinfo" (350,000), "RCPG" (30,000), "Endo" (2,000)
Optimisation criteria: selectivity, affinity, ADME/Tox
- vs. enzymes (PDB library)
- vs. GPCRs (RCPG library)
- Available analogues
- Focussed libraries
Virtual screening: Tomorrow
10^12 molecules (virtual library)
→ ADME/Tox → 10^9
→ 2-D similarity → 10^7
→ 3-D conformations → 10^7 (10^8 conformations)
→ 3-D similarity → 10^5 (10^6 conformations)
→ Docking → 10^4
→ Scoring → 10^3
→ Experimental validation → 100 true hits