Developing & Benchmarking Large-scale Docking (LSD) Pipeline
Niu Huang, 02/17/2004
LSD pipeline
Model Building (ModBase/PDB)
Binding Site Refinement (PLOP/Modeller)
LigBase
Ligand Docking (DOCK3.5.5.4)
Post-docking Refinement (PLOP)
Central Database System
Where are we now?
Applications (CK: enolase; Jenni: malaria-related; Chris: sporulation …)
LSD modules (testing & debugging, benchmarking)
Detailed investigation (enrichment, binding properties, performance …)
Docking pipeline
Target Protein, SPHGEN, GRID, DATABASE, SCORING, DATA ANALYSIS
[Figure: Energy component histograms for docked AR ligands; x-axis: energy (kcal/mol); y-axis: frequency; series: ele. Ene., vdw. Ene., Total Ene.]
Test case (from J. Med. Chem., McGovern & Shoichet, 2003)
[Figure: chemical structures for the nine test targets: DHFR, GART, TS, Thrombin, PNP, SAHH, AChE, AR, PARP]
Expert vs automated docking
Enrichment plots comparing expert docking (dark blue), the automated procedure (magenta; referred to as Test10), and random enrichment (black).
[Figure: enrichment plots for AChE, AR, DHFR, GART, PARP, PNP, SAHH, Thrombin and TS; x-axis: % of database (0-30%); y-axis: % of ligands found]
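An enrichment curve like those in the plots can be computed from a ranked hit list. A minimal Python sketch, assuming each compound is labeled as a known ligand or a decoy and the list is sorted best score first; `enrichment_curve` and `random_curve` are hypothetical names, not part of the LSD pipeline.

```python
from typing import List, Sequence, Tuple


def enrichment_curve(ranked_is_ligand: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (% of database screened, % of known ligands found) after each rank.

    ranked_is_ligand[i] is True if the compound at rank i (best score first)
    is a known ligand, False if it is a decoy.
    """
    n_total = len(ranked_is_ligand)
    n_ligands = sum(ranked_is_ligand)
    if n_total == 0 or n_ligands == 0:
        raise ValueError("need at least one compound and one known ligand")
    points = [(0.0, 0.0)]
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        if is_ligand:
            found += 1
        points.append((100.0 * rank / n_total, 100.0 * found / n_ligands))
    return points


def random_curve(pct_db: float) -> float:
    """Random enrichment: % of ligands found equals % of database screened."""
    return pct_db
```

For example, with two ligands ranked 1st and 3rd in a ten-compound list, half of the ligands are found after screening 10% of the database and all of them after 30%.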
Approach to “expert docking” limit?
% of db to find 25% of known ligands (lower is better)
Enzyme/target  Susan (expert docking)  John (auto)  Niu (best)  Niu (Test10)
AChE           6.3                     6.5          4.8         5.0
Adometc        11                      N/A          0.65        0.72
AR             2.8                     6.4          2.9         3.5
DHFR           2.0                     7.9          0.3         0.3
GART           0.4                     6.9          0.95        0.95
L99A           N/A                     0.5          0.05        0.05
PARP           2.9                     3.9          2.8         4.6
PNP            2.8                     3.1          1.25        1.25
SAHH           1.1                     1.8          2.1         2.1
Thrombin       6.6                     9.5          4.2         4.2
TS             3.5                     9.4          1.55        1.55
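The table's metric, % of database screened to find 25% of known ligands, can be read off a ranked hit list. A minimal Python sketch, assuming the list is sorted best score first and that the 25% target is rounded up to a whole ligand (the slides do not state the rounding rule); `pct_db_to_find` is a hypothetical name.

```python
import math
from typing import Sequence


def pct_db_to_find(ranked_is_ligand: Sequence[bool], target_fraction: float = 0.25) -> float:
    """Percentage of the ranked database that must be screened to recover
    `target_fraction` of the known ligands (lower is better)."""
    n_total = len(ranked_is_ligand)
    n_ligands = sum(ranked_is_ligand)
    if n_ligands == 0:
        raise ValueError("no known ligands in the ranked list")
    # Assumption: round the ligand target up, e.g. 25% of 10 ligands -> 3.
    needed = math.ceil(target_fraction * n_ligands)
    found = 0
    for rank, is_ligand in enumerate(ranked_is_ligand, start=1):
        if is_ligand:
            found += 1
            if found >= needed:
                return 100.0 * rank / n_total
    return 100.0  # only reached if target_fraction > 1
```

With four known ligands in a 1000-compound list and the first ligand at rank 50, one ligand (25%) is recovered after screening 5.0% of the database.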
Missing atoms
Case analysis (DHFR)
DHFR cont. 1
Docking parameters, enrichment plots and docking statistics (CPU: 2.4 GHz)
Test1: INDOCK.3; .useligsph = on, 70 matching spheres
  % of db to find 25% of known ligands = 31; max. enrichment factor < 5
  ~300 hrs, 125580 cmpds scored
Test?: INDOCK.3; .useligsph = on, 50 matching spheres
  % of db to find 25% of known ligands = 9.9; max. enrichment factor = 8.2
  ~31 hrs, 126545 cmpds scored
Test?: INDOCK.2; .useligsph = on, 50 matching spheres
  % of db to find 25% of known ligands = 47; max. enrichment factor < 5
  ~16 hrs, 122876 cmpds scored
Test9: INDOCK.3; .usefragsph = on, 50 matching spheres
  % of db to find 25% of known ligands = 4.3; max. enrichment factor = 9.2
  ~50 hrs, 109638 cmpds scored
Test10: INDOCK.3; .usefragsph = on, 35 matching spheres
  % of db to find 25% of known ligands = 0.3 (3.7*, 2.9**, 0.3***; Susan: 2.0)
  Max. enrichment factor = 110 (64*, 43**, 128***; Susan: 29)
  ~15 hrs, 85288 cmpds scored (Susan: ~7.5 hrs, 70541 cmpds)
  * without cofactor; ** without HOH; *** without HIP28; values marked "Susan" are from Susan.
Test11: INDOCK.1; .usefragsph = on, 35 matching spheres
  % of db to find 25% of known ligands = 3.5; max. enrichment factor = 76
  ~2 hrs, 55149 cmpds scored
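The slides report a "max. enrichment factor" without defining it. A common definition, assumed here, is EF(k) = (fraction of known ligands among the top k compounds) / (k / database size), maximized over all cutoffs k. A minimal Python sketch; `max_enrichment_factor` and its `min_rank` option (for ignoring noisy spikes at the first few ranks) are hypothetical.

```python
from typing import Sequence


def max_enrichment_factor(ranked_is_ligand: Sequence[bool], min_rank: int = 1) -> float:
    """Largest enrichment factor over all cutoffs k >= min_rank, where
    EF(k) = (ligands in top k / all ligands) / (k / all compounds)."""
    n_total = len(ranked_is_ligand)
    n_ligands = sum(ranked_is_ligand)
    if n_ligands == 0:
        raise ValueError("no known ligands in the ranked list")
    best = 0.0
    found = 0
    for k, is_ligand in enumerate(ranked_is_ligand, start=1):
        found += int(is_ligand)
        if k >= min_rank:
            ef = (found / n_ligands) / (k / n_total)
            best = max(best, ef)
    return best
```

Under this definition the ceiling is the database size divided by the number of known ligands, reached only when every top-ranked compound is a ligand.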
DHFR cont. 2
[Figure: histograms of frequency vs. total interaction energy for four runs: dhfr_ligands_test1, dhfr_mddr_test1, dhfr_mddr_test10, dhfr_ligands_test10]
Using a focused set of matching spheres appears essential for reducing the noise caused by the inaccurate scoring function, which favors wrong docking poses; restricting matching to spheres that fill the hot-spot region alleviates this.
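A focused sphere set can be illustrated with a hypothetical selection rule: keep only sphere centres within a cutoff distance of hot-spot atoms (e.g. atoms of a bound ligand or fragment), nearest first, capped at a maximum count. This is a sketch under those assumptions, not SPHGEN's or DOCK's actual code; the slides do not say how .useligsph or .usefragsph choose spheres, and `focused_spheres`, the 2.0 Å cutoff and the cap of 35 are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def focused_spheres(spheres: Sequence[Point], hotspot_atoms: Sequence[Point],
                    cutoff: float = 2.0, max_spheres: int = 35) -> List[Point]:
    """Keep sphere centres within `cutoff` (Å) of any hot-spot atom,
    closest first, capped at `max_spheres` (hypothetical selection rule)."""
    if not hotspot_atoms:
        raise ValueError("need at least one hot-spot atom")
    scored = []
    for s in spheres:
        # Distance from this sphere centre to the nearest hot-spot atom.
        d = min(math.dist(s, a) for a in hotspot_atoms)
        if d <= cutoff:
            scored.append((d, s))
    scored.sort(key=lambda pair: pair[0])
    return [s for _, s in scored[:max_spheres]]
```

Lowering `max_spheres` mirrors the 70 → 50 → 35 progression in the tests above only in spirit; how DOCK itself uses the matching spheres is not modeled here.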
DHFR cont. 3
[Figure: docked ligands and top-scored MDDR decoys, for Test1 and for Test10]
Case analysis (Aldose Reductase)
* Structure, 1997, 5:601-612
The conformational flexibility of the binding site, as implied by crystal structures, appears to contribute to the poor enrichment; however, other factors may also contribute, such as the lack of a protein desolvation penalty in the scoring function.
AR cont. 1
[Figure: Energy component histograms for docked AR ligands; x-axis: energy (kcal/mol); y-axis: frequency; series: ele. Ene., vdw. Ene., Total Ene.]
The correlation coefficients between electrostatic energy and total energy, and between vdW energy and total energy, are 0.74 and 0.66, respectively, for docked ligands, and 0.62 and -0.33 for the top 500 docked decoys. Electrostatic interactions are clearly far too favorable and dominate the interaction energy score for docked decoys; including a protein desolvation penalty might remedy this.
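The correlation analysis above uses a plain Pearson coefficient between each energy component and the total. A minimal Python sketch, assuming the total interaction energy is the sum of the electrostatic and van der Waals terms; the energies below are made-up illustration values, not the AR data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-compound energy components (kcal/mol), not the AR data.
ele = [-40.0, -25.0, -10.0, -5.0]
vdw = [-20.0, -22.0, -18.0, -21.0]
total = [e + v for e, v in zip(ele, vdw)]  # assumption: total = ele + vdw
r_ele = pearson(ele, total)  # about 0.99: electrostatics drive the total
r_vdw = pearson(vdw, total)  # about 0.29: vdW barely tracks the total
```

In this toy set the electrostatic term spans a much wider range than the vdW term, so it dominates the total score, which is the pattern the slide reports for the AR decoys.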
[Figure: Energy component histograms for the top 500 docked AR decoys; x-axis: energy (kcal/mol); y-axis: frequency; series: ele. Ene., vdw. Ene., total Ene.]
Case analysis (PARP)
Docking parameters, enrichment plots and docking statistics
Test?: INDOCK.1; 70 matching spheres
  % of db to find 25% of known ligands = 3.0; max. enrichment factor = 9.1; 113008 cmpds
Test?: INDOCK.2; 70 matching spheres
  % of db to find 25% of known ligands = 7.3; max. enrichment factor = 4.2; 121035 cmpds
Test?: INDOCK.3; 70 matching spheres
  % of db to find 25% of known ligands = 10.5; max. enrichment factor = 3.3; 123872 cmpds
Test9: INDOCK.3; 50 matching spheres
  % of db to find 25% of known ligands = 9.4; max. enrichment factor = 3.5; 123401 cmpds
Test10: INDOCK.3; 35 matching spheres
  % of db to find 25% of known ligands = 4.5; max. enrichment factor = 6.4; 116717 cmpds
Test11: INDOCK.1; 35 matching spheres
  % of db to find 25% of known ligands = 2.8; max. enrichment factor = 9.6; 96533 cmpds
[Figure: docked ligands and top-scored MDDR decoys]
PARP cont. 1
Case analysis (AChE)
Poor enrichment (5.0% of db to find 25% of known ligands) appears to be caused by the large number of improbable docking poses. The AChE binding cavity is large, contains many waters, and has more than one clear binding region; no direct hydrogen bonds between ligand and protein have been observed, only water-bridged ones, which makes it a particularly hard case to dock to (Jacobsson, JMC, 2004).
What can we do to improve docking for such cases?
Case analysis (Thrombin)
Multiple binding sub-sites? Could the poor enrichment be related to how the dockable database is generated and how spheres are matched?
Preliminary Conclusion
A fully automated docking procedure with a consistent parameter set for grid generation, docking and scoring appears to perform well across all the tested systems.
Cofactors, iron and structural waters involved in ligand binding must be inspected carefully, as must the protonation states of binding-site residues.
Larger binding pockets require more extensive sampling (INDOCK.3), as validated by the DHFR, TS, thrombin and GART test sets.
Docking spheres and DelPhi spheres can be generated using different schemes. A focused set of matching spheres proved critical for systems such as DHFR, TS and GART, indicating that hot-spot information about the binding pocket is important for directing docking.
Docking results should be interpreted carefully, with routine energy component analysis, to identify errors caused by specific factors.
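As one hypothetical form of routine energy component analysis, poses whose favorable total score is dominated by the electrostatic term (the pattern seen for the AR decoys) could be flagged for manual inspection. The function name, the (ele, vdw) input format, the assumption that total = ele + vdw, and the 0.6 threshold are all illustrative choices, not part of the original pipeline.

```python
from typing import Dict, List, Tuple


def flag_electrostatic_dominated(poses: Dict[str, Tuple[float, float]],
                                 max_ele_fraction: float = 0.6) -> List[str]:
    """Return IDs of poses whose electrostatic term makes up more than
    `max_ele_fraction` of a favorable (negative) total interaction energy.

    poses maps compound ID -> (ele, vdw) in kcal/mol; the total is assumed
    to be ele + vdw. The 0.6 default is a hypothetical threshold.
    """
    flagged = []
    for cid, (ele, vdw) in poses.items():
        total = ele + vdw
        # Only favorable totals are considered; a repulsive vdW term pushes
        # the fraction above 1, which is also flagged.
        if total < 0 and ele / total > max_ele_fraction:
            flagged.append(cid)
    return flagged
```

Flagged compounds are candidates for the kind of inspection suggested for AR, such as checking whether a missing desolvation penalty inflates their scores.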
High-quality test sets
Enrichment data sets (known ligands and decoy data sets):
i. Susan test set
ii. Enolase test set
iii. NCTR ER data set: 232 diverse compounds covering a 10^6-fold range in a validated ER competitive binding assay; NCTR AR data set: 202 diverse compounds (Tong et al., 2001)
iv. McMaster DHFR data set (http://hts.mcmaster.ca)
v. Compumine ERalpha, MMP3, AChE and fXa data sets (http://www.compumine.com/research/scoring.html)
Docking and scoring test sets (experimental structures and binding affinities):
i. CCDC/Astex validation test set: 308 crystal complexes (http://ccdc.cam.ac.uk)
ii. X-CScore dock set: 100 crystal complexes with binding affinities (Wang et al., 2003)
Suggestion
What is the first, and possibly the second, major putative component that, if fixed, would most improve enrichment?
For each possible improvement, estimate what should be done, how much effort it requires, and how likely it is to improve enrichment.
Look closely at the active-site residues (ionization and protonation states), and use the top-scoring decoys to identify the residues that contribute to overestimated docking energies.
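One way to act on the last suggestion, sketched under the assumption that a per-residue interaction energy breakdown is available for each docked complex (the slides do not say which tool would provide it): average each residue's contribution over the top-scoring decoys and over the known ligands, then rank residues by how much more favorable they are for decoys. Function names, residue labels and energies are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ResidueEnergies = Dict[str, float]  # residue label -> interaction energy (kcal/mol)


def mean_per_residue(complexes: Sequence[ResidueEnergies]) -> Dict[str, float]:
    """Mean contribution of each residue; a residue absent from a complex counts as 0."""
    totals: Dict[str, float] = defaultdict(float)
    for energies in complexes:
        for res, e in energies.items():
            totals[res] += e
    return {res: e / len(complexes) for res, e in totals.items()}


def suspect_residues(decoys: Sequence[ResidueEnergies],
                     ligands: Sequence[ResidueEnergies],
                     top_n: int = 5) -> List[Tuple[str, float]]:
    """Residues whose mean contribution is most favorable (most negative)
    for top-scoring decoys relative to known ligands."""
    d = mean_per_residue(decoys)
    lig = mean_per_residue(ligands)
    diffs = {res: d.get(res, 0.0) - lig.get(res, 0.0) for res in set(d) | set(lig)}
    return sorted(diffs.items(), key=lambda kv: kv[1])[:top_n]
```

Residues at the top of this list, for example a charged residue whose ionization state is uncertain, would be the first candidates for the protonation-state check suggested above.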
Acknowledgement
John (Shoichet lab), CK (Jacobson lab), Ursula & Eswar (Sali lab)