Ed Griffen, MedChemica Ltd
Extracting actionable knowledge from large scale in vitro pharmacology data
MedChemica
Why improve medicinal chemistry practice?
For an aging population and emerging pathogens
“Eroom’s Law”: the cost of discovering a new drug has doubled every 9 years consistently for the last 60 years.1
= a cost increase of about 8%/year
1. Scannell et al., Nature Reviews Drug Discovery (2012), 11, 191-200
2. Paul et al., Nature Reviews Drug Discovery (2010), 9, 203-214
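The step from "doubles every 9 years" to "about 8% per year" is a compound-growth calculation. A minimal sketch of that arithmetic, assuming a constant growth rate over the whole period (the function name is illustrative, not from the talk):

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Constant yearly growth rate implied by a fixed doubling time."""
    return 2.0 ** (1.0 / doubling_time_years) - 1.0

# Doubling time quoted on the slide (Eroom's Law).
rate = annual_growth_rate(9.0)

# Cumulative fold increase over the 60 years mentioned on the slide.
fold_increase_60y = 2.0 ** (60.0 / 9.0)

print(f"annual increase: {rate:.1%}")
print(f"fold increase over 60 years: {fold_increase_60y:.0f}x")
```

Under this assumption the rate is close to 8% per year, and the same doubling time compounds to roughly a hundredfold increase over 60 years.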
[Figure: cost ($ million, axis 0 to 500) by R&D stage: target-to-hit, hit-to-lead, lead optimisation, preclinical, Phase I, Phase II, Phase III, submission to launch. Series: cost/project, cost/launch, cost/launch (capitalized).]
Cost/launch (2010): $873m; capitalised: $1.8Bn (ref. 2)
Actionable knowledge
Critical information from which the user can immediately choose a course of action:
• ADME: ways to ‘fix’ your molecule
• Toxicology: substructures to avoid
• Pharmacology: substructural leads built for practical design
ADMET rule database: better medicinal chemistry by combining knowledge
[Diagram: Roche, Genentech, AZ, Pharma 4 and Pharma 5 each run a rule finder on their own data; MedChemica combines the resulting rules into the Grand Rule Database, and each company receives the Grand Rule Database for exploitation.]
>500 million pairs from companies + 12 million from public data
Current knowledge sets: GRDv3
Numbers of statistically valid transforms
Grouped datasets | Number of rules
logD7.4 | 153449
Merged solubility | 46655
In vitro microsomal clearance: human, rat, mouse, cyno, dog | 88423
In vitro hepatocyte clearance: human, rat, mouse, cyno, dog | 26627
MDCK permeability A-B / B-A efflux | 1852
Cytochrome P450 inhibition: 2C9, 2D6, 3A4, 2C19, 1A2 | 40605
Cardiac ion channels: NaV1.5, hERG ion channel inhibition | 15636
Glutathione stability | 116
Plasma protein or albumin binding: human, rat, mouse, cyno, dog | 64622
Clear structural direction from Big Data
Example: Dopamine Transporter inhibitors
What do I want?
• Substructures associated with potency
• Specificity of model
• Predictions
• Domain of Applicability
Example compound: CHEMBL538405
• pKi predicted: 8.6
• pKi measured: 9.1
• Mean with pharmacophore: 8.3
• Mean without: 6.7
• n examples: 27
• Odds ratio (vs ChEMBL): 407
MedChemica Principles of Pharmacophore Extraction
• Pharmacophores must be clear and understandable
• Pharmacophore generation must be transparent to allow checking and validation
• Use as much measured data as possible
• Look for key elements influencing potency
• Don’t base pharmacophores on a few compounds
• Pharmacophore must be specific (not like phenyl + amine = hERG inhibitor)
• Can be applied quickly (to large libraries)
[Figure labels: Cation, HyAr, HyAr]
How do I actually use this?
QSAR and knowledge extraction
Model as filter or knowledge?
[Figure: descriptors and methods arranged along an INTERPRETABILITY axis; the labels "Dark" and "Black" appear next to Deep Learning]
Descriptors: substructures; physical chemistry descriptors (Hansch, Taft, Fujita, Abraham); atomic, pair, triplet descriptors; indices
Method: (M)LR; Free Wilson; PLS; Trees / Forests; SVM; Bayesian NN; Deep Learning
Specific pharmacophore extraction from MMPA
• Identify key potency-giving changes by matched molecular pair analysis on large datasets
• Extract fragments that are associated with potency
• Find pairs of fragments and linkers that are specific to potent compound subsets
• Easy-to-use 2D pharmacophores
• 2 potency-enhancing fragments joined by a specific linker
Workflow: Public Data → Find Matched Pairs → Find Potent Fragments → Find Pharmacophore dyads → Pharmacophores → Model → Predict potency and show pharmacophore match
Example: 1470 compounds, CHEMBL339, Dopamine transporter
Pharmacophore identification: CHEMBL538405, pKi 9.1 (Fragment 1, Fragment 2, Linker)
• pKi predicted: 8.6
• pKi measured: 9.1
• Mean with pharmacophore: 8.3
• Mean without: 6.7
• n examples: 27
• Odds ratio (vs ChEMBL): 407
Matched Molecular Pairs
• Molecules that differ only by a particular, well-defined structural transformation
Transformation with environment capture
• MMPs can be recorded as transformations from A → B
• Environment is essential to understand chemistry
Statistical analysis
• Learn what effect the transformation has had on properties in the past
[Figure: Advanced MMPA; transformation A → B with atom environment shells numbered 1 to 4 and Δ Data A-B]
Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. Journal of Medicinal Chemistry. 2011, 54(22), pp. 7739-7750.
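The statistical step described above (learning what a transformation has done to a property in the past) can be sketched as a group-by over recorded pairs. This is a minimal illustration, not the MedChemica implementation: the pair records, fragment strings, environment labels and property values are all invented, and the change is taken as value(B) minus value(A), which is an assumption about sign convention.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical matched pairs: (fragment A, fragment B, environment, value for A, value for B).
# All entries are invented for illustration only.
pairs = [
    ("[*]H", "[*]F", "aromatic carbon", 2.0, 2.5),
    ("[*]H", "[*]F", "aromatic carbon", 1.0, 1.5),
    ("[*]H", "[*]F", "aromatic carbon", 3.0, 3.25),
    ("[*]H", "[*]F", "aliphatic carbon", 1.0, 0.5),
]

def summarise_transforms(pairs):
    """Group pairs by (A, B, environment) and return (count, mean change) per group."""
    deltas = defaultdict(list)
    for frag_a, frag_b, environment, value_a, value_b in pairs:
        # Assumed convention: change = property of B minus property of A.
        deltas[(frag_a, frag_b, environment)].append(value_b - value_a)
    return {key: (len(values), mean(values)) for key, values in deltas.items()}

summary = summarise_transforms(pairs)
for (frag_a, frag_b, environment), (n, mean_delta) in summary.items():
    print(f"{frag_a} -> {frag_b} [{environment}]: n={n}, mean change={mean_delta:+.2f}")
```

Keeping the environment in the grouping key is what lets the same A → B change show different effects in different chemical contexts, as in the toy data here.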
Matched pair methodology
Because MCSS and F&I each find different pairings
[Figure: example compound pairs, each marked as found (✓) or missed (✗) by MCSS and by F&I. Labelled pairs: A = CHEMBL156639, B = CHEMBL2387702; A = CHEMBL100461, B = CHEMBL103900. Marks shown include MCSS ✓ / F&I ✗, MCSS ✗ / F&I ✓ and MCSS ✗ / F&I ✗.]
Does the matched pair method really matter?
Using only one technique will miss between 12% and 56% of pairings
Target | Number of compounds | Pairings: common | Pairings: FI only | Pairings: MCSS only | Pairings: total | FI only % | Common % | MCSS only %
VEGF | 4466 | 14631 | 17172 | 14823 | 46626 | 37 | 31 | 32
Dopamine Transporter | 1470 | 4480 | 8930 | 3497 | 16907 | 53 | 26 | 21
GABAA | 848 | 2500 | 1722 | 4205 | 8427 | 20 | 30 | 50
D2 human | 3873 | 12995 | 13811 | 13098 | 39904 | 35 | 33 | 33
D2 rat | 1807 | 5408 | 6595 | 7346 | 19349 | 34 | 28 | 38
Acetylcholine esterase | 383 | 536 | 725 | 1434 | 2695 | 27 | 20 | 53
Monoamine oxidase | 264 | 653 | 1156 | 246 | 2055 | 56 | 32 | 12
min | | | | | | 20 | 20 | 12
max | | | | | | 56 | 33 | 53
[Figure labels: FI, MCSS, common]
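The percentage columns follow directly from the pairing counts: a pairing found only by MCSS is missed when F&I is used alone, and vice versa. A short sketch recomputing those shares for three of the rows (the function and key names are illustrative):

```python
# Pairing counts copied from the table above: (common, FI only, MCSS only).
pairings = {
    "VEGF": (14631, 17172, 14823),
    "Dopamine Transporter": (4480, 8930, 3497),
    "Monoamine oxidase": (653, 1156, 246),
}

def missed_shares(common, fi_only, mcss_only):
    """Percentage of all pairings that each single method would fail to find."""
    total = common + fi_only + mcss_only
    return {
        "missed_by_fi_alone": 100.0 * mcss_only / total,   # only MCSS finds these
        "missed_by_mcss_alone": 100.0 * fi_only / total,   # only F&I finds these
    }

shares = {target: missed_shares(*counts) for target, counts in pairings.items()}
for target, s in shares.items():
    print(f"{target}: F&I alone misses {s['missed_by_fi_alone']:.0f}%, "
          f"MCSS alone misses {s['missed_by_mcss_alone']:.0f}%")
```

For Monoamine oxidase this reproduces both extremes quoted in the slide title: about 12% missed by F&I alone and about 56% missed by MCSS alone.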
Mining transform sets to find potent fragments
Identify the ‘A’ fragments associated with a significant number of potency-decreasing changes, irrespective of what they are replaced with
‘A’ is ‘better than anything you replace it with’
[Figure: Fragment A replaced by Fragment B (B, C, D, E, F) with change in binding measurement: +2.1, +2.2, +1.4, +0.4, +1.8]
Statistics:
• One-tailed binomial test with Holm-Bonferroni correction at 95% confidence identifies potent fragments
• Compare the mean of the compounds that contain the fragment with the mean of the remaining compounds
• Effect size = Cohen’s d
[Figure: pKi/pIC50 distributions; compounds containing potent fragment vs remaining compounds]
Cohen’s d effect sizes:
• Large: >= 0.8
• Medium: 0.5 to 0.8
• Small: 0.2 to 0.5
• Trivial: 0.1 to 0.2
• No effect: < 0.1
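The fragment-selection test named above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' code: the null hypothesis is taken as a 50:50 chance that a replacement lowers potency, the fragment names and counts are invented, and `binomial_tail` and `holm_bonferroni` are hypothetical helper names.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-tailed P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def holm_bonferroni(p_values: dict, alpha: float = 0.05) -> set:
    """Keys rejected by Holm's step-down procedure at family-wise level alpha."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = set()
    for rank, (key, p) in enumerate(ordered):
        # The smallest p-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p > alpha / (m - rank):
            break  # stop at the first failure; all larger p-values are retained
        rejected.add(key)
    return rejected

# Hypothetical counts: fragment -> (pairs where replacing it lowered potency, total pairs).
counts = {"frag_A": (18, 20), "frag_B": (11, 20), "frag_C": (9, 10)}
p_values = {frag: binomial_tail(k, n) for frag, (k, n) in counts.items()}
potent = holm_bonferroni(p_values)
print(sorted(potent))
```

With these invented counts, `frag_A` and `frag_C` survive the correction while `frag_B` (11 of 20) does not. In practice a statistics library routine such as SciPy's `binomtest` would normally replace the hand-written tail sum.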
Mining transform sets to find destructive fragments
Identify the ‘Z’ fragments associated with a significant number of potency-increasing changes, irrespective of what they are replaced with
‘Z’ is ‘worse than anything you replace it with’
[Figure: Fragment Z replaced by other fragments with change in binding measurement: +2.7, +3.2, +0.6, +0.6]
[Figure: pKi/pIC50 distributions; compounds containing destructive fragment vs remaining compounds]
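Comparing the compounds that contain a fragment with the remaining compounds comes down to two numbers: an effect size (Cohen's d) and a significance statistic (Welch's t). A minimal sketch with invented pKi values follows; the pooled-standard-deviation form of Cohen's d is assumed, and the labels use the effect-size bands quoted earlier in the deck.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(group_a) / len(group_a), variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(group_a) - 1) + vb**2 / (len(group_b) - 1))
    return t, df

def effect_label(d):
    """Map |d| onto the effect-size bands from the slide."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    if size >= 0.1:
        return "trivial"
    return "no effect"

# Invented pKi values: compounds containing a destructive fragment vs the rest.
with_fragment = [5.0, 5.5, 6.0, 5.5]
remaining = [6.5, 7.0, 7.5, 7.0, 7.0]

d = cohens_d(with_fragment, remaining)
t, df = welch_t(with_fragment, remaining)
print(f"d = {d:.2f} ({effect_label(d)}), Welch t = {t:.2f}, df = {df:.1f}")
```

A negative d here means the fragment-containing compounds are less potent on average, which is the signature of a destructive fragment. Turning `t` and `df` into a p-value needs a t-distribution CDF, which a statistics library would supply.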
Mining transform sets to find influential fragments
• ‘Z’ fragments are associated with a significant number of potency-increasing changes, irrespective of what they are replaced with: ‘Z’ is ‘worse than anything you replace it with’
• ‘A’ fragments are associated with a significant number of potency-decreasing changes, irrespective of what they are replaced with: ‘A’ is ‘better than anything you replace it with’
[Figure: Fragment A to Fragment B changes in binding measurement for Z (+2.7, +3.2, +0.6, +0.6) and for A (+2.1, +2.2, +1.4, +0.4, +1.8)]
[Figure: pKi/pIC50 distributions; compounds with destructive fragment vs compounds with constructive fragments]
Building pharmacophores from potent fragments
But individual fragments are small and often non-specific, so…
• Permute all the pairs of fragments and find the shortest path between them (pharmacophore dyads) in the training set
• The shortest path between them encodes distance and geometry
• Select pharmacophore dyads with PLS to identify the dyads that explain most of the potency
• Check for significance and effect size with Welch’s t-test and Cohen’s d
• But what about specificity?
[Figure: pharmacophore dyad with labels Path, Fragment 1, Fragment 2; SMARTS shown: [CH2]CN]
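Finding the shortest path between two matched fragments is a breadth-first search over the molecular graph. The sketch below uses a toy adjacency list rather than a real molecule; the atom indices, fragment assignments and function name are hypothetical, and a cheminformatics toolkit would normally supply both the substructure matches and the graph.

```python
from collections import deque

def shortest_path(adjacency, sources, targets):
    """Breadth-first search for the shortest bond path from any source atom to any target atom."""
    targets = set(targets)
    queue = deque((atom, [atom]) for atom in sources)
    seen = set(sources)
    while queue:
        atom, path = queue.popleft()
        if atom in targets:
            return path
        for neighbour in adjacency[atom]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [neighbour]))
    return None  # the fragments are not connected

# Toy molecule as an adjacency list of atom indices (a simple chain 0-1-2-3-4-5-6).
# Assumed matches: atoms 0-1 belong to fragment 1, atoms 5-6 to fragment 2.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}

path = shortest_path(adjacency, sources=[0, 1], targets=[5, 6])
linker = path[1:-1]  # atoms strictly between the two fragments
print(path, linker)
```

The linker atoms between the two attachment points are what a dyad would encode as its path; their count gives the topological distance between the fragments.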
Testing for specificity - pharmacophores
• How selective is the pharmacophore?
• What are the odds of it hitting a molecule in the test set vs CHEMBL?
• Odds of finding in potency set = n(pharmacophore hits in potency set) / n(in potency set) = 27 / 1470
• Odds of finding in CHEMBL = n(pharmacophore hits in CHEMBL not in potency set) / n(in CHEMBL) = 62 / 1351211
• Odds ratio = selectivity = (odds of finding in potency set) / (odds of finding in CHEMBL, not potency set) = (27/1470) / (62/1351211) = 407 (95% confidence limits: 259-642)
Odds of hitting a potent compound are 407 times greater than a random compound in CHEMBL
[Figure: pharmacophore dyad with labels Path, Fragment 1, Fragment 2; SMARTS shown: [CH2]CN]
How specific is a pharmacophore?
What does a bad odds ratio look like?
Early simple hERG model
• Lipophilic base, usually a tertiary amine
• X = 2-5 atom chain, may include rings, heteroatoms or polar groups
[Figure: scaffold with labels X, N, R1, R2; e.g. sertindole: 14 nM vs hERG]
[$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~*~*~c)]
What is the odds ratio?
• Found in CHEMBL: 565658/1352681
• Found in CHEMBL240 (hERG) where pIC50 >= 5: 1985/2451
• OR = (1985/2451) / (565658/1352681) = 0.81 / 0.42 = 1.94 (95% conf. 1.83 to 2.05)
Ar-linker-base has only been found 1.9x more often in hERG inhibitors than at random in ChEMBL
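The selectivity figures on the two preceding slides can be checked from the quoted counts. The sketch below computes both a conventional odds ratio (hits divided by non-hits, with a Wald 95% interval on the log scale) and a simple ratio of hit fractions. Which of the two the authors used is not stated; the observation that 407 matches the first and 1.94 matches the second is an inference from the arithmetic only, and the function names are illustrative.

```python
from math import exp, log, sqrt

def odds_ratio_with_ci(hits_a, total_a, hits_b, total_b, z=1.96):
    """Odds ratio (hits / non-hits in each set) with a Wald confidence interval."""
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    odds_ratio = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def proportion_ratio(hits_a, total_a, hits_b, total_b):
    """Ratio of hit fractions (hits / total in each set)."""
    return (hits_a / total_a) / (hits_b / total_b)

# Dopamine transporter dyad: 27 hits in 1470 potency-set compounds, 62 hits in 1351211 ChEMBL compounds.
dat_or, dat_low, dat_high = odds_ratio_with_ci(27, 1470, 62, 1351211)
dat_ratio = proportion_ratio(27, 1470, 62, 1351211)

# Early hERG model: 1985 hits in 2451 hERG actives, 565658 hits in 1352681 ChEMBL compounds.
herg_ratio = proportion_ratio(1985, 2451, 565658, 1352681)

print(f"DAT odds ratio {dat_or:.1f} (95% CI {dat_low:.0f} to {dat_high:.0f}); fraction ratio {dat_ratio:.0f}")
print(f"hERG fraction ratio {herg_ratio:.2f}")
```

Either way the contrast the slides draw is unchanged: the dopamine transporter dyad is enriched by a factor of hundreds, while the aryl-linker-base motif is barely enriched in hERG actives.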
Domain of Applicability
“Whereof one cannot speak, thereof one must be silent.”1
Claiming to have extracted knowledge or making a prediction when we know we don’t have enough evidence is:
• Delusional
• Dangerous: it would be more productive to act on a different hypothesis or at random
• Degrading: it undermines the use of rational analysis at all
Compound activity prediction should have three classes of output:
• Active
• Inactive
• Out of domain: no prediction possible
Only fragments with a sufficient evidential base are used to form pharmacophore dyads
In turn, only pharmacophore dyads that have enough support are used in the model
1. Wittgenstein, Tractatus Logico-Philosophicus, 1922
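The three output classes above can be made explicit in code so that "no prediction" is a first-class result rather than a silent default. This is a sketch under assumptions: the additive scoring, the coefficient values, the intercept and the activity threshold are all invented, and a compound is treated as out of domain when it matches no supported dyad.

```python
from enum import Enum

class Prediction(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    OUT_OF_DOMAIN = "out of domain"

def classify(matched_dyads, dyad_coefficients, intercept, active_threshold):
    """Return (class, predicted value); the value is None when no prediction is possible."""
    supported = [dyad for dyad in matched_dyads if dyad in dyad_coefficients]
    if not supported:
        # No evidence-backed pharmacophore matches: refuse to predict.
        return Prediction.OUT_OF_DOMAIN, None
    predicted = intercept + sum(dyad_coefficients[dyad] for dyad in supported)
    label = Prediction.ACTIVE if predicted >= active_threshold else Prediction.INACTIVE
    return label, predicted

# Hypothetical model: two supported dyads with invented coefficients.
coefficients = {"dyad_1": 1.5, "dyad_2": -1.0}

print(classify({"dyad_1"}, coefficients, intercept=6.5, active_threshold=7.0))
print(classify({"dyad_2"}, coefficients, intercept=6.5, active_threshold=7.0))
print(classify({"dyad_9"}, coefficients, intercept=6.5, active_threshold=7.0))
```

Returning `None` for the value in the out-of-domain case forces downstream code to handle the missing prediction explicitly.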
Model activity from presence of pharmacophores
• Identify and group fragment SMARTS from MMPA
• If n ≥ 8, perform a one-tailed binomial test with Holm-Bonferroni adjustment
• Remove non-significant ‘Biophores’
• Compare the mean of the compounds containing the biophore with the mean of the remaining compounds for significance (Welch’s t-test) and effect size (Cohen’s d)
• Permute all the significant biophores and determine the shortest paths between them in the training set = pharmacophore dyads
• Select pharmacophore dyads with n >= 6 examples
• Use presence/absence of each pharmacophore dyad as an indicator variable in PLS modelling
[Figure: Dopamine Transporter, with and without pharmacophores]
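The last two steps of this workflow (keep dyads with at least 6 examples, then encode presence or absence as indicator variables) can be sketched as follows. The dyad and compound identifiers are invented, and the function name is hypothetical; the resulting 0/1 matrix is the kind of input a PLS routine would take, with measured potency as the response.

```python
def build_indicator_matrix(dyad_matches, min_examples=6):
    """Build a compounds x dyads 0/1 matrix from {dyad id: set of matching compound ids}.

    Dyads matched by fewer than min_examples compounds are dropped.
    """
    kept = sorted(d for d, compounds in dyad_matches.items() if len(compounds) >= min_examples)
    compounds = sorted(set().union(*dyad_matches.values()))
    matrix = [[1 if c in dyad_matches[d] else 0 for d in kept] for c in compounds]
    return compounds, kept, matrix

# Hypothetical substructure-match results.
dyad_matches = {
    "dyad_1": {"c1", "c2", "c3", "c4", "c5", "c6"},        # 6 examples: kept
    "dyad_2": {"c1", "c2"},                                # 2 examples: dropped
    "dyad_3": {"c2", "c3", "c4", "c5", "c6", "c7", "c8"},  # 7 examples: kept
}

compounds, kept_dyads, X = build_indicator_matrix(dyad_matches)
for compound, row in zip(compounds, X):
    print(compound, row)
```

A compound whose row is all zeros matches no supported dyad, which is one natural place to apply the out-of-domain rule from the previous slide.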
Modelling critical safety targets
Target | Class | Effect | Number of compounds
Acetylcholine esterase - human | enzyme | CV: drop in BP, drop in HR, bronchoconstriction | 383
β1 adrenergic receptor | GPCR | CV: change in HR, BP, bronchodilation, vasodilation, tremor | 505
Androgen receptor | NHR | Endocrine: agonism: androgenicity / gynecomastia, prostate / breast carcinoma | 1064
CB1 cannabinoid receptor | GPCR | CNS: euphoria, dysphoria, anxiety, memory impairment, analgesia, hypothermia, weight loss, emesis, depression | 1104
CB2 cannabinoid receptor | GPCR | increased inflammation | 1112
Dopamine D2 receptor - human | GPCR | CNS: hallucinations, drowsiness, confusion, emesis; CV: drop in heart rate | 3873
Dopamine D2 receptor - rat | GPCR | As human | 1807
Dopamine Transporter | Transporter | CNS: addictive psychostimulation, depression, parkinsonism, seizures | 1470
GABA A receptor | Ion channel | CNS: anxiolysis, ataxia, sedation, depression, amnesia | 848
hERG ion channel | Ion channel | CV: QT prolongation | 4189
5HT2a receptor | GPCR | CNS: drop in body temp, anxiogenic | 642
Monoamine oxidase | enzyme | CV: increase in BP, DDI potential; CNS: dizziness, nausea | 264
Muscarinic acetylcholine receptor M1 | GPCR | CNS: proconvulsant, drop in cognitive function, vision impairment | 628
μ opioid receptor | GPCR | CNS: sedation, abuse liability, respiratory depression, hypothermia | 1128
1. J. Bowes, A. J. Brown, J. Hamon, W. Jarolimek, A. Sridhar, G. Waldron, and S. Whitebread, “Reducing safety-related drug attrition: the use of in vitro pharmacological profiling,” Nat. Rev. Drug Discov., vol. 11, no. 12, pp. 909-922, Nov. 2012
Modelling critical safety targets
• Build models using 10-fold cross-validated PLS
• Assess using ROC / BEDROC, R2 vs 100-fold y-scrambled R2, and geomean odds ratio
Target | Number of compounds | Number of compound pairs | Number of fragments | Number of pharmacophore dyads after filtering | R2 | RMSEP | ROC | Odds ratio (geomean)
Acetylcholine esterase - human | 383 | 27755 | 44 | 10 | 0.43 | 1.57 | 0.80 | 4
β1 adrenergic receptor | 505 | 145447 | 276 | 313 | 0.64 | 0.70 | 0.96 | 833
Androgen receptor | 1064 | 113163 | 186 | 46 | 0.47 | 0.77 | 0.86 | 140
CB1 cannabinoid receptor | 1104 | 88091 | 165 | 90 | 0.61 | 1.02 | 0.87 | 96
CB2 cannabinoid receptor | 1112 | 82130 | 194 | 158 | 0.19 | 0.85 | 0.64 | 5.7
Dopamine D2 receptor - human | 3873 | 230962 | 483 | 602 | 0.42 | 0.88 | 0.69 | 110
Dopamine D2 receptor - rat | 1807 | 118736 | 267 | 377 | 0.29 | 0.85 | 0.78 | 125
Dopamine Transporter | 1470 | 106969 | 282 | 336 | 0.58 | 0.73 | 0.88 | 141
GABA A receptor | 848 | 39494 | 106 | 167 | 0.70 | 0.76 | 0.97 | 560
hERG ion channel | 4189 | 242261 | 392 | 76 | 0.61 | 0.96 | 0.92 | 55
5HT2a receptor | 642 | 50870 | 197 | 267 | 0.61 | 0.59 | 0.83 | 600
Monoamine oxidase | 264 | 15439 | 44 | 11 | 0.12 | 1.25 | 0.48 | 181
Muscarinic acetylcholine receptor M1 | 628 | 48200 | 97 | 510 | 0.62 | 0.94 | 0.89 | 48
μ opioid receptor | 1128 | 37184 | 33 | 11 | 0.69 | 1.30 | 0.87 | 81
Toxophore examples
Detailed, specific & transparent
Target | Actual | Predicted | Mean with | Mean without | Odds ratio
Dopamine D2 receptor human | 9.5 | 9.1 | 8.0 | 6.6 | 340
Dopamine Transporter | 9.1 | 8.6 | 8.3 | 6.7 | 407
GABA-A | 9.0 | 8.7 | 8.0 | 6.8 | 1506
β1 adrenergic receptor | 7.8 | 7.7 | 6.5 | 5.7 | 1501
Safety Target Conclusions
• We can model safety-critical targets and extract both predictive models and useful ligand structural information
• Clear areas to action
• Clearly defined domain of applicability
• No prediction where there is insufficient evidence (conservative method)
• The method relies on having large data sets (>= 500 data points)
• MMPA is a computationally intense phase
• But of course molecules only need pairing once…
Prediction of unseen new molecules
The acid test…
• Vascular endothelial growth factor receptor 2 tyrosine kinase (KDR)
• Inhibitors have oncology and ophthalmic indications
• Large dataset in CHEMBL
• 10-fold cross-validated PLS model
• Selected model by minimised RMSEP
Model summary:
• Compounds: 4466
• Matched pairs: 288100
• Fragments: 678
• Pharmacophore dyads: 787
• RMSEP: 0.8
• R2: 0.64
• Y-scrambled R2: 0.0
• ROC: 0.95
• Geomean odds ratio: 80
Novartis Predictions From Our Model
Domain of Applicability…
[Figure: four compound structures with actual and predicted values]
• Actual: 8.4 [1]; Predicted: 7.5
• Actual: 7.6 [1]; Predicted: 7.5
• Actual: 7.7 [2]; Predicted: 7.1
• Actual: 9.0 [2]; Predicted: Out of Domain
1. Bold et al., J MedChem (2016)
2. Mainolfi et al., MedChem Lett (2016)
Value of potency prediction from MMPA:
Clear substructures enable rapid actions
[Diagram: compounds + data feeding a set of design actions]
Data: safety data; potency data; HTS data
Actions: toxicity alerts; virtual library prioritisation; virtual library design; fragment set design; retest prioritisation; hit re-mining / analogue hunting; substructure modification; lead design; fast follower design
[Figure: mean with pharmacophore vs mean without pharmacophore; 26 examples in training set]
The MedChemica team
Andrew G Leach, Al Dossetter, Shane Montague, Lauren Reid*, Jess Stacey*
*Royal Society of Chemistry Industrial Placements Grant Scheme
A Collaboration of the willing
Craig Bruce (OE)
David Cosgrove (GalCoz)
Andy Grant ★
Martin Harrison (Elixir)
Paul Faulder (Elixir)
Andrew Griffin (Elixir)
Huw Jones (Base360)
Al Rabow
David Riley (AZ)
Graeme Robb (AZ)
Attilla Ting (AZ)
Howard Tucker (retired)
Dan Warner (Myjar)
Steve St-Galley (Sygnature)
David Wood (JBA Risk Management)
Phil Jewsbury (AZ)
Mike Snowden (AZ)
Peter Sjo (AZ)
Martin Packer (AZ)
Manos Perros (AZ)
Nick Tomkinson (AZ)
Martin Stahl (Roche)
Jerome Hert (Roche)
Martin Blapp (Roche)
Torsten Schindler (Roche)
Paula Petrone (Roche)
John Cumming (Roche)
Jeff Blaney (Genentech)
Hao Zheng (Genentech)
Slaton Lipscomb (Genentech)
James Crawford (Genentech)