Building innovative
drug discovery alliances
Algorithms, Evolution and Network-Based Approaches in Molecular Discovery
ECML PKDD, Nancy, September 2014
PAGE
Drug Discovery
1
The State of the Art
A man doesn’t know what happiness is until he’s married, by then it’s too late! (Frank Skinner)
PAGE
Antipsychotic drugs and their Poly-Pharmacology
2
The chart shows the known affinity (Ki) values of antipsychotic drugs for a panel of receptors.
Haloperidol
Loxapine Clozapine Olanzapine Ziprasidone
Thioridazine Thiothixene Zotepine
How can we go about discovering a novel antipsychotic?.
PAGE
What do I make next?
3
Medicinal Chemistry Optimisation
PAGE
Mapping Computational Drug Discovery
The toolbox
PAGE
ChEMBL
5
ChEMBLSpace
. https://www.ebi.ac.uk/chembldb/
ChEMBLSpace – a graphical explorer of the
chemogenomic space covered by ChEMBL
Bioinformatics (2013) 29 (4): 523-524
10.6k Targets
1.4m Compounds
12.8m Activities
PAGE
ChEMBLSpace search: D2 & a1BA
6 Available for download search: [email protected]
Dopamine D2
a1B-Adrenergic
PAGE
ChEMBLSpace: D2, a1BA, H1
7 Available for download search: [email protected]
Histamine H1
Dopamine D2
a1B-Adrenergic
PAGE
Similarity Ensemble Approach (SEA)
8
Keiser et al.
Nature 462, 175-181(12 November 2009)
PAGE
Phenotypic Deconvolution
9 Drakakis et al., Med Chem Commun 5, 386-396 (2014)
• Combination of: • Target Prediction • BioSAR - Laplacian-modified Naïve Bayes algorithm • Information gain prefiltering • Decision Trees (C4.5) for Classification
• Yields 70% accurate,
• Interpretable model for sleep outcome.
BIOPRINT
IF
ACTIVITY_ON D(2) Dopamine receptor
AND
ACTIVITY_ON Histamine H1 receptor
AND
ACTIVITY_ON 5-hydroxytryptamine receptor 2A
THEN
“Good Sleep”
PAGE
Accessing Chemical Space
10 J. Chem. Inf. Model., 2012, 52, 2864-2875
Heavy Atom Count (hac)
166 Billion
2.5 Million
89 Thousand
367 Compounds
A virtual enumeration of chemical space up to 17 heavy atoms generated 166,443,860,262 molecules *
Plo
tte
d o
n a
lo
g s
ca
le
A Pharma screening collection up to 17 heavy atoms is typically 100–500K molecules
== 0.000003% of accessible space
PAGE
Strategies in Automated Molecule Design
11 De Novo Design 2013 | ISBN 978-3-527-67700-9
Core X B A
C
Core X B A
C Grow
Replace Scaffold
Constraints
2D or 3D
Generation
Protein Cavity
Volume Based
Feature Based
Synthetic Feasibility
Reaction Based
Reagent Enumeration
Transformation Based
Complexity Score
User Assessment
Core X B
C
Core Y
Merge Molecules
A
PAGE
Motivation
12
Using reaction databases
public
reaction
databases
reactions
covering general
organic chemistry
reaction
database
commercial
reaction
databases U
Adding say ~250,000
reactions per year, strong
medicinal chemistry bias
wealth of reaction data
– extract the knowledge hidden in these data
– use this knowledge to assist the medicinal chemist
– suggest new, synthetically feasible molecules with desired bio profile
500 chemists
~10 reactions per week
lab notebooks (eLN)
proprietary
reaction
database
PAGE
Reaction Vectors
13 Broughton, H. B., Hunt, P. & Mackey, M. (2003) Methods for Classifying and Searching Chemical Reactions.
United States Patent Application 367550.
I 1 2 3 4
Bond C-C C=O C-OH C-OR
# 0 0 -2 2
reactant vector, R = (R1 + R2) product vector, P
reaction vector, D = P - R
OH
O
OHO
O
+
I 1 2 3 4
Bond C-C C=O C-OH C-OR
# 4 1 2 0
I 1 2 3 4
Bond C-C C=O C-OH C-OR
# 4 1 0 2
R1 R2 P
PAGE
Reaction Vectors in Structure Generation
14 J. Chem. Inf. Model., 2009, 49 (5), pp 1163–1184. Knowledge-Based Approach to de Novo Design Using Reaction Vectors
• The reaction vector, D, equals the difference between the product
vector, P, and the reactant vector, R
D = P – R
I 1 2 3 4
Bond C-C C=O C-OH C-OR
# 4 1 0 2
O O
O
O
O
O
better descriptor
is required
Given a reaction vector, D, and a reactant vector, R, the product vector, P,
can be obtained
P = D + R
Given a product vector, P, can we reconstruct the product molecule(s)?
PAGE
Modified Atom Pairs
15
Atom Pairs 2 (AP2) : X1(n, p, r)-2(BO)-X2(n, p, r)
X: element type
n: number of bonds to heavy atoms
p: number of π bonds
r: number of ring memberships
BO: bond order
Atom Pairs 3 (AP3): X1(n, p, r)-3-X2(n, p, r)
• Extending the bond distance in atom pairs encodes more of
the environment of the reaction centre
PAGE
Beckmann Rearrangement
16
Cl
N
OH Cl
NH
O
Reaction vector
O(1,1,0)-3-C(1,0,0)f+N(2,1,0)-3-C(1,0,0)f-
N(2,0,0)-3-O(1,1,0)e+C(3,1,0)-3-O(1,0,0)e-
N(2,0,0)-3-C(1,0,0)d+C(3,1,0)-3-C(2,2,1)d-
C(2,2,1)-3-N(2,0,0)c+C(3,1,0)-3-C(2,2,1)c-
C(2,2,1)-3-N(2,0,0)b+C(3,2,1)-3-C(1,0,0)b-
C(2,2,1)-3-N(2,0,0)a+C(3,2,1)-3-N(2,1,0)a-
C(3,1,0)-2(2)-O(1,1,0)3+N(2,1,0)-2(1)-O(1,0,0)3-
C(3,1,0)-2(1)-N(2,0,0)2+C(3,1,0)-2(2)-N(2,1,0)2-
C(3,2,1)-2(1)-N(2,0,0)1+C(3,2,1)-2(1)-C(3,1,0)1-
Positive APsNegative APs
O(1,1,0)-3-C(1,0,0)f+N(2,1,0)-3-C(1,0,0)f-
N(2,0,0)-3-O(1,1,0)e+C(3,1,0)-3-O(1,0,0)e-
N(2,0,0)-3-C(1,0,0)d+C(3,1,0)-3-C(2,2,1)d-
C(2,2,1)-3-N(2,0,0)c+C(3,1,0)-3-C(2,2,1)c-
C(2,2,1)-3-N(2,0,0)b+C(3,2,1)-3-C(1,0,0)b-
C(2,2,1)-3-N(2,0,0)a+C(3,2,1)-3-N(2,1,0)a-
C(3,1,0)-2(2)-O(1,1,0)3+N(2,1,0)-2(1)-O(1,0,0)3-
C(3,1,0)-2(1)-N(2,0,0)2+C(3,1,0)-2(2)-N(2,1,0)2-
C(3,2,1)-2(1)-N(2,0,0)1+C(3,2,1)-2(1)-C(3,1,0)1-
Positive APsNegative APs
Atom Pairs 2 (AP2) : X1(n, p, r)-2(BO)-X2(n, p, r)
Atom Pairs 3 (AP3): X1(n, p, r)-3-X2(n, p, r)
X element type
n number of bonds to heavy atoms
p number of π bonds
r number of ring memberships
BO bond order
PAGE
Applying a RV to a reactant to generate a Product
17
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O(1,1,0)-3-C(1,0,0)f+N(2,1,0)-3-C(1,0,0)f-
N(2,0,0)-3-O(1,1,0)e+C(3,1,0)-3-O(1,0,0)e-
N(2,0,0)-3-C(1,0,0)d+C(3,1,0)-3-C(2,2,1)d-
C(2,2,1)-3-N(2,0,0)c+C(3,1,0)-3-C(2,2,1)c-
C(2,2,1)-3-N(2,0,0)b+C(3,2,1)-3-C(1,0,0)b-
C(2,2,1)-3-N(2,0,0)a+C(3,2,1)-3-N(2,1,0)a-
C(3,1,0)-2(2)-O(1,1,0)3+N(2,1,0)-2(1)-O(1,0,0)3-
C(3,1,0)-2(1)-N(2,0,0)2+C(3,1,0)-2(2)-N(2,1,0)2-
C(3,2,1)-2(1)-N(2,0,0)1+C(3,2,1)-2(1)-C(3,1,0)1-
Positive APsNegative APs
O(1,1,0)-3-C(1,0,0)f+N(2,1,0)-3-C(1,0,0)f-
N(2,0,0)-3-O(1,1,0)e+C(3,1,0)-3-O(1,0,0)e-
N(2,0,0)-3-C(1,0,0)d+C(3,1,0)-3-C(2,2,1)d-
C(2,2,1)-3-N(2,0,0)c+C(3,1,0)-3-C(2,2,1)c-
C(2,2,1)-3-N(2,0,0)b+C(3,2,1)-3-C(1,0,0)b-
C(2,2,1)-3-N(2,0,0)a+C(3,2,1)-3-N(2,1,0)a-
C(3,1,0)-2(2)-O(1,1,0)3+N(2,1,0)-2(1)-O(1,0,0)3-
C(3,1,0)-2(1)-N(2,0,0)2+C(3,1,0)-2(2)-N(2,1,0)2-
C(3,2,1)-2(1)-N(2,0,0)1+C(3,2,1)-2(1)-C(3,1,0)1-
Positive APsNegative APs
Cl
N
OH Cl
NH
O
1. Removing the negative atom pairs from the reactant
1- 2-
3-
PAGE
Applying a RV to a reactant to generate a Product
18
No AP2s left in the reaction
vector that match atom 11
CH3
5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
CH3
5
CH
1
CH2
6CH
4
Cl7
C8
CH3
9
NH10
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O10
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
NH
10
C11
CH3
5
CH
1
CH2
6
CH
4
Cl7
NH
10 C8
CH3
9
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O10
NH11
CH3
5
CH
1
CH2
6
CH
4
Cl7
NH
10C 8
CH3
9
O11
CH3
C5
CH
1
CH2
C 6
CH
4
Cl7
C8
CH3
9
O10
NH
11
Duplicate solutionFinal Solution
2+ (d+) 3+ (f+)
1+ (a+)1+ (a+,b+,c+) 2+ (d+,e+)
3+ (e+,f+) 1+ (a+,b+,c+)
A
B C
D E F
G H
No AP2s left in the reaction
vector that match atom 11
CH3
5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
CH3
5
CH
1
CH2
6CH
4
Cl7
C8
CH3
9
NH10
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O10
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
NH
10
C11
CH3
5
CH
1
CH2
6
CH
4
Cl7
NH
10 C8
CH3
9
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O10
NH11
CH3
5
CH
1
CH2
6
CH
4
Cl7
NH
10C 8
CH3
9
O11
CH3
C5
CH
1
CH2
C 6
CH
4
Cl7
C8
CH3
9
O10
NH
11
Duplicate solutionFinal Solution
2+ (d+) 3+ (f+)
1+ (a+)1+ (a+,b+,c+) 2+ (d+,e+)
3+ (e+,f+) 1+ (a+,b+,c+)
A
B C
D E F
G H
CH3
5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
CH3
5
CH
1
CH2
6CH
4
Cl7
C8
CH3
9
NH10
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O10
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
NH
10
C11
CH3
5
CH
1
CH2
6
CH
4
Cl7
NH
10 C8
CH3
9
CH3
C5
CH
1
CH2
C6
CH
4
Cl7
C8
CH3
9
O10
NH11
CH3
5
CH
1
CH2
6
CH
4
Cl7
NH
10C 8
CH3
9
O11
CH3
C5
CH
1
CH2
C 6
CH
4
Cl7
C8
CH3
9
O10
NH
11
Duplicate solutionFinal Solution
2+ (d+) 3+ (f+)
1+ (a+)1+ (a+,b+,c+) 2+ (d+,e+)
3+ (e+,f+) 1+ (a+,b+,c+)
A
B C
D E F
G H
O(1,1,0)-3-C(1,0,0)f+N(2,1,0)-3-C(1,0,0)f-
N(2,0,0)-3-O(1,1,0)e+C(3,1,0)-3-O(1,0,0)e-
N(2,0,0)-3-C(1,0,0)d+C(3,1,0)-3-C(2,2,1)d-
C(2,2,1)-3-N(2,0,0)c+C(3,1,0)-3-C(2,2,1)c-
C(2,2,1)-3-N(2,0,0)b+C(3,2,1)-3-C(1,0,0)b-
C(2,2,1)-3-N(2,0,0)a+C(3,2,1)-3-N(2,1,0)a-
C(3,1,0)-2(2)-O(1,1,0)3+N(2,1,0)-2(1)-O(1,0,0)3-
C(3,1,0)-2(1)-N(2,0,0)2+C(3,1,0)-2(2)-N(2,1,0)2-
C(3,2,1)-2(1)-N(2,0,0)1+C(3,2,1)-2(1)-C(3,1,0)1-
Positive APsNegative APs
O(1,1,0)-3-C(1,0,0)f+N(2,1,0)-3-C(1,0,0)f-
N(2,0,0)-3-O(1,1,0)e+C(3,1,0)-3-O(1,0,0)e-
N(2,0,0)-3-C(1,0,0)d+C(3,1,0)-3-C(2,2,1)d-
C(2,2,1)-3-N(2,0,0)c+C(3,1,0)-3-C(2,2,1)c-
C(2,2,1)-3-N(2,0,0)b+C(3,2,1)-3-C(1,0,0)b-
C(2,2,1)-3-N(2,0,0)a+C(3,2,1)-3-N(2,1,0)a-
C(3,1,0)-2(2)-O(1,1,0)3+N(2,1,0)-2(1)-O(1,0,0)3-
C(3,1,0)-2(1)-N(2,0,0)2+C(3,1,0)-2(2)-N(2,1,0)2-
C(3,2,1)-2(1)-N(2,0,0)1+C(3,2,1)-2(1)-C(3,1,0)1-
Positive APsNegative APs
2. Adding positive atom pairs to the fragment
X element type
n number of bonds to heavy atoms
p number of π bonds
r number of ring memberships
BO bond order
Atom Pairs 2 (AP2) : X1(n, p, r)-2(BO)-X2(n, p, r)
PAGE
How well does it work?
19
Organic Chemistry Database
Hristozov, D., Bodkin, M., Chen, B., Patel, H. & Gillet, V. J. 2011. Validation of Reaction Vectors for De Novo Design. Library Design, Search Methods, and Applications
of Fragment-Based Drug Design. American Chemical Society.
Reaction Type Number of
Reactions
Correctly Reproduced
Number Per cent
Epoxide reduction 450 449 99.8
Epoxide formation 450 444 98.7
Ester to amide 172 172 100.0
Alcohol dehydration 171 169 98.8
Claisen rearrangement 61 54 88.5
Beckmann rearrangement 123 123 100.0
Friedyl Crafts acylation 113 113 100.0
Olefin metathesis 9 7 77.8
Dieckmann condensation 98 91 92.9
Nitro reduction 231 230 99.6
Alkene oxidation 272 272 100.0
Cope rearrangement 453 306 67.5
Aldol condensation 134 134 100.0
Alcohol amination 97 97 100.0
Amide reduction 51 51 100.0
Diels-Alder hetero 441 320 72.6
Ether halogenation 58 58 100.0
Ozonolysis 132 125 94.7
Claisen condensation 98 77 78.6
Carboxylic acids to aldehydes 194 194 100.0
Nitrile reduction 102 102 100.0
Diels-Alder cycloaddition 106 65 61.3
Fischer indole 230 94 40.9
Alkene halogenation 310 281 90.6
Nitrile hyrdrolysis 460 460 100.0
Olefination 455 427 93.8
Wittig-Horner 211 190 90.0
Robinson annulation 13 10 76.9
Total 5,695 5,115 89.8
• ~3 seconds per reaction average,
0.015 seconds median run time
• Products generated for 5,115
reactions (~90% of the 5,695)
PAGE
Evolutionary Design
20
Input Mutate Score Rank
Load Molecule
Output
Cream off top ranking population
Multi-objective Generate new molecules Any scoring tool
Loop back round
Best new molecules
PAGE
Open Source KNIME Contributions
21
http://tech.knime.org/community
Future Medicinal Chemistry 2012, 4, 1889-1891
J. Chem. Inf. Model.
2008, 48(3), 646–658.
J. Chem. Inf. Model.,
2009, 49(5), 1163-84.
J. Chem. Inf. Model
2010, 50 (10), 1872-1886.
2D/3D viewer
PAGE
Multi-Objective Optimisation
22
Haloperidol
Ziprasidone
“Score” = Similarity + D2predAct + a1BApredAct + H1predAct
Q2 = 0.76
Pre
dic
ted
Actual Actual P
redic
ted
Q2 = 0.68
Actual
Q2 = 0.40
Pre
dic
ted
Union of Descriptor
+
PAGE
Results: Piperidine
23
26K Reactions, 93K Reagents
Starting Material
Fluphenazine
Chlorpromazine
PAGE
It looks great but ...
24
1. The algorithm only knows about transformation types that are in the Db!
2. The AP2/3’s cover 1 and 2 bonds. Remote functionality isn’t considered.
3. A reaction path is not a drug “optimisation”!
a b c d e
i ii iii iv
But how do we get from
here to here?
PAGE
Reaction Sequence Vectors
Tools for molecular design
x b
a b c d e
i ii iii iv
e a
a b
c a
d a
Sequence Vectors
Molecules Nodes // RVs Edges
PAGE
Validating Sequence vectors
26
J. MedChem 2009-2011: 26K reactions
James Wallace ukQSAR Spring 2014
26K => 266K Sequences
Reproducing a => x
PAGE
SAR Exploration
27
Succinyl Hydroxamates
Bailey, S., et al., 2008. Bioorganic & Medicinal Chemistry Letters, 18, 6562-6567
PAGE
Novel SAR
28
Succinyl Hydroxamates
James Wallace ukQSAR 2014
PAGE
Principle Components Analysis of Property Space
29
Succinyl Hydroxamates
James Wallace ukQSAR 2014.
PAGE
Mapping Discovery Space
30
Sequence vector network
1M+ reactions from US Patent Database: Nextmove
PAGE
The PGC GWAS
31
Genome Wide Association Studies
PAGE
GPCRs associated with Schizophrenia GWAS genes
32
2 step shortest paths network
Suzanne Brewerton – UKQSAR Lilly 2014
Adenosine receptor network
Histamine receptor network
Adrenergic receptor network
Dopamine receptor network
Opioid receptor network Metabotropic glutamate receptor network
5HT receptor network
Schizophrenia GWAS genes
GPCR receptor subtypes
Eg, From: Dopamine receptor subtypes To: Proteins defined by PGC2 schizophrenia GWAS genes
PAGE
Hubs in the GPCR schizophrenia network
33 Suzanne Brewerton – UKQSAR Lilly 2014
GPCRs and GPCR signalling proteins Proteins identified by schizophrenia GWAS Hubs involved in GPCR
signalling in schizophrenia
Remove proteins degree <10
PAGE
Alzheimer’s Disease Neuropathology
34 http://www.tuepedia.de/index.php/Mensa_Prinz_Karl
b-Amyloid plaques • Ab peptide • Extracellular deposits
Neurofibrillary tangles • Tau protein • Intraneuronal filamentous inclusions
Brain Atrophy
PAGE
Alzheimer’s Disease Network
35
Canonical network
PAGE
Network Based Design
36
2. Identify cmpds with
known/predicted selectivity
3. Screen reference set &
generate gene signatures.
1. Disease Hypothesis
4. Relate chemistry to gene Bioprint.
6. Off-Target
Hypothesis 5. Network
Enrichment.
Novel target
identification
PAGE
In-Silico Network Based Design
37
Protein Target
Deconvolution
Connecting Gene Expression Data from Connectivity Map and In Silico Target Predictions For Small Molecule Mechanism-of-Action Analysis
Bender et al., Molecular BioSystems 2014 – accepted
PAGE
The Drug Discovery Challenge
38
Summary
Given a disease signature how do we best sample appropriate chemistry space?
PAGE
Acknowledgments
39
Dimitar Hristozov
David Thorner
David Evans
Nik Fechner
George Papadatos
Suzanne Brewerton
Hina Patel
Ben Allen
James Wallace
John Liebeschuetz
Jason Cole
Val Gillet
Beining Chen
Andreas Bender
Georgios Drakakis
Your contact:
Building innovative
drug discovery alliances
Mike Bodkin VP Computational Chemistry & Cheminformatics 114 Innovation Drive, Milton Park, Abingdon Oxfordshire OX14 4RZ, UK T: +44 (0)1235 44 1207