Reaction Class Recommendation Models in de novo Drug Design
PhD Candidate: Gian Marco Ghiandoni
Outline
1. Reaction vector-based de novo design
2. Issues related to reaction vector-based de novo design
3. Recommender systems
4. Recommender implementation, screening, and validation
Molecular De Novo Drug Design
Hartenfeller, M. & Schneider, G., 2011. Enabling future drug discovery by de novo design. Wiley Interdisciplinary Reviews: Computational Molecular Science, 1(5), pp. 742-759
1) Scoring Functions
2) Construction Methods
3) Search Strategies
[Figure: de novo design, linking chemical structure and biological activity]
2) Construction Methods
Generative Models (AI): use joint distributions p(x, y) to generate new data with characteristics similar to the training data
Reaction-driven Algorithms: use examples of chemical reactions as references to rationally combine fragments with each other
Database Construction Process
USPD Grants 1976-2016
1.8 million reactions of pharmaceutical interest
Example: Patent ID US03931156-1976
Atom-pair description of the reaction components
Computation of the difference vector and reaction classification:
Cl(1,0,0)-2(1)-C(3,2,1) = +1
N(1,0,0)-2(1)-C(3,2,1) = -1
“Functional Conversion (Amino to Chloro)”
Database encoding of vectors, synthetic references (IDs), and reaction classes
USPD Reaction Vector Database
93,000 unique reaction vectors organised in 336 classes
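The difference-vector step can be sketched as a subtraction of atom-pair counts between products and reactants. This is a minimal illustration under stated assumptions, not the implementation behind the database: the atom-pair strings are treated as opaque keys, the function name `reaction_vector` is hypothetical, and `UNCHANGED_PAIR` is a placeholder standing in for the part of the molecule that the reaction leaves untouched.

```python
from collections import Counter


def reaction_vector(reactant_pairs, product_pairs):
    """Return product counts minus reactant counts, dropping zero entries."""
    diff = Counter(product_pairs)
    diff.subtract(Counter(reactant_pairs))
    return {pair: count for pair, count in diff.items() if count != 0}


# Atom pairs of the amino-to-chloro example from the slide.
# "UNCHANGED_PAIR" is a hypothetical placeholder for atom pairs that
# occur on both sides of the reaction and therefore cancel out.
reactant = ["N(1,0,0)-2(1)-C(3,2,1)", "UNCHANGED_PAIR"]
product = ["Cl(1,0,0)-2(1)-C(3,2,1)", "UNCHANGED_PAIR"]

vector = reaction_vector(reactant, product)
print(vector)
```

Only the atom pairs that are gained (+1) or lost (-1) survive the subtraction, which is what makes the vector a description of the transformation rather than of the molecules.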
Reaction Vector-based De Novo Design
Patel, H. et al., 2009. Knowledge-based approach to de Novo design using reaction vectors. Journal of Chemical Information and Modeling, 49(5), pp. 1163-1184
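[Figure: design workflow from starting material (SM) and reagents (R) through reaction vectors (RV) to products (P)]
40,000 reagents
93,000 classified reactions
1,500 potentially accessible products
Synthetic references are also provided to speed up the preparation of the candidates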
Reaction Vector-based De Novo Design
Ghiandoni, G. et al., 2018. Fingerprint-based Recommendation Models in Reaction-driven Drug Design [Poster], UK QSAR Autumn Fall conference, Oxford (UK), 26th September 2018.
[Figure: PC analysis of the products, from more accessible to less accessible]
Recommender
40,000 reagents
93,000 classified reactions
3,500 recommended reactions
700 recommended products
[Figure: PC analysis of the recommended products, from more accessible to less accessible]
Recommender Systems
Ricci, F. et al., 2011. Recommender Systems Handbook. Springer US, Springer-Science Business Media, LLC, pp. XXX-842
Input Vector: structured data format describing a given entry
Recommendation System: trained on past decisions to provide future suggestions
List of Suggestions: can be used to prioritise certain decisions
Recommender for reaction classes:
(1) Molecule
(2) Fingerprint
(3) Multi-label Algorithm (MODEL)
(4) Recommendations, e.g. [C-N Bond Formation (Amination), C-N Bond Formation (N-arylation), Functional Elimination (Defluorination)]
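The molecule, fingerprint, model, recommendations flow can be illustrated with a toy sketch. Note the assumptions: the slides use trained multi-label classifiers, whereas this example swaps in a simple nearest-neighbour vote over Tanimoto similarity so that it runs with the standard library alone. The fingerprints are invented sets of "on" bits, and the names `tanimoto` and `recommend` are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(query_bits, training, k=2, top_n=3):
    """Rank reaction classes by votes from the k most similar training entries.

    `training` is a list of (fingerprint_bits, set_of_classes) pairs.
    """
    neighbours = sorted(
        training,
        key=lambda row: tanimoto(query_bits, row[0]),
        reverse=True,
    )[:k]
    votes = Counter()
    for _bits, classes in neighbours:
        votes.update(classes)
    return [label for label, _count in votes.most_common(top_n)]


# Toy data: bit positions and class assignments are invented for illustration.
training = [
    ({1, 4, 7}, {"C-N Bond Formation (Amination)",
                 "C-N Bond Formation (N-arylation)"}),
    ({1, 4, 9}, {"C-N Bond Formation (Amination)"}),
    ({2, 3, 8}, {"Functional Elimination (Defluorination)"}),
]

suggestions = recommend({1, 4, 7, 9}, training)
print(suggestions)
```

The output is a ranked list of class labels rather than a single prediction, which matches the multi-label framing: one starting material can legitimately undergo several classes of reaction.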
Why reaction classes rather than reactions?
Structural Novelty: the application of groups of reactions rather than specific examples preserves the chances of finding novel molecules
Database Interchangeability: class recommendations can be applied to any compatible database, thus extending the versatility of the recommender
Data Requirements and Model Size: the use of classes enhances model generalisation, thus reducing the amount of data required to train an effective model
Label Discrimination Types
Ghiandoni, G. et al., 2018. Augmenting De novo Drug Design using Reaction Classification [Poster], UK QSAR and MGMS Structure-Activity Relationships conference, Cardiff (UK), 11th - 12th April 2018
Labels can be decomposed in order to include more subclasses within the same suggestions
Hierarchical labelling system:
4-layer: “C-C Bond Formation (Coupling) (Suzuki) (Bromo)”
3-layer: “C-C Bond Formation (Coupling) (Suzuki)”
2-layer: “C-C Bond Formation (Coupling)”
1-layer: “C-C Bond Formation”
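The layered labels can be derived from the most specific label by dropping trailing qualifiers. The following is a minimal sketch, assuming that every qualifier is a parenthesised group without nested parentheses, as in the examples above; the function name `truncate_label` is hypothetical.

```python
import re


def truncate_label(label, layers):
    """Keep the first `layers` levels of a hierarchical reaction class label."""
    if layers < 1:
        raise ValueError("layers must be at least 1")
    # Layer 1 is the text before the first parenthesised qualifier.
    base = label.split(" (", 1)[0]
    # Each further layer is one parenthesised qualifier, in order.
    qualifiers = re.findall(r"\(([^()]*)\)", label)
    kept = qualifiers[:layers - 1]
    return " ".join([base] + [f"({q})" for q in kept])


full = "C-C Bond Formation (Coupling) (Suzuki) (Bromo)"
for n in range(4, 0, -1):
    print(f"{n}-layer: {truncate_label(full, n)}")
```

Coarser layers merge more reactions under the same label, so a recommender trained at a lower layer suggests broader groups of reactions.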
Data Mining for Reaction Class Recommendation
“Similar molecules have similar reactivity”
Pivoting: identical descriptions are grouped together and associated with multiple reaction classes
Dataset shape and content also depend on the encoding system
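The pivoting step can be sketched as a group-by over the molecular description. This is an illustrative example only: the records, the bit-tuple descriptions, and the function name `pivot` are assumptions, not the data or code behind the slides.

```python
from collections import defaultdict


def pivot(records):
    """Group identical descriptions and collect every class seen for each."""
    grouped = defaultdict(set)
    for description, reaction_class in records:
        grouped[description].add(reaction_class)
    return {desc: sorted(classes) for desc, classes in grouped.items()}


# Toy records of (description, reaction class); the bit tuples are invented.
records = [
    ((1, 0, 1, 1), "C-N Bond Formation"),
    ((1, 0, 1, 1), "Functional Elimination"),
    ((0, 1, 0, 0), "C-C Bond Formation"),
    ((1, 0, 1, 1), "C-N Bond Formation"),
]

dataset = pivot(records)
for description, classes in dataset.items():
    print(description, classes)
```

Four records collapse into two rows here. A coarser encoding would make more molecules share a description, which is one way the encoding system changes the shape of the dataset.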
Model Screening
Components:
• 1,056,836 starting materials and classes (US patents)
• 2 levels of class discrimination (generic and specific)
• 22 molecular descriptions (e.g. fingerprints)
• 6 multi-label approaches / 2 classifiers
Procedure:
• Data Split: Training 80% / Test 20%
• Metrics: Recall, Precision, F1-score, Hamming Loss, 0/1 Loss
534 models were screened in total
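For reference, the five metrics can be computed for multi-label predictions as sketched below. The slides do not state the averaging scheme, so micro-averaging over all label assignments is an assumption here, as are the function name `multilabel_metrics` and the toy label sets.

```python
def multilabel_metrics(true_sets, predicted_sets, all_labels):
    """Micro-averaged precision/recall/F1 plus Hamming and 0/1 loss."""
    tp = fp = fn = 0
    wrong_slots = 0   # label slots where truth and prediction disagree
    inexact = 0       # samples whose predicted set is not an exact match
    for truth, pred in zip(true_sets, predicted_sets):
        tp += len(truth & pred)
        fp += len(pred - truth)
        fn += len(truth - pred)
        wrong_slots += len(truth ^ pred)
        inexact += truth != pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    n = len(true_sets)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hamming_loss": wrong_slots / (n * len(all_labels)),
        "zero_one_loss": inexact / n,
    }


# Toy example with three 1-layer classes and two test molecules.
labels = ["C-C Bond Formation", "C-N Bond Formation", "Functional Elimination"]
truth = [{"C-C Bond Formation", "C-N Bond Formation"}, {"Functional Elimination"}]
predicted = [{"C-C Bond Formation"}, {"Functional Elimination"}]

scores = multilabel_metrics(truth, predicted, labels)
print(scores)
```

Hamming loss penalises each wrong label slot individually, while 0/1 loss counts a sample as wrong unless the whole predicted set matches, so the two can rank models differently.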
[Figure: model screening by (1) label type, (2) approach, (3) classifier]
Design Validation
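[Figure: validation workflow from starting material (SM) and reagents (R) through reaction vectors (RV) to product libraries]
26 starting fragments
24,000 reagents
12,000 classified reactions
3-layer CC-RF MACCS recommender
PC: control library
P1: 1-layer suggestions library
P2: 2-layer suggestions library
P3: 3-layer suggestions library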
Synthetic Accessibility Estimations
ID  Pipeline  Applicable Reactions  Unique Products (InChI Keys)  Enumeration Time (s)  Mean RSynth [1]  Mean SAscore [2]
P3  3-layer   85 (0.24)             43,952 (0.31)                 381.22 (0.38)         0.57 (1.10)      1.69 (0.94)
P2  2-layer   94 (0.26)             49,988 (0.35)                 412.47 (0.42)         0.56 (1.08)      1.78 (0.99)
P1  1-layer   170 (0.48)            73,741 (0.52)                 585.03 (0.59)         0.54 (1.04)      1.83 (1.02)
PC  Control   357 (1.00)            141,834 (1.00)                991.65 (1.00)         0.52 (1.00)      1.80 (1.00)
Values in parentheses are relative to the control (PC).
1: RSynth values were computed using the software MOE by Chemical Computing Group ULC (2019)
2: SAscore values were computed using the RDKit implementation of the method by Ertl & Schuffenhauer (2009)
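The values in parentheses in the table appear to be each pipeline's figure divided by the control's. A short check on the P3 row, using only numbers from the table, is sketched below; the dictionary keys and the function name `relative_to_control` are hypothetical.

```python
# Figures for the control (PC) and 3-layer (P3) pipelines, copied from the table.
control = {
    "applicable_reactions": 357,
    "unique_products": 141834,
    "enumeration_time_s": 991.65,
    "mean_rsynth": 0.52,
    "mean_sascore": 1.80,
}
p3 = {
    "applicable_reactions": 85,
    "unique_products": 43952,
    "enumeration_time_s": 381.22,
    "mean_rsynth": 0.57,
    "mean_sascore": 1.69,
}


def relative_to_control(row, reference):
    """Divide each figure by the control's figure, rounded to two decimals."""
    return {key: round(row[key] / reference[key], 2) for key in reference}


relative = relative_to_control(p3, control)
print(relative)
```

Read this way, the 3-layer pipeline applies about a quarter of the control's reactions and takes well under half of its enumeration time, while the mean RSynth rises by roughly 10%.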
Conclusions
• Reaction vector design generates structures that are more likely to be synthetically accessible
• Machine learning recommender systems can be used to support the current design framework by:
  • Further increasing the synthetic accessibility of the products
  • Reducing the global enumeration time
Acknowledgements
Sheffield Chemoinformatics Research Group
Prof. Val Gillet
James Webster
Dr. Antonio de la Vega de León
Evotec UK Research Informatics Team
Dr. Mike Bodkin
Dr. James Wallace
Dr. Dimitar Hristozov