Computer-Assisted Retrosynthesis

2018/06/02

M1 Koki Sasamoto




Introduction

Retrosynthesis


Introduction

Computer-assisted synthesis planning (CASP)

It takes less time to devise synthetic routes.

The proportion of successful syntheses rises.

Scientists learn from the results of CASP.


Contents

1. Introduction

2. Rule-based expert system

3. Machine Learning

4. Summary



LHASA

E. J. Corey and W. T. Wipke, J. Am. Chem. Soc., 1972, 94, 431.

Interactive system using a synthetic tree


Transform Mechanism

Transform lists

One-group transform

Two-group transform

Functional Group Interchange (FGI) … etc.

Each transform has a data table.

Which bond is cleaved

Ratings depend on difficulty … etc.

E. J. Corey and W. T. Wipke, J. Am. Chem. Soc., 1972, 94, 431.
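
The idea that each transform carries its own data table (which bond is cleaved, plus a rating) can be pictured with a small record type. This is only a sketch: the field names, the three example entries, and the numeric ratings are hypothetical illustrations and are not taken from LHASA itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    name: str          # hypothetical label for the transform
    kind: str          # "one-group", "two-group" or "FGI"
    cleaved_bond: str  # free-text description of the disconnected bond
    rating: int        # made-up rating; higher means judged easier


# Illustrative entries only; the ratings are invented for this sketch.
TRANSFORMS = [
    Transform("Grignard addition", "one-group", "C-C next to C-OH", 40),
    Transform("Aldol addition", "two-group", "C-C between C=O and C-OH", 60),
    Transform("Alcohol oxidation", "FGI", "none (functional group change)", 30),
]


def rank_transforms(transforms):
    """Return transforms ordered from highest to lowest rating."""
    return sorted(transforms, key=lambda t: t.rating, reverse=True)


for t in rank_transforms(TRANSFORMS):
    print(f"{t.rating:3d}  {t.kind:10s}  {t.name}")
```

An interactive system could show such a ranked list to the chemist, who then picks which branch of the synthetic tree to expand.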


Retrosynthesis Example

E. J. Corey and W. T. Wipke, J. Am. Chem. Soc., 1972, 94, 431.


Other CASP Programs

SECS, SYNCHEM, SYNGEN, IGOR, WODCA, etc.

Failure of CASP

These programs proposed incompatible synthetic routes.

Lack of computing capacity

Only simplified rule sets were available

Improved computing power solved this problem.


Chematica

NOC (Network of Organic Chemistry)

The lowest cost synthetic pathway of taxol (within 50 steps)

B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.

Contains data on 10 million reactions


Syntaurus

B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.

Algorithm for retrosynthesis


Scoring Functions

Chemical Scoring Function (CSF)

Number of rings, mass

Reaction Scoring Function (RSF)

Necessity of protection, yield, etc.

B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.
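
As a rough sketch of how two such scoring functions might combine into a route cost, the code below assumes the CSF depends on ring count and mass and the RSF on protection and yield. The functional forms, the weights, and the names `chemical_score`, `reaction_score` and `route_score` are all hypothetical; they are not the functions from the cited paper.

```python
def chemical_score(n_rings: int, mass: float) -> float:
    """Hypothetical CSF: more rings and more mass mean a costlier substrate."""
    return 10.0 * n_rings + 0.1 * mass


def reaction_score(needs_protection: bool, yield_fraction: float) -> float:
    """Hypothetical RSF: fixed cost per step, plus penalties for a
    protection step and for low yield (yield_fraction in 0..1)."""
    penalty = 5.0 if needs_protection else 0.0
    return 1.0 + penalty + 10.0 * (1.0 - yield_fraction)


def route_score(reactions, substrates) -> float:
    """Total cost of a route: RSF summed over its reactions plus CSF summed
    over the substrates still left to make. Lower is better."""
    return (sum(reaction_score(p, y) for p, y in reactions)
            + sum(chemical_score(r, m) for r, m in substrates))


plain = route_score([(False, 0.9)], [(1, 100.0)])
protected = route_score([(True, 0.9)], [(1, 100.0)])
print(plain, protected)
```

With these made-up weights, the route that needs a protection step scores worse, which is the kind of preference a search algorithm would use to decide which branch to expand first.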


Problems of Expert System

Forward Prediction

Retrosynthesis

Design of scoring functions

Dependence on reaction templates

… Trouble of template creation
… Ignoring the context of molecules

Application to unknown reactions


Machine Learning


Machine Learning

Supervised learning

pattern → answer (“Tiger”)

Unsupervised learning

Using unlabeled data

Reinforcement learning

Maximizing rewards


Machine Learning

Supervised learning

Regression

output: continuous variable
example: consumption of the entire economy

Classification

output: discrete category
example: estimation of animal type


Machine Learning

Feature extraction

Classification

“Tiger”


Feature Space


Neural Network

Learns the best parameters automatically


Reaction Type Prediction

A. Aspuru-Guzik et al., ACS Cent. Sci., 2016, 2, 725.

Predicts 17 reaction types from reactants and reagents


Reaction Type Prediction

Predicted probability of each reaction type

A. Aspuru-Guzik et al., ACS Cent. Sci., 2016, 2, 725.

Teacher signal
[0,0,0,0.5,0.5,0.0,…] (reaction type 3 or 4)
[0,1,0,0,0,…] (others)
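
To make the teacher signal concrete, the sketch below builds such a target vector and compares it with a predicted probability distribution. It assumes a softmax output layer and a cross-entropy loss, which the slide does not state; the helper names `teacher_signal`, `softmax` and `cross_entropy` are hypothetical, and reaction types are indexed from 0 as in the vectors above.

```python
import math


def teacher_signal(n_types, valid_types):
    """Target vector that splits probability equally over the valid types."""
    share = 1.0 / len(valid_types)
    return [share if i in valid_types else 0.0 for i in range(n_types)]


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Loss between the teacher signal and the predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


target = teacher_signal(17, {3, 4})           # reaction type 3 or 4
uniform = softmax([0.0] * 17)                 # untrained guess
focused = softmax([2.0 if i in (3, 4) else 0.0 for i in range(17)])
print(cross_entropy(target, uniform), cross_entropy(target, focused))
```

The loss is lower when the predicted probability mass sits on types 3 and 4, so training pushes the network toward the ambiguous-but-correct answer rather than forcing a single label.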


Reaction Type Prediction

A. Aspuru-Guzik et al., ACS Cent. Sci., 2016, 2, 725.

Attempts to solve textbook problems (Wade, Organic Chemistry, 6th ed.)

Example

Results


Extended Reaction Templates

Improvement of Neural Network

M. H. S. Segler and M.P. Waller, Chem. Eur. J. 2017, 23, 5966.

… dropout, highway network, ELU (activation function)
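
Two of the ingredients named above have compact textbook definitions: ELU is x for x > 0 and alpha * (exp(x) - 1) otherwise, and a highway unit mixes a transformed signal H(x) with the unchanged input through a gate T(x). The scalar sketch below shows both; the weights are arbitrary and it is not the architecture from the cited paper.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: identity for x > 0, smooth saturation below."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def highway_unit(x, w_h, b_h, w_t, b_t):
    """Scalar highway unit: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = elu(w_h * x + b_h)        # transformed signal H(x)
    t = sigmoid(w_t * x + b_t)    # transform gate T(x), between 0 and 1
    return t * h + (1.0 - t) * x  # the gate blends H(x) with the raw input


print(elu(2.0), elu(-1.0))
print(highway_unit(3.0, 2.0, 0.0, 0.0, -50.0))  # gate closed: passes x through
print(highway_unit(3.0, 2.0, 0.0, 0.0, 50.0))   # gate open: outputs H(x)
```

When the gate is nearly closed the unit simply carries its input forward, which is what lets deeper stacks of such layers train without the signal degrading.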


Prediction Results

M. H. S. Segler and M.P. Waller, Chem. Eur. J. 2017, 23, 5966.

Has the neural network learned molecular context?


Candidate Generation and Ranking

Forward Enumeration

Candidate Ranking

1689 reaction templates

focus only on changed atoms / bonds

Reaction type prediction by two frameworks

C. W. Coley et al., ACS Cent. Sci., 2017, 3, 434.


Candidate Ranking

C. W. Coley et al., ACS Cent. Sci., 2017, 3, 434.

Focus on changed atoms / bonds


Reaction Prediction

Results

Problems of template-based model

coverage, scalability

C. W. Coley et al., ACS Cent. Sci., 2017, 3, 434.


Sequence to Sequence (seq2seq)

Is the chemical reaction similar to translation?

Reactant → Product

P. Schwaller and T. Gaudin, arXiv:1711.04810v2, 2017


Sequence to Sequence (seq2seq)

P. Schwaller and T. Gaudin, arXiv:1711.04810v2, 2017

Reaction templates are not necessary.
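
Treating a reaction as translation means the molecules have to be written as token sequences, for example by splitting SMILES strings. The sketch below uses a deliberately simplified regular expression (bracket atoms, Cl, Br, two-digit ring closures, then single characters); it is an assumption for illustration and not necessarily the tokenization used in the cited preprint.

```python
import re

# Simplified, hypothetical token pattern: longer tokens are tried first.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list:
    """Split a SMILES string into tokens a seq2seq model could consume."""
    return TOKEN_PATTERN.findall(smiles)


print(tokenize("CC(=O)Cl"))      # acetyl chloride
print(tokenize("C[N+](C)(C)C"))  # bracket atom kept as a single token
```

Because the model only ever sees token sequences for reactants and products, no hand-written reaction template enters the pipeline.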


Sequence to Sequence (seq2seq)

Results

Datasets

Training
Jin’s USPTO training set … 395496

Test
Jin’s USPTO test set … 38648
Lowe’s test set … 50258

P. Schwaller and T. Gaudin, arXiv:1711.04810v2, 2017


Difficulties in Retrosynthesis

Extremely large search space

10^30 to 10^50 possible pathways

> 10000 reactions

An efficient search method is necessary

B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.
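
A back-of-the-envelope calculation shows why the search space explodes: if roughly b retrosynthetic options apply at each step, a route of d steps is one of about b^d possibilities. The branching factor of 100 used below is a hypothetical round number chosen for illustration, not a figure from the cited paper.

```python
def pathway_count(branching_factor: int, depth: int) -> int:
    """Upper bound on linear pathways: one of b choices at each of d steps."""
    return branching_factor ** depth


def order_of_magnitude(n: int) -> int:
    """floor(log10(n)) for a positive integer, via its decimal digit count."""
    return len(str(n)) - 1


for depth in (5, 10, 15, 25):
    count = pathway_count(100, depth)
    print(f"depth {depth:2d}: about 10^{order_of_magnitude(count)} pathways")
```

Even at modest depths the count is far beyond exhaustive enumeration, which is why the search has to be guided rather than brute-forced.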


Difficulties in Retrosynthesis

Scoring Function

Heuristic dependence

Need to expand the synthetic tree to the end


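Monte-Carlo Tree Search (MCTS)

Reinforcement learning to find the best route

UCB1 (standard form) = \bar{X}_j + \sqrt{\frac{2 \ln n}{n_j}}

\bar{X}_j: mean reward of child j, n_j: visits to child j, n: visits to the parent node

M. H. S. Segler and M. P. Waller, Nature, 2018, 555, 604.

(In fact, this parameter is more complicated…)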
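
For orientation, the textbook UCB1 rule scores each child node as its mean reward plus an exploration bonus, sqrt(2 ln n / n_j), where n is the parent's visit count and n_j the child's. The sketch below implements that plain rule for the selection step of MCTS; as noted above, the cited work uses a more elaborate selection score, and the `(name, total_reward, visits)` tuple layout is a hypothetical choice for this example.

```python
import math


def ucb1(mean_reward, child_visits, parent_visits, c=math.sqrt(2)):
    """Plain UCB1 score; unvisited children get infinite priority."""
    if child_visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_child(children):
    """Pick the child with the highest UCB1 score.

    children: list of (name, total_reward, visits) tuples.
    """
    parent_visits = sum(visits for _, _, visits in children)

    def score(child):
        _, total_reward, visits = child
        mean = total_reward / visits if visits else 0.0
        return ucb1(mean, visits, parent_visits)

    return max(children, key=score)[0]


print(select_child([("a", 9.0, 10), ("b", 1.0, 1)]))   # rarely tried "b" wins
print(select_child([("a", 9.0, 10), ("b", 0.0, 10)]))  # equal visits: "a" wins
```

The bonus term shrinks as a branch is visited more often, so the search keeps returning to promising routes while still sampling neglected ones.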


Expansion Procedure

M. H. S. Segler and M. P. Waller, Nature, 2018, 555, 604.


Datasets

Training datasets

12.4 million single-step reactions
- rollout rules
- expansion rules

Data Augmentation

Generate 100 million negative reactions

M. H. S. Segler and M. P. Waller, Nature, 2018, 555, 604.


Results – MCTS vs BFS

M. H. S. Segler and M. P. Waller, Nature, 2018, 555, 604.

MCTS was faster than BFS (Breadth First Search).


Results – MCTS vs Human

M. H. S. Segler and M. P. Waller, Nature, 2018, 555, 604.

MCTS routes were preferred more often.


Summary

Expert system’s problems

・ Troublesome preparation of reaction rules

・ Application to unknown reactions

・ Lack of scoring function

Machine Learning

・ The above problems can be solved.

・ Route design could be done at a level approaching humans.


Future

Current issues

・ The best model is still unknown.

(Fingerprint, seq2seq, MCTS?)

Using images for compound recognition,

Graph representation,

GAN (Generative Adversarial Network) … ?

・ Reaction conditions are not considered.

A new reaction descriptor is required.


Appendix

ECFP4

https://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/


Appendix

Rule extraction

M. H. S. Segler and M.P. Waller, Chem. Eur. J. 2017, 23, 5966.


Appendix

In this example, chemists preferred literature routes.

M. H. S. Segler and M. P. Waller, Nature, 2018, 555, 604.