Big Data and the Dial-a-Molecule Grand Challenge
Richard Whitby, University of Southampton
From Big Data to Chemical Information
RSC, 22nd April 2015
Grand Challenge
Vision: In 20-40 years, scientists will be able to deliver any desired molecule within a timeframe useful to the end-user, using safe, economically viable and sustainable processes.
Delivery of novel molecules should be as quick and efficient as it currently is for stock chemicals. How can we make novel molecules in DAYS not YEARS?
Selected as one of three Grand Challenges for Chemistry and Chemical Engineering in a competitive exercise run by EPSRC/RSC/IChemE/CIKTN between June 2008 and Sept 2009.
The Network
STEERING GROUP
Prof. Richard Whitby (University of Southampton)
Prof. Steve Marsden (University of Leeds)
Prof. David Harrowven (University of Southampton)
Prof. Joe Sweeney (University of Huddersfield)
Dr. Harris Makatsoris (Brunel University)
Prof. Asterios Gavriilidis (UCL)
Dr. David Hollinshead (STB Associates)
Dr. David Fox (RSC)
Dr. Mimi Hii (Imperial College London)
Dr. Andrew Russell (University of Reading)
Dr. Robin Attrill (GSK)
Prof. Nick Turner (Manchester University)
Dr. Matt Tozer (Peakdale)
Dr. John Clough (Syngenta)
Dr. Gill Smith (Gillian Smith Associates)
Prof. John Leonard (AstraZeneca)
Dr. Natasha Richardson (EPSRC)
Dr. Simon Rushworth (CIKTN)
500+ members
40+ Businesses
Large corporate
SMEs
Consultancies
Academia
45 Institutions
Learned Societies
Funders
Dr Kelly Kilpin, University of Southampton
Network Co-ordinator
How to solve it
Plan Execute
Artificial intelligence
Data capture
Theoretical predictions
New Reaction Systems
Real-time Analysis/Optimisation
New 'Perfect' Reactions
Statistical Analysis
Roadmap: 2011 (www.dial-a-molecule.org).
Key Barriers:
Making Synthesis Predictable
Smart Synthesis by Design
Sustainable Synthesis for a Sustainable Future
Optimum Reaction and Route Design
Why does synthesis take so long?
For moderate complexity targets, the ratio (number of reactions tried) / (number in final synthetic route) can be >100.
Key problem is route selection. Many reactions do not work, which forces route changes. Others work, but require optimisation to give acceptable yields.
Diagram labels:
Historical data
Targeted study of reactions
Capture of full reaction data at source
Theoretical models
Prediction of reaction outcomes
Design / selection of synthetic route
Electronic Laboratory Notebooks
Make the Molecule
Next Generation Reaction Platforms
National Service for the Study of Reactions
Rapid Reaction Analysis
Auto-optimisation
Intelligent Fume Cupboard
High-throughput automated equipment
Smart Laboratory
ORRD Making Synthesis Predictable Smart Synthesis by Design
Predicting reaction outcomes
How big is reaction space?
"The size of the chemical space that is of interest to drug developers is estimated to lie between 10^18 and 10^200 compounds." 10^60 stable organic compounds with MW<500 is the figure often given.
Reaction space is the connections between molecules. 10^? x number of molecules?
Each reaction can be carried out in many ways, with many different conditions (x 10^?).
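The estimate above is a product of three factors, so it is easiest to follow in log10 terms. The sketch below is only an illustration of that arithmetic: the function name and parameter names are hypothetical, and the two unknown exponents (the "10^?" values on the slide) are left as arguments rather than filled in.

```python
def log10_experiment_space(log10_molecules,
                           log10_reactions_per_molecule,
                           log10_conditions_per_reaction):
    """Return log10 of (molecules x reactions per molecule x conditions per reaction).

    All three inputs are order-of-magnitude exponents. The slide gives only the
    first one (for example 60 for MW<500); the other two are unknowns that the
    caller has to assume.
    """
    return (log10_molecules
            + log10_reactions_per_molecule
            + log10_conditions_per_reaction)
```

With the 10^60 figure quoted above, the call would be `log10_experiment_space(60, a, b)` for whatever exponents `a` and `b` one is prepared to assume. However they are chosen, the result only grows beyond the already unmanageable 10^60.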
Functional group approximation. Assumes that a particular arrangement of atoms will always react the same way, independent of the rest of the molecule.
[Scheme: ketone (C=O) to alcohol (OH) with NaBH4, EtOH]
Partially true for some reactions, much less so for others (e.g. skeletal rearrangements).
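A toy sketch may make the approximation concrete. It assumes a molecule can be reduced to a tuple of functional group labels. That is a deliberate oversimplification, and it is exactly the simplification the approximation makes. The `Molecule` class, the `RULES` table and the `predict` function are hypothetical names; only the NaBH4/EtOH ketone reduction comes from the slide.

```python
from dataclasses import dataclass

# Hypothetical rule table: (functional group, reagents) -> product group.
# Only the ketone reduction is taken from the slide.
RULES = {
    ("ketone", "NaBH4, EtOH"): "secondary alcohol",
}


@dataclass(frozen=True)
class Molecule:
    name: str
    groups: tuple  # functional group labels, e.g. ("ketone", "alkene")


def predict(molecule, reagents):
    """Apply the functional group approximation.

    Every group with a matching rule is transformed; every other group is
    copied unchanged. The rest of the molecule is never consulted.
    """
    new_groups = tuple(RULES.get((g, reagents), g) for g in molecule.groups)
    return Molecule(name=molecule.name + " (predicted product)", groups=new_groups)
```

Because `predict` looks only at the group label, it gives the same answer for a hindered and an unhindered ketone, and it cannot describe a skeletal rearrangement at all. That matches the slide's point that the approximation holds only partially.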
Predicting reaction outcomes
What do we know? Reaxys, CASReact, SPRESI, PCD… Perhaps 30 x 10^6 unique reactions. All of chemistry so far: perhaps 10^8 reported reactions. Each with some information (% yield, reagent, solvent, temperature, time, work-up…). Potential to include calculated data.
How well can we do? Computer Aided Synthesis design programs:
LHASA (hand coded rules)
ARCHEM (rules from database)
ICSynth (rules from database)
Chematica (hand coded rules)
And many other attempts…
Challenge for Big Data: how to do better? The data is:
Sparse
Incomplete
Inconsistent
Error strewn
Database: (PPh3)2PdCl2, CuI, NEt3, 80 °C, 12 h
Publication: 2-methylbut-3-yn-2-ol (3.21 g, 38.16 mmol) and 15 mL NEt3 were added to a mixture of bromobenzene (5.00 g, 31.8 mmol), CuI (181 mg, 954 μmol), PPh3 (996 mg, 3.80 mmol), and PdCl2(PPh3)2 (1.34 g, 1.90 mmol) under nitrogen. The reaction mixture was heated to 80 °C for 12 h. The reaction was quenched with water (100 mL), extracted with dichloromethane (DCM) (3 x 100 mL). The combined organic phase was dried over MgSO4, filtered off, and concentrated under reduced pressure. The product was obtained as a yellow oil by column chromatography (silica, hexane/ethyl acetate (4:1), Rf=0.33) (yield: 4.84 g, 95%).
Reality: Many, many variables, most not even recorded. E.g. an S88 batch process description may occupy 80 pages!
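To show what the database line loses, the sketch below takes the amounts quoted in the publication text and recovers the stoichiometry relative to bromobenzene. The `Component` class and `equivalents` function are hypothetical names for illustration, not part of any existing database schema. The mmol values are copied from the publication excerpt, with 954 μmol written as 0.954 mmol.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    mmol: float  # amount charged, in mmol, as stated in the publication text


def equivalents(components, limiting):
    """Return each component's amount relative to the limiting reagent."""
    lim = next(c for c in components if c.name == limiting)
    return {c.name: round(c.mmol / lim.mmol, 3) for c in components}


# Amounts copied from the publication excerpt.
components = [
    Component("bromobenzene", 31.8),
    Component("2-methylbut-3-yn-2-ol", 38.16),
    Component("CuI", 0.954),
    Component("PPh3", 3.80),
    Component("PdCl2(PPh3)2", 1.90),
]

eq = equivalents(components, "bromobenzene")
for name, value in eq.items():
    print(f"{name}: {value} equiv")
```

The ratios work out to about 1.2 equiv of alkyne, 3 mol% CuI, 12 mol% added PPh3 and 6 mol% palladium complex. None of these loadings, nor the added PPh3 itself, can be read from the database entry shown.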
Using full experimental data
Predicting reaction outcomes
[Scheme: bromobenzene (Br) + 2-methylbut-3-yn-2-ol (OH), giving the coupled alkynol product (OH)]
Much more data is coming! Electronic Laboratory Notebooks, automated experimentation, high-throughput parallel experimentation, in situ spectroscopy. How to make effective use of the flood of data produced?
Predicting reaction outcomes
Using full experimental data
More effective use of existing data to predict reaction outcomes.
Making use of the flood of detailed data starting to be generated.
Determine 'best' synthetic routes.
Big Data and the Dial-a-Molecule Grand Challenge
Acknowledgements
EPSRC for funding
Dial-a-Molecule coordinators: Bogdan Ibanescue, Susanne Coles, Kelly Kilpin
Dial-a-Molecule Steering Group