
Big Data and the Dial-a-Molecule Grand Challenge

Richard Whitby, University of Southampton

From Big Data to Chemical Information

RSC, 22nd April 2015


Grand Challenge

Vision: In 20-40 years, scientists will be able to deliver any desired molecule within a timeframe useful to the end-user, using safe, economically viable and sustainable processes.

Delivery of novel molecules should be as quick and efficient as it currently is for stock chemicals. How can we make novel molecules in DAYS, not YEARS?

Selected as one of three Grand Challenges for Chemistry and Chemical Engineering in a competitive exercise run by EPSRC/RSC/IChemE/CIKTN between June 2008 and Sept 2009.


The Network

Steering group: Prof. Richard Whitby (a), Prof. Steve Marsden (b), Prof. David Harrowven (a), Prof. Joe Sweeney (c), Dr. Harris Makatsoris (d), Prof. Asterios Gavriilidis (e), Dr. David Hollinshead (f), Dr. David Fox (g), Dr. Mimi Hii (h), Dr. Andrew Russell (i), Dr. Robin Attrill (j), Prof. Nick Turner (k), Dr. Matt Tozer (l), Dr. John Clough (m), Dr. Gill Smith (n), Prof. John Leonard (o), Dr. Natasha Richardson (p), Dr. Simon Rushworth (q)

(a) University of Southampton, (b) University of Leeds, (c) University of Huddersfield, (d) Brunel University, (e) UCL, (f) STB Associates, (g) RSC, (h) Imperial College London, (i) University of Reading, (j) GSK, (k) Manchester University, (l) Peakdale, (m) Syngenta, (n) Gillian Smith Associates, (o) AstraZeneca, (p) EPSRC, (q) CIKTN

500+ members
40+ businesses: large corporate, SMEs, consultancies
Academia: 45 institutions
Learned societies
Funders

Network Co-ordinator: Dr Kelly Kilpin, University of Southampton


How to solve it

Plan / Execute

Artificial intelligence
Data capture
Theoretical predictions
New reaction systems
Real-time analysis/optimisation
New 'perfect' reactions
Statistical analysis


Roadmap: 2011 (www.dial-a-molecule.org).

Key barriers:
Making Synthesis Predictable
Smart Synthesis by Design
Sustainable Synthesis for a Sustainable Future


Optimum Reaction and Route Design: why does synthesis take so long?

For moderate-complexity targets, the ratio (number of reactions tried) / (number of reactions in the final synthetic route) can be >100.

The key problem is route selection. Many reactions do not work, which forces route changes. Others work, but require optimisation to give acceptable yields.


[Diagram: Optimum Reaction and Route Design (ORRD), spanning the themes Making Synthesis Predictable and Smart Synthesis by Design. Elements shown: historical data; targeted study of reactions; capture of full reaction data at source; theoretical models; prediction of reaction outcomes; design/selection of synthetic route; Make the Molecule; Electronic Laboratory Notebooks; Next Generation Reaction Platforms; National Service for the Study of Reactions; rapid reaction analysis; auto-optimisation; Intelligent Fume Cupboard; high-throughput automated equipment; Smart Laboratory.]


Predicting reaction outcomes

How big is reaction space?

"The size of the chemical space that is of interest to drug developers is estimated to lie between 10^18 and 10^200 compounds." 10^60 stable organic compounds with MW < 500 is the figure often given.

Reaction space is the set of connections between molecules: 10^? × the number of molecules?

Each reaction can be carried out in many ways, with many different conditions (× 10^?).
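To give a feel for how quickly the conditions dimension multiplies, here is a minimal sketch that counts and enumerates condition combinations for a single transformation. The option lists are hypothetical, chosen only for illustration; they are not figures from the talk.

```python
from itertools import product
from math import prod

# Hypothetical option lists for one transformation (illustrative only).
conditions = {
    "catalyst": ["Pd(PPh3)4", "PdCl2(PPh3)2", "Pd(OAc)2"],
    "solvent": ["THF", "DMF", "NEt3", "toluene"],
    "base": ["NEt3", "K2CO3", "Cs2CO3"],
    "temperature_C": [25, 50, 80, 110],
    "time_h": [1, 4, 12],
}

def condition_space_size(options):
    """Number of distinct combinations = product of the option counts."""
    return prod(len(values) for values in options.values())

def enumerate_conditions(options):
    """Yield each combination as a dict, i.e. one candidate experiment."""
    keys = list(options)
    for combo in product(*(options[k] for k in keys)):
        yield dict(zip(keys, combo))

print(condition_space_size(conditions))  # 3 * 4 * 3 * 4 * 3 = 432
```

Only discrete choices are counted here; continuous variables such as concentration, equivalents or addition rate make the real space effectively unbounded.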

Functional group approximation: assumes that a particular arrangement of atoms will always react the same way, independent of the rest of the molecule.

[Scheme: reduction of a carbonyl compound to the corresponding alcohol with NaBH4, EtOH]

This is partially true for some reactions, much less so for others (e.g. skeletal rearrangements).
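The functional group approximation can be caricatured as a lookup table: each group in the substrate is transformed by whatever rule matches it, with no regard to its surroundings. The sketch below is a toy under that assumption; the rule table, group labels and reagent strings are hypothetical, and molecules are reduced to lists of group labels rather than real structures.

```python
# Toy rule base (hypothetical): (functional group, reagent) -> product group.
RULES = {
    ("ketone", "NaBH4, EtOH"): "secondary alcohol",
    ("aldehyde", "NaBH4, EtOH"): "primary alcohol",
}

def predict(groups, reagent):
    """Apply the functional group approximation.

    Each group is transformed by the matching rule, if any, ignoring the
    rest of the molecule; groups with no matching rule are left unchanged.
    """
    return [RULES.get((group, reagent), group) for group in groups]

print(predict(["ketone", "aryl bromide"], "NaBH4, EtOH"))
```

Because each group is handled in isolation, a scheme like this has no way to express a skeletal rearrangement or any outcome that depends on neighbouring structure, which is exactly where the approximation breaks down.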


Predicting reaction outcomes

What do we know? Reaxys, CASReact, SPRESI, PCD… perhaps 30 × 10^6 unique reactions. All of chemistry so far: perhaps 10^8 reported reactions, each with some information (% yield, reagent, solvent, temperature, time, work-up…). Potential to include calculated data.

How well can we do? Computer-aided synthesis design programs:

LHASA (hand-coded rules)
ARCHEM (rules from database)
ICSynth (rules from database)
Chematica (hand-coded rules)

And many other attempts…
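As a toy illustration of the "rules from database" idea, the sketch below mines the most frequently reported conditions for each reaction class and keeps a rule only if it has enough support. The records, class labels and `min_support` threshold are hypothetical, and this is not a description of how ARCHEM or ICSynth work internally.

```python
from collections import Counter, defaultdict

# Hypothetical records: (reaction class, conditions reported).
records = [
    ("carbonyl reduction", "NaBH4, EtOH"),
    ("carbonyl reduction", "NaBH4, MeOH"),
    ("carbonyl reduction", "NaBH4, EtOH"),
    ("Sonogashira coupling", "PdCl2(PPh3)2, CuI, NEt3"),
]

def mine_rules(records, min_support=2):
    """Return {reaction class: most frequent conditions}, keeping only
    classes whose top conditions were reported at least min_support times."""
    counts = defaultdict(Counter)
    for reaction_class, conditions in records:
        counts[reaction_class][conditions] += 1
    rules = {}
    for reaction_class, counter in counts.items():
        conditions, n = counter.most_common(1)[0]
        if n >= min_support:
            rules[reaction_class] = conditions
    return rules

print(mine_rules(records))
```

The Sonogashira class yields no rule because it has a single example: sparse data limit what can be learned, however clever the mining step.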

Challenge for Big Data: how to do better?

The data are sparse, incomplete, inconsistent and error-strewn.
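Some of these problems can at least be detected mechanically. The sketch below audits reaction records for missing fields and implausible values, and flags the same reaction reported with widely differing yields. The `ReactionRecord` fields, the use of `reaction_id` as a key identifying the same transformation, and the 20-point tolerance are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReactionRecord:
    """Hypothetical minimal record; reaction_id identifies the transformation."""
    reaction_id: str
    yield_pct: Optional[float] = None
    solvent: Optional[str] = None
    temperature_c: Optional[float] = None
    time_h: Optional[float] = None

REQUIRED = ("yield_pct", "solvent", "temperature_c", "time_h")

def audit(record):
    """List incompleteness and obvious errors in a single record."""
    problems = [f"missing {name}" for name in REQUIRED if getattr(record, name) is None]
    if record.yield_pct is not None and not 0 <= record.yield_pct <= 100:
        problems.append(f"implausible yield {record.yield_pct}")
    if record.time_h is not None and record.time_h <= 0:
        problems.append(f"non-positive time {record.time_h}")
    return problems

def inconsistent(records, tolerance=20.0):
    """Return ids reported more than once with yields differing by > tolerance."""
    yields = {}
    for r in records:
        if r.yield_pct is not None:
            yields.setdefault(r.reaction_id, []).append(r.yield_pct)
    return sorted(rid for rid, ys in yields.items() if max(ys) - min(ys) > tolerance)

records = [
    ReactionRecord("r1", 95, "NEt3", 80, 12),
    ReactionRecord("r1", 60, "NEt3", 80, 12),
    ReactionRecord("r2", 112, None, 25, 2),
]
for r in records:
    print(r.reaction_id, audit(r))
print(inconsistent(records))
```

Sparsity is the one problem such checks cannot fix: a reaction that was never recorded leaves nothing to audit.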


Predicting reaction outcomes: using full experimental data

[Scheme: bromobenzene + 2-methylbut-3-yn-2-ol → 2-methyl-4-phenylbut-3-yn-2-ol]

Database: (PPh3)2PdCl2, CuI, NEt3, 80 °C, 12 h

Publication: 2-methylbut-3-yn-2-ol (3.21 g, 38.16 mmol) and 15 mL NEt3 were added to a mixture of bromobenzene (5.00 g, 31.8 mmol), CuI (181 mg, 954 μmol), PPh3 (996 mg, 3.80 mmol), and PdCl2(PPh3)2 (1.34 g, 1.90 mmol) under nitrogen. The reaction mixture was heated to 80 °C for 12 h. The reaction was quenched with water (100 mL), extracted with dichloromethane (DCM) (3 × 100 mL). The combined organic phase was dried over MgSO4, filtered off, and concentrated under reduced pressure. The product was obtained as a yellow oil by column chromatography (silica, hexane/ethyl acetate (4:1), Rf = 0.33) (yield: 4.84 g, 95%).

Reality: many, many variables, most not even recorded. For example, an S88 batch process description may occupy 80 pages!
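The publication text carries stoichiometry that the database entry drops. As a minimal sketch, the amounts quoted above can be turned into equivalents relative to the limiting reagent and a percent yield. The mmol values come from the text; the product formula C11H12O and the standard atomic weights are supplied here as assumptions, not taken from the slide.

```python
# Amounts as reported in the publication text (mmol).
AMOUNTS_MMOL = {
    "bromobenzene": 31.8,  # limiting reagent
    "2-methylbut-3-yn-2-ol": 38.16,
    "CuI": 0.954,
    "PPh3": 3.80,
    "PdCl2(PPh3)2": 1.90,
}
LIMITING = "bromobenzene"

# Standard atomic weights (g/mol), assumed for the molar-mass calculation.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}

def molar_mass(formula_counts):
    """Molar mass in g/mol from element counts, e.g. {"C": 11, "H": 12, "O": 1}."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in formula_counts.items())

def equivalents(amounts, limiting):
    """Equivalents of each component relative to the limiting reagent."""
    base = amounts[limiting]
    return {name: mmol / base for name, mmol in amounts.items()}

def percent_yield(product_g, product_mw, limiting_mmol):
    """Isolated mass as a percentage of the theoretical mass."""
    theoretical_g = limiting_mmol * product_mw / 1000.0
    return 100.0 * product_g / theoretical_g

# Assumed product: 2-methyl-4-phenylbut-3-yn-2-ol, C11H12O.
product_mw = molar_mass({"C": 11, "H": 12, "O": 1})
eq = equivalents(AMOUNTS_MMOL, LIMITING)
y = percent_yield(4.84, product_mw, AMOUNTS_MMOL[LIMITING])

for name, e in eq.items():
    print(f"{name}: {e:.2f} equiv")
print(f"yield: {y:.1f}%")
```

The computed yield agrees with the 95% quoted in the publication, and the catalyst loadings (about 6 mol% Pd and 3 mol% CuI) are exactly the kind of variable the database summary line omits.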


Predicting reaction outcomes: using full experimental data

Much more data is coming! Electronic Laboratory Notebooks, automated experimentation, high-throughput parallel experimentation, in situ spectroscopy. How can we make effective use of the flood of data produced?


Big Data and the Dial-a-Molecule Grand Challenge

More effective use of existing data to predict reaction outcomes.

Making use of the flood of detailed data starting to be generated.

Determining the 'best' synthetic routes.

Acknowledgements: EPSRC for funding; Dial-a-Molecule coordinators Bogdan Ibanescue, Susanne Coles and Kelly Kilpin; the Dial-a-Molecule Steering Group.