The Impact of Openness on the Quality of Chemistry Data and Implications for Organic Synthesis
Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University
Drexel Chemistry Mini Symposium, September 29, 2011

MiniSymp2011 Bradley


DESCRIPTION

Jean-Claude Bradley presents a 15-minute summary of current research in his lab on September 29, 2011, at the Drexel University Department of Chemistry Faculty Mini-Symposium. The main project discussed is the Open Melting Point Collection, done in collaboration with Andrew Lang and Antony Williams. Work by Evan Curtin is also shown, demonstrating the application of melting point and solubility in reaction design.


Page 1: MiniSymp2011 Bradley

The Impact of Openness on the Quality of Chemistry Data and Implications for Organic Synthesis

Jean-Claude Bradley

September 29, 2011

Drexel Chemistry Mini Symposium

Associate Professor of Chemistry

Drexel University

Page 2: MiniSymp2011 Bradley

The Trusted Source Model

Before online databases (early 90s), searching for properties like melting points using ONE “trusted source” was practical and acceptable as part of the chemistry culture.

• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals

Single values don’t tend to be contradicted

Page 3: MiniSymp2011 Bradley

Question Assumptions

Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance

Page 4: MiniSymp2011 Bradley

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

Page 5: MiniSymp2011 Bradley

Discovering outliers for melting points (stdev/average)

Page 6: MiniSymp2011 Bradley

Investigating the m.p. inconsistencies of EGCG

Page 7: MiniSymp2011 Bradley

Investigating the m.p. inconsistencies of cyclohexanone

Page 8: MiniSymp2011 Bradley

Most popular data sources

Page 9: MiniSymp2011 Bradley

Alfa Aesar donates melting points to the public

Page 10: MiniSymp2011 Bradley

Open Melting Point Explorer

(Andrew Lang)

Page 11: MiniSymp2011 Bradley

Outliers

MDPI dataset

EPI (donated all data to public also)

Page 12: MiniSymp2011 Bradley

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 13: MiniSymp2011 Bradley

Inconsistencies and SMILES problems within MDPI dataset

Page 14: MiniSymp2011 Bradley

MDPI Dataset labeled with High Trust Level

Page 15: MiniSymp2011 Bradley

Open Melting Point Datasets

Currently 27,000 mps for 20,000 compounds

Page 16: MiniSymp2011 Bradley

What is the melting point of 4-benzyltoluene?

• American Petroleum Institute: 5 C
• PHYSPROP: -30 C
• PHYSPROP: 125 C
• peer reviewed journal (2008): 97.5 C
• government database: -30 C
• government database: 4.58 C
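The stdev/average screen named on the outlier slide earlier in the deck can be illustrated on these six reported values. This is only a minimal sketch, not the code behind the Open Melting Point tools: the function names and the 0.05 cutoff are hypothetical, the ratio is computed directly on the Celsius values as listed, and the absolute value of the mean is used so that the ratio stays positive.

```python
from statistics import mean, stdev

# Reported melting points for 4-benzyltoluene, in degrees C, as listed on the slide.
reported = [
    ("American Petroleum Institute", 5.0),
    ("PHYSPROP", -30.0),
    ("PHYSPROP", 125.0),
    ("peer reviewed journal (2008)", 97.5),
    ("government database", -30.0),
    ("government database", 4.58),
]


def spread_ratio(values):
    """Sample standard deviation divided by |mean| (infinite if the mean is zero)."""
    avg = mean(values)
    if avg == 0:
        return float("inf")
    return stdev(values) / abs(avg)


def needs_review(values, threshold=0.05):
    """Flag a compound whose reported values disagree by more than the cutoff.

    The threshold is a hypothetical choice for illustration only.
    """
    return len(values) >= 2 and spread_ratio(values) > threshold


values = [value for _, value in reported]
ratio = spread_ratio(values)
flagged = needs_review(values)
print(f"stdev/average = {ratio:.2f}, flagged for review: {flagged}")
```

A ratio well above the cutoff only says that the sources disagree; it does not say which value is right, which is why the slides that follow turn to direct measurement.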

Page 17: MiniSymp2011 Bradley

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temp and can be frozen <-30C (Evan Curtin)

Page 18: MiniSymp2011 Bradley

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Page 19: MiniSymp2011 Bradley

Motivation: Faster Science, Better Science

Page 20: MiniSymp2011 Bradley

Ruling out all melting points above -15C?

Page 21: MiniSymp2011 Bradley

Oops – 4-benzyltoluene freezes after 16 days at -15C!

Page 22: MiniSymp2011 Bradley

Measuring the melting point by slowly heating from -15 C gives 5 C

Page 23: MiniSymp2011 Bradley

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 24: MiniSymp2011 Bradley

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R2 = 0.78, TPSA and nHdon most important

Page 25: MiniSymp2011 Bradley

Melting point prediction service

Page 26: MiniSymp2011 Bradley

Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)

Page 27: MiniSymp2011 Bradley

Using melting point for temperature dependent solubility prediction

Page 28: MiniSymp2011 Bradley

Web services for summary data

(Andrew Lang)

Page 29: MiniSymp2011 Bradley

Publication of double+ validated melting point dataset to Nature Precedings and LuLu

Page 30: MiniSymp2011 Bradley
Page 31: MiniSymp2011 Bradley
Page 32: MiniSymp2011 Bradley

Reaction Attempts Book

Page 33: MiniSymp2011 Bradley

Reaction Attempts Book: Reactants listed Alphabetically

Page 34: MiniSymp2011 Bradley
Page 35: MiniSymp2011 Bradley

Google Apps Scripts web services

Page 36: MiniSymp2011 Bradley

Google Apps Scripts for conveniently exploring melting point data

Page 37: MiniSymp2011 Bradley

Straight chain carboxylic acids from 1 to 10 carbons

Straight chain alcohols from 1 to 10 carbons

Comparison of model with triple validated measurements

Page 38: MiniSymp2011 Bradley

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)

Page 39: MiniSymp2011 Bradley

Google Apps Scripts for planning reactions and creating schemes

Page 40: MiniSymp2011 Bradley

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Page 41: MiniSymp2011 Bradley

All ONS web services

Page 42: MiniSymp2011 Bradley

Predicting Best Solvent for Imine Formation

(Evan Curtin)

Page 43: MiniSymp2011 Bradley

Predicting Yield of Imine Formation in Ethanol

(Evan Curtin)