Jean-Claude Bradley presents at the Special Libraries Association meeting on June 14, 2011 on "International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences: The Role of Trust". The talk mainly covers the problems with a trusted-source-based model for melting point data and demonstrates that an Open Data model, including Open Notebook Science when necessary, can be very helpful in curating datasets. Web services for experimental and predicted melting points are then reviewed.
International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences: The Role of Trust
June 14, 2011
Special Libraries Association
Jean-Claude Bradley
Department of Chemistry, Drexel University
Unknown Perils of the Past
Before online databases (early 90s), searching for properties like melting points using ONE “trusted source” was practical:
• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals
Known Perils of the Present
Today, many librarians discourage the use of new online sources (like Wikipedia) for searching chemical data and recommend using only “trusted sources”.
The problem is that the “trusted source” model is, and always was, fundamentally flawed.
Ironically, most of Wikipedia’s chemical information is problematic BECAUSE it is based on “trusted sources”!
Promises for the Future
Using technology, we can begin to replace the “trusted source”
model with one based on transparency and provenance
The current state of transparency in scientific communication
Case study of melting point data
The Chemical Information Validation Sheet
567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course
The Chemical Information Validation Explorer
(Andrew Lang)
Discovering outliers for melting points (stdev/average)
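The stdev/average screen named in this slide title can be sketched in a few lines. This is only a minimal illustration, assuming the measurements are grouped by compound and reported in degrees Celsius; the function name `flag_outliers`, the 0.05 threshold, and the sample values are hypothetical and are not taken from the Validation Sheet or the Explorer.

```python
from statistics import mean, stdev


def flag_outliers(measurements, threshold=0.05):
    """Return (compound, ratio) pairs whose stdev/average exceeds the threshold.

    `measurements` maps a compound name to a list of reported melting points.
    The threshold is an arbitrary illustrative cut-off, not a published value.
    """
    flagged = []
    for compound, values in measurements.items():
        if len(values) < 2:
            continue  # a spread needs at least two measurements
        average = mean(values)
        if average == 0:
            continue  # the ratio is undefined for a zero average
        ratio = stdev(values) / abs(average)
        if ratio > threshold:
            flagged.append((compound, ratio))
    # Largest relative spread first, so the worst disagreements surface at the top
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical melting points in degrees Celsius
sample = {
    "compound A": [120.0, 121.0, 122.0],
    "compound B": [80.0, 140.0, 82.0],
}
flagged = flag_outliers(sample)
```

A ratio like this becomes unstable when the average sits close to 0 °C, so a real screen would probably need a second check (for example the absolute range) for low-melting compounds.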
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer
(Andrew Lang)
Outliers
MDPI dataset
EPI (also donated all data to the public)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
Inconsistencies and SMILES problems within MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
Live curation on a public Google Spreadsheet of compounds with highest mp ranges
(collaboration with Andrew Lang and Antony Williams)
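Picking the compounds with the highest melting point ranges for curation could look roughly like the sketch below. It assumes the data have already been read out of the spreadsheet into a mapping of compound name to reported values; the function name `rank_by_range` and the sample numbers are hypothetical.

```python
def rank_by_range(measurements, top=10):
    """Return the `top` compounds with the largest (max - min) melting point range."""
    ranges = {
        compound: max(values) - min(values)
        for compound, values in measurements.items()
        if values  # skip compounds with no reported values
    }
    ordered = sorted(ranges.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top]


# Hypothetical melting points in degrees Celsius
sample = {
    "compound A": [50.0, 52.0],
    "compound B": [-30.0, 5.0, 98.0],
    "compound C": [200.0],
}
worst_first = rank_by_range(sample, top=2)
```

Sorting by range rather than by a relative measure keeps the ranking meaningful for compounds that melt near 0 °C.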
Some melting points can’t be resolved with literature alone: 4-benzyltoluene
The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C
The quest to resolve the melting point of 4-benzyltoluene: ambiguous results upon heating, but clearly remains a liquid at -15 °C for 2 days in freezer
Further investigation into the literature for the melting point of 4-benzyltoluene
Although a general description of the method is provided, the raw data are not
Because of broken provenance, errors cascade through the literature
Calculations in patent based on incorrect data
Open Random Forest modeling of Open Melting Point data using CDK descriptors
(Andrew Lang)
R² = 0.78; TPSA and nHdon most important
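For readers unfamiliar with the R² figure quoted for the model, the sketch below shows one common way to compute it from measured and predicted values. It assumes the 1 - SS_res/SS_tot definition; the slide does not say which definition was used, and this does not reproduce the Random Forest model or the CDK descriptors. The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, using the 1 - SS_res / SS_tot definition."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    average = sum(measured) / len(measured)
    # Total variation of the measurements around their own average
    ss_tot = sum((m - average) ** 2 for m in measured)
    # Variation left unexplained by the predictions
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    if ss_tot == 0:
        raise ValueError("measured values are all identical")
    return 1 - ss_res / ss_tot


# Hypothetical measured and predicted melting points in degrees Celsius
measured = [10.0, 50.0, 90.0, 130.0]
predicted = [20.0, 45.0, 95.0, 120.0]
score = r_squared(measured, predicted)
```

Under this definition a value of 1 means the predictions match the measurements exactly, and a value of 0 means they do no better than always predicting the average.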
Melting point prediction service
Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
Using melting point for temperature dependent solubility prediction
Motivation: Faster Science, Better Science
There are NO FACTS, only measurements embedded within assumptions
Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
TRUST
PROOF
Strategy for an Open Notebook:
First record, then abstract structure
To be discoverable, use Google-friendly formats (simple HTML, no login)
To be replicable, use free hosted tools (Wikispaces, Google Spreadsheets)
Crowdsourcing Solubility Data
Data provenance: From Wikipedia to…
…the lab notebook and raw data
Calculations Made Public on Google Spreadsheets
Interactive NMR spectra using JSpecView and JCAMP-DX
Raw Data As Images
Splatter?
Some liquid
YouTube for demonstrating experimental set-up
Solubilities collected in a Google Spreadsheet
Rajarshi Guha’s Live Web Query using Google Viz API
Web services for summary data
(Andrew Lang)
Web service calls from within a Google Spreadsheet for solubility measurement and
prediction
(Andrew Lang)
Integration of Multiple Web Services to Recommend Solvents for Reactions
(Andrew Lang)
Reaction Attempts Book
Reaction Attempts Book: Reactants listed Alphabetically
ONS Challenge Solubility Book cited for nanotechnology application
Lulu.com Data Disks
All ONS web services
For all Formats of ONS Projects
Conclusions
• For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
• Open Notebook Science offers an efficient way to make research transparent and discoverable