Taming the Wild, Wild West of Chemistry on the Internet. Maybe YOU Can Help?

Taming The Wild West Of Internet Based Chemistry You Can Help

I am an adjunct professor at the University of North Carolina at Chapel Hill. When I stopped by yesterday for a business meeting, I was told that I had been lined up to give a talk to the students at 1 pm. I had 20 minutes to prepare, so I assembled a mish-mash of information that might be of value to Citizen Chemists: those who might want to contribute to chemistry on the internet.

Citizen Scientists Enable the Web

Who is writing about chemical compounds on Wikipedia?

Who is writing critical reviews of Chemistry online?

Who is blogging about chemistry on the web?

For Synthesis… TotallySynthetic.com

Org Prep Daily (Blog)

Molbank (Open Access Journal)

Synthetic Pages (Website)

Encyclopedic Articles (Wikipedia)

Chemistry online – An Overview

• Encyclopedic articles (Wikipedia)
• Chemical vendor databases
• Metabolic pathway databases
• Property databases
• Chemical synthesis procedures
• Scientific publications
• Chemical vendors
• Blogs
• Wikis
• Open Notebook Science

What and who do you trust?

Compounds and Identifiers

What is ChemSpider?

Building a Structure Centric Community for Chemists

ChemSpider is:

• A collection of >23 million compounds from ca. 250 data sources
• A deposition and curation platform
• A publishing platform for the community
• Growing daily: more depositions, more links, more data sources

Search Cholesterol

Linked across the internet

Link off a structure in ChemSpider to:

• Chemical suppliers
• Other publications
• Analytical Data
• Related Reactions
• Wikipedia
• Patents
• “Everything”

Linked to Millions of Articles

Answering Questions for Chemists

Questions a chemist might ask…

• What is the melting point of n-butanol?
• What is the chemical structure of Xanax?
• Chemically, what is phenolphthalein?
• What are the stereocenters of cholesterol?
• Where can I find publications about xylene?
• What are the different trade names for Ketoconazole?
• What is the NMR spectrum of Aspirin?
• What are the safety handling issues for Thymol Blue?

What is the structure of Flibanserin?

Complex Data and Information

Various Searches

Structure searching

Substructure searching

Subset searching – choose from 200 data sources

Property searching

Searches are used in various ways by different types of chemists…

ChemSpider Searches

Antony Williams vs Identifiers

Passport ID

Dad, Tony, others

SSN

Green Card

License

5 email addresses

ChemSpiderman (blog, Twitter account, Facebook, Friendfeed)

OpenID…

Aspirin vs Chemical Identifiers

Aspirin names and synonyms

• Text searches depend on correct association

• 335 suggested identifiers for Aspirin just on PubChem!

• Disambiguation dictionaries are necessary
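A disambiguation dictionary can be as simple as a lookup table that sends every known name to one structure identifier. This is a minimal, hypothetical Python sketch, not ChemSpider's implementation. The `SYNONYMS` table, `normalize` and `resolve` are invented names. The four synonyms are a tiny sample and not PubChem's list. The key used is the standard InChIKey for aspirin.

```python
import unicodedata
from typing import Optional

# Standard InChIKey of aspirin (acetylsalicylic acid).
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

# Hypothetical disambiguation dictionary: normalized name -> InChIKey.
# Only a handful of aspirin synonyms are listed, for illustration.
SYNONYMS = {
    "aspirin": ASPIRIN_KEY,
    "acetylsalicylic acid": ASPIRIN_KEY,
    "2-acetoxybenzoic acid": ASPIRIN_KEY,
    "2-(acetyloxy)benzoic acid": ASPIRIN_KEY,
}


def normalize(name: str) -> str:
    """Apply Unicode NFKC, fold case and collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def resolve(name: str) -> Optional[str]:
    """Return the InChIKey for a known name, or None when the name is unknown."""
    return SYNONYMS.get(normalize(name))


if __name__ == "__main__":
    for query in ["Aspirin", "  ACETYLSALICYLIC   acid ", "salicin"]:
        print(f"{query!r} -> {resolve(query)}")
```

Every listed synonym collapses onto one key, so a text search for any of them becomes a single structure lookup. The hard part in practice is curating the table. One wrong association sends every query for that name to the wrong compound.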

The Final Search Strategy

All Those Names, One Structure

Connections Can Lead Anywhere

The InChI Identifier

Multiple Layers

InChI Strings Hash to InChIKeys

Oleoylethanolamine

InChI=1S/C20H39NO2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-20(23)21-18-19-22/h9-10,22H,2-8,11-19H2,1H3,(H,21,23)/b10-9-

BOWVQLFMWHZBEF-KTKRTIGZSA-N
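The layers and the hash can be pulled apart with plain string handling. This is a hypothetical Python sketch. `inchi_layers` and `inchikey_blocks` are illustrative names and not part of any InChI library.

The sketch splits the Oleoylethanolamine InChI at its `/` separators. It then cuts the InChIKey into its fixed-width parts:

• a 14-character hash of the connectivity
• an 8-character hash of the remaining layers, such as stereochemistry
• a standard flag
• a version letter
• a protonation character

It assumes each layer prefix occurs only once. That holds for this molecule but not for every InChI; isotopic layers, for example, can repeat sub-layer prefixes.

```python
from typing import Dict, Tuple

# Oleoylethanolamine, copied from the slide.
OEA_INCHI = (
    "InChI=1S/C20H39NO2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-20(23)21-18-19-22"
    "/h9-10,22H,2-8,11-19H2,1H3,(H,21,23)/b10-9-"
)
OEA_KEY = "BOWVQLFMWHZBEF-KTKRTIGZSA-N"


def inchi_layers(inchi: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (version, formula, {prefix: layer}) for a simple standard InChI."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    # Later layers start with a one-letter prefix, e.g. c = connectivity,
    # h = hydrogen atoms, b = double-bond stereo. Assumes no repeated prefixes.
    layers = {layer[0]: layer[1:] for layer in rest}
    return version, formula, layers


def inchikey_blocks(key: str) -> Dict[str, str]:
    """Split a standard 27-character InChIKey into its fixed-width parts."""
    first, second, third = key.split("-")
    if (len(first), len(second), len(third)) != (14, 10, 1):
        raise ValueError("unexpected InChIKey layout")
    return {
        "connectivity_hash": first,       # molecular skeleton
        "other_layers_hash": second[:8],  # stereochemistry, isotopes
        "standard_flag": second[8],       # 'S' = standard InChI
        "version": second[9],             # 'A' = InChI version 1
        "protonation": third,             # 'N' = neutral
    }


if __name__ == "__main__":
    version, formula, layers = inchi_layers(OEA_INCHI)
    print(version, formula, layers["b"])
    print(inchikey_blocks(OEA_KEY))
```

The `/b10-9-` layer records the double-bond geometry. It contributes only to the second block of the key, so stereoisomers share the first 14 characters.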

Search Engine Dependencies

Vancomycin

Who will curate?

How would you clean such a large dataset?

Chemistry on the Internet

Much of the information is based on assertions: User Beware!

The quality of the available information varies widely, so how does the user know what is and is not “correct”?

Caution! Question Everything!

Question Everything online: www.dhmo.org

Vancomycin on ChemSpider

Vancomycin

Search Molecular SKELETON

Search Full Molecule

Full Skeleton Search: 104 Hits

Full Molecule Search: 4 Hits
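The gap between 104 skeleton hits and 4 full-molecule hits is the gap between matching connectivity alone and matching everything, stereochemistry included. One rough way to mimic that distinction is to compare only the first InChIKey block in one case and the whole key in the other. This is an illustrative assumption, not a description of how ChemSpider's searches actually work.

The three keys are the standard InChIKeys of L-alanine, alanine with undefined stereochemistry, and aspirin. `INDEX`, `skeleton_hits` and `full_hits` are hypothetical names.

```python
from typing import Dict, List

# Small illustrative index of standard InChIKeys.
INDEX: Dict[str, str] = {
    "L-alanine": "QNAYBMKLOCPYGJ-REOHCLBHSA-N",
    "alanine (stereo undefined)": "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
}


def skeleton_hits(query_key: str, index: Dict[str, str]) -> List[str]:
    """Names whose first InChIKey block (connectivity) equals the query's."""
    skeleton = query_key.split("-")[0]
    return sorted(name for name, key in index.items() if key.split("-")[0] == skeleton)


def full_hits(query_key: str, index: Dict[str, str]) -> List[str]:
    """Names whose complete InChIKey equals the query, stereo and protonation included."""
    return sorted(name for name, key in index.items() if key == query_key)


if __name__ == "__main__":
    query = INDEX["L-alanine"]
    print("skeleton:", skeleton_hits(query, INDEX))
    print("full:", full_hits(query, INDEX))
```

InChIKeys are hashes, so a matching key is strong but not absolute evidence that two structures are identical. Collisions are rare but possible, so a real search would confirm hits against the full InChI or the structure itself.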

The EXPERTS must get it right?!

Wikipedia, C&E News, PubChem

C&E News (from ACS)

“Lathosterol”

“Lathosterol” Removed

“Lathosterol” on PubChem

Crowd-sourcing Chemistry Curation

Crowd-sourced curation: identify and tag errors, edit names and synonyms, and identify records to deprecate

Citizen Scientists

Become a Data Source

Synthesis Procedures

Links to Data or Deposit Data

Your Blog Posted Online?

Upload Spectral Data, OPEN Data?

Semantic Mark-up for Chemistry

Semantic mark-up for chemistry is here

RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies), based on the OSCAR system

ChemSpider Journal of Chemistry

Nature Publishing Group compound linking

ChemMantis and CJOC

Name-Structure Pairs

Deposit Structures

Species – linked to Wikipedia

In Development: ChemSpider Synthesis

ChemSpider Synthesis will be a home for all things “synthetic”

An online resource for synthetic procedures from blogs, other online resources, RSC supplementary information, other publishers, etc.

Public peer review and feedback for synthetic procedures

Online Journals and Live Data

ChemSpider Everywhere: Embed

ChemSpider Everywhere: Spectral Game

ChemSpider Everywhere: Crowdsourced Curation of Spectra

ChemSpider Everywhere: ChemMobi

ChemSpider Everywhere

Linked from Wikipedia

Linked from Open Notebook Science sites

Linked from Blogs using Structure/Spectra

Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets

Where is ChemSpider Lacking?

ChemSpider is limited to “defined chemicals”. No support for:

• Polymers
• Minerals
• Markush structures

ChemSpider is very dependent on InChIs:

• Stereochemistry around non-carbon centers
• Organometallics are not correctly represented

There are millions of errors on ChemSpider

What’s next?

Keep cleaning and depositing data

Enable discovery via the semantic web (RDF)

Integrate software: Symyx JDraw, NMRShiftDB

Integrate RSC content – a massive archive!

Integrate RSC publishing workflows and databases

Continue Building Community for Chemistry

Building a Public ADME/Tox database

Delivering ChemSpider Synthetic Pages

Delivering ChemSpider Analytical Data

Delivering ChemSpider Education

Project Focus

People Make Change Happen

You are invited to:

• Curate ChemSpider data and link to us
• Deposit your data with us: structures, spectra, synthesis procedures

ChemSpider Synthesis is under development

People Make Change Happen

ChemSpider was a “hobby project”

Housed in a basement and running off three servers – one bought, two built

Sensitive to weather and power stability

Went live at ACS Spring 2007 in Chicago

ca. 6000 visitors a day, >50,000 transactions daily

Organizations Scale Innovation

Thank you

[email protected]: ChemSpiderman
www.chemspider.com/blog
SLIDES: www.slideshare.net/AntonyWilliams