Crawling Across the Web of Chemistry Using ChemSpider


DESCRIPTION

ChemSpider is a free-access website for chemists, built with the intention of providing a structure-centric community for chemists. It was developed to index available sources of chemical structures and their associated data into a single searchable repository and to make it available to everybody at no charge. While a large number of databases containing chemical compounds and data are available online, their quality, accuracy and completeness are often severely lacking. ChemSpider provides a platform through which the chemistry community can contribute to improving the quality of data online and can expand the information to include data such as reaction syntheses, analytical data, experimental properties and links to other valuable resources. It has grown into a resource containing over 21 million unique chemical structures from over 200 data sources. This presentation will provide an overview of ChemSpider and its value to chemists as a search tool and as a public repository of information, and will show how it can become one of the primary foundations of internet-based chemistry. I will also discuss the vision for ChemSpider and some of the lofty goals we are setting for the system moving forward.


Citizen Scientists Enable the Web

Who is writing about chemical compounds on Wikipedia?

Who is writing critical reviews of Chemistry online?

Who is blogging about chemistry on the web?

For Synthesis…TotallySynthetic.com

Org Prep Daily (Blog)

Molbank (Open Access Journal)

Synthetic Pages (Website)

Encyclopedic Articles (Wikipedia)

Chemistry Online – An Overview

Encyclopedic articles (Wikipedia), chemical vendor databases, metabolic pathway databases, property databases, chemical synthesis procedures, scientific publications, chemical vendors, blogs, wikis, Open Notebook Science

What and who do you trust?

Compounds and Identifiers

What is ChemSpider?

Building a Structure-Centric Community for Chemists

ChemSpider is:

More than 23 million compounds from ca. 250 data sources

A deposition and curation platform

A publishing platform for the community

Grows daily – more depositions, more links, more data sources

Search Cholesterol


Linked across the internet

Link off a structure in ChemSpider

Chemical suppliers, other publications, analytical data, related reactions, Wikipedia, patents, “everything”

Linked to Millions of Articles

Answering Questions for Chemists

Questions a chemist might ask…

What is the melting point of n-butanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for ketoconazole?
What is the NMR spectrum of aspirin?
What are the safety handling issues for Thymol Blue?

What is the structure of Flibanserin?


Complex Data and Information

Various Searches

Structure searching

Substructure searching

Subset searching – choose from 200 data sources

Property searching

Searches are used in various ways by different types of chemists…

ChemSpider Searches


Caution! Question Everything!

Vancomycin

Who will curate?

PubChem is not resourced to clean these errors

How would you clean such a large dataset?

Vancomycin on ChemSpider: one compound, discussions over three days

The EXPERTS must get it right?!

Wikipedia, C&E News, PubChem

C&E News (from ACS)

“Lathosterol”


“Lathosterol” Removed

“Lathosterol” on PubChem

Crowd-sourcing Chemistry Curation

Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate

Citizen Scientists

Become a Data Source

Synthesis Procedures

Links to Data or Deposit Data

Your Blog Posted Online?

Upload Spectral Data, OPEN Data?

Data as DOIs

Primary Data for Chemistry Available for the First Time

…Thieme is the first publisher to make primary chemistry data accessible worldwide

Analytical data, from various experiments, is the foundation of research work and scientific papers

From now on, primary data will be registered and made available online using digital object recognition in the form of Digital Object Identifiers (DOI)

Linking Data By DOI

Semantic Mark-up for Chemistry

Semantic mark-up for chemistry is here

RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies), based on the OSCAR system

ChemSpider Journal of Chemistry

Nature Publishing Group compound linking

ChemSpider and Publishing

Curation led to a set of validated dictionaries

Integrated entity extraction with validated name dictionaries

Additional dictionaries covered reactions, groups, families, hardware and software vendors, etc.

ChemMantis and CJOC

Name-Structure Pairs

Deposit Structures

Species – linked to Wikipedia

Semantic Linking of Structures

What would you want to link off a structure?

Chemical suppliers, other publications, analytical data, related reactions, Wikipedia, patents, “everything”

RSC’s Project Prospect

In Development: ChemSpider Synthesis

ChemSpider Synthesis will be a home for all things “synthetic”

An online resource for synthetic procedures from blogs, other online resources, RSC supplementary info, other publishers, etc.

Public peer-review and feedback for synthetic procedures

RSC Supplementary Info

Online Journals and Live Data

ChemSpider Everywhere: Embed

ChemSpider Everywhere: Spectral Game

ChemSpider Everywhere: Crowdsourced Curation of Spectra

Building a Structure-Centric Community for Chemists

ChemSpider Everywhere: ChemMobi

ChemSpider Web Services

ChemSpider Everywhere: Linked from Wikipedia

Linked from Open Notebook Science sites

Linked from Blogs using Structure/Spectra

Integrated into structure-drawing packages such as ACD/ChemSketch, Symyx Draw and open-source applets

Where is ChemSpider Lacking?

ChemSpider is limited to “defined chemicals”. There is no support for polymers, minerals or Markush structures.

ChemSpider is very dependent on InChIs, so stereochemistry around non-carbon centers and organometallics are not correctly represented.

There are millions of errors on ChemSpider

What’s next?

Keep cleaning and depositing data

Enable discovery via the semantic web (RDF)

Integrate software: Symyx JDraw, NMRShiftDB

Integrate RSC content – a massive archive!

Integrate RSC publishing workflows and databases

Continue Building Community for Chemistry

Building a Public ADME/Tox database

Delivering ChemSpider Synthetic Pages

Delivering ChemSpider Analytical Data

Delivering ChemSpider Education

Project Focus

People Make Change Happen

You are invited: curate ChemSpider data and link to us

Deposit your data with us: structures, spectra, synthesis procedures

ChemSpider Synthesis is under development

People Make Change Happen

ChemSpider was a “hobby project”

Housed in a basement and running off three servers – one bought, two built

Sensitive to weather and power stability

Went live at ACS Spring 2007 in Chicago

ca. 6000 visitors a day, >50,000 transactions daily

Organizations Scale Innovation

There is a Downside…


Thank you

antony.williams@chemspider.com
Twitter: ChemSpiderman
www.chemspider.com/blog
SLIDES: www.slideshare.net/AntonyWilliams
