Connecting Chemistry Across the Internet Using ChemSpider

Antony J Williams and Valery Tkachenko
SERMACS, November 15th 2012

Chemistry Data and the Weeds

Tell me about Roundup

So what is Round Up?

The World’s Encyclopedia

Roundup

Where do we Round Up data?

Where can I find the molfile for Roundup?
Papers/Patents about Roundup?
What are the side effects of Roundup?
Where can I order Roundup?
What are the physicochemical properties?
Metabolic pathways?
Different synonyms of Roundup?
Synthesis of Roundup?
Etc.

In an increasing LinkedData map….

But I want to aggregate data? So…

ChemSpider

Takes on the role of a structure centric hub:

Connecting, validating, qualifying data
Enhancing data with connections to services
Provides access to data and services for others to use (Thermo, Agilent, Bruker, Waters, ACD/Labs, Accelrys, etc.)
Uses available services to integrate, connect and enhance the offering

Roundup on ChemSpider

What will ChemSpider give us??

ChemSpider is Collapsing Data???

For Glyphosate itself

How did we build it?

We deal in Molfiles or SDF files, with coordinates
Deposit anything that has an InChI: we support what InChI can handle, good and bad
Standardization based on “InChI standardization”
InChIs aggregate (certain) tautomers
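The structure-centric hub idea can be sketched in a few lines: every deposited record carries an InChI, and records that share an InChI land on the same compound entry. This is only an illustration, not ChemSpider's implementation; the record layout, the source names and the placeholder InChI strings are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical deposited records: (source name, InChI string, payload).
# The InChI strings are placeholders, not real identifiers.
records = [
    ("vendor_a", "InChI=1S/PLACEHOLDER-1", {"synonym": "Glyphosate"}),
    ("database_b", "InChI=1S/PLACEHOLDER-1", {"synonym": "N-(phosphonomethyl)glycine"}),
    ("database_b", "InChI=1S/PLACEHOLDER-2", {"synonym": "Some other compound"}),
]


def aggregate_by_inchi(records):
    """Group the payloads from all sources under the InChI they share."""
    hub = defaultdict(list)
    for source, inchi, payload in records:
        hub[inchi].append({"source": source, **payload})
    return dict(hub)


hub = aggregate_by_inchi(records)
for inchi, entries in hub.items():
    print(inchi, [entry["synonym"] for entry in entries])
```

Because the grouping key is the InChI, anything InChI treats as identical (for example certain tautomers) ends up on one entry, and anything InChI cannot represent well cannot be merged this way.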

How much of ChemSpider is “on ChemSpider”?

Connecting Chemistry across the web

So much of what is seen on ChemSpider is retrieved in real time using services
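A compound page assembled in real time might look roughly like the sketch below: the hub stores an identifier, and each section of the page is filled by calling a service when the page is requested. The function names, the service names and the stand-in services are hypothetical; they do not correspond to any actual ChemSpider API.

```python
def build_compound_page(compound_id, services):
    """Assemble page sections by calling each service for one compound."""
    page = {"compound_id": compound_id, "sections": {}, "unavailable": []}
    for name, fetch in services.items():
        try:
            page["sections"][name] = fetch(compound_id)
        except Exception:
            # A failing remote service should not break the whole page.
            page["unavailable"].append(name)
    return page


# Hypothetical stand-ins for remote calls; real services would go over HTTP.
def fake_literature_service(compound_id):
    return [f"article mentioning {compound_id}"]


def fake_prediction_service(compound_id):
    raise TimeoutError("service did not respond")


page = build_compound_page(
    "PLACEHOLDER-ID",
    {"literature": fake_literature_service, "predictions": fake_prediction_service},
)
print(page)
```

The design choice illustrated here is that a slow or broken external service only removes its own section instead of failing the whole page.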

Online Predictions

A Comment on Quality

For >28 million chemical compounds there are some errors:

“Incorrect” structure representations
Mismatched name-structure relationships
Experimental properties (the values, the units)
Real vs. virtual compounds: text-mining and conversion

We have deprecated a LOT of data…

Downsides of InChI

Good for small molecules – but no polymers, issues with inorganics, organometallics, imperfect stereochemistry. ChemSpider is “small molecules”

InChI used as the “deduplicator” – FIRST version of a compound into the database becomes THE structure to deduplicate against…
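The "first one in wins" behaviour described above can be illustrated with a small sketch. It assumes a plain dictionary keyed by InChI and a hypothetical `deposit` function; the InChI string and molfile contents are placeholders.

```python
def deposit(database, inchi, molfile):
    """Register a structure; the first molfile seen for an InChI is kept.

    Returns True if the record was new, False if it was a duplicate.
    """
    if inchi in database:
        return False  # later depictions are ignored, even if they are better
    database[inchi] = molfile
    return True


db = {}
first = deposit(db, "InChI=1S/PLACEHOLDER", "molfile with poor 2D layout")
second = deposit(db, "InChI=1S/PLACEHOLDER", "molfile with clean 2D layout")
print(first, second, db)
```

The second deposit is rejected as a duplicate, so the stored depiction stays whatever arrived first, which is one way layout or representation problems can persist.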

Side Effects of InChI Usage

SMILES by comparison…

Standardization Issues
Depiction based on molfile

Downsides of Overall Approach

Meshing data together based on InChIs worked for simple molecules

2D layout errors inherited or limited by algorithm

Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same, meant to be the same, are not the same
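One way to surface records that were probably meant to be the same compound is to compare only the connectivity part of the identifier. The sketch below uses the InChIKey, the fixed-length hashed form of an InChI, whose first 14-character block is derived from the connectivity and does not change with stereochemistry. Using InChIKeys here is an assumption for illustration (the slides speak of InChIs), and the two keys are made up in the valid format rather than taken from real compounds.

```python
def connectivity_block(inchikey):
    """Return the first 14-character block of an InChIKey."""
    first, _, _ = inchikey.partition("-")
    if len(first) != 14:
        raise ValueError(f"unexpected InChIKey format: {inchikey!r}")
    return first


def same_skeleton(key_a, key_b):
    """True when two InChIKeys share the same connectivity block."""
    return connectivity_block(key_a) == connectivity_block(key_b)


# Made-up keys in the InChIKey layout; only the second block differs,
# as it would for two records that differ by one stereocenter.
isomer_1 = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
isomer_2 = "AAAAAAAAAAAAAA-CCCCCCCCSA-N"

print(isomer_1 == isomer_2)               # full keys differ, so no deduplication
print(same_skeleton(isomer_1, isomer_2))  # same skeleton, worth a curator's look
```

A match on the first block does not prove the records are the same compound; it only flags candidates for review.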

So much data online is “erroneous”

The confusion of name-structures

Collapsing Data – Standardization

What needs to happen?

If we could validate:
Catch errors in databases (and clean)
Proactively catch errors in publications/patents
Reduce junk in the ether and improve QUALITY!

If we collectively standardized:
Interlinking between databases should improve

CVSP: a separate presentation… stick around

Crowdsourcing ChemSpider

ChemSpider is crowdsourced

Community deposition, annotation and curation

Anyone can “Leave Feedback”

Registered users can add data

Internet Data

ChemSpider and Global Chemistry Hub

Commercial Software
Pre-competitive Data
Open Science
Open Data
Publishers
Educators
Open Databases
Chemical Vendors

Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals

Delivering a Prediction Platform

Experimental data will be used as the basis of model generation: a predictive platform…

The Future of ChemSpider

Continued focus on quality over quantity, but more data is good too!
ChemSpider Reactions: work in progress and includes >300,000 reactions
Plugging in a validation and standardization platform
Delivering personal and institutional repository capabilities

Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
