How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012



What is InChI


InChI Examples

ethanol (CH3CH2OH)

InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3

L-ascorbic acid

InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
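
The layered structure of the strings above can be made visible with a few lines of Python. This is a minimal sketch, not part of the InChI software: the function name `split_inchi` and the returned dictionary shape are illustrative assumptions. It only splits the string on "/" and does not validate the chemistry.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and letter-prefixed layers.

    Illustrative helper only (hypothetical name); it does not call the
    InChI library and performs no chemical validation.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    result = {"version": parts[0], "formula": None, "layers": []}
    rest = parts[1:]
    # The formula layer is the only one without a lowercase letter prefix.
    if rest and not rest[0][:1].islower():
        result["formula"] = rest[0]
        rest = rest[1:]
    # Layers are kept as ordered (prefix, body) pairs.
    for layer in rest:
        if layer:
            result["layers"].append((layer[0], layer[1:]))
    return result


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(split_inchi(ethanol))
# {'version': '1S', 'formula': 'C2H6O', 'layers': [('c', '1-2-3'), ('h', '3H,2H2,1H3')]}
```

For the L-ascorbic acid string, the same call additionally yields the stereo layers with prefixes t, m and s.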


InChI Structure


InChIKey

The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)

Designed to allow for easy web searches of chemical compounds

InChIKeys consist of

14 characters resulting from a hash of the connectivity information of the InChI

followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI

followed by a single character indicating whether the InChI is standard (S) or non-standard (N)

followed by a single character indicating the version of InChI used

followed by a hyphen and a single character indicating protonation (N for neutral)

InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1

BQJCRHHNABKAKU-KBQPJGBKSA-N

Unlike an InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
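
As a small illustration, the 27-character key shown above splits as 14 letters, a hyphen, 8 letters, a standard/non-standard flag, a version letter, a hyphen, and a protonation letter. The sketch below only checks that layout and names the pieces. The function name `parse_inchikey` and the field names are assumptions for illustration; nothing here recomputes the hash or recovers the structure.

```python
import re

# Layout of a 27-character InChIKey: 14 + "-" + 8 + flag + version + "-" + 1.
_KEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks (hypothetical helper, format check only)."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError("not a 27-character InChIKey: %r" % key)
    connectivity, other, flag, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,  # first block, 14 letters
        "other_layers_hash": other,         # 8 letters for the remaining layers
        "standard": flag == "S",            # S = standard, N = non-standard
        "version_char": version,
        "protonation_char": protonation,    # N = neutral
    }


morphine = parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
print(morphine["connectivity_hash"], morphine["standard"])
```

Comparing only `connectivity_hash` between two keys is one way to find records that share a skeleton but differ in stereo or isotopic layers.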


Proliferation of InChI


Search by InChI


ChemSpider Google Search

http://www.chemspider.com/google/


What’s the catch?

InChI has limitations

InChI is ideal for simple, static, well-defined graphs

Real chemical substances can only be approximated by such graphs


Limitations

Non-trivial stereo (e.g. axial, planar)

Non-trivial tautomers (e.g. ring-chain)

Mixtures – full stereo is rarely known

Polymers

Markush structures

Organometallics

Inorganics

Materials

Reactions

Etc


Chemical data complexity


Work in progress

InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:

Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.

Polymers/Mixtures: The polymers/mixtures working group also has submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.

Markush: This project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.

But what do we do NOW???


Deposition Process

[Figure: Data; Validation; Standardization; Filtering; Deduplication; Componentization; Mapping; Non-redundant data]
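
Two of the stages named on this slide, validation and deduplication, can be sketched as chained generators. This is a rough illustration under stated assumptions, not the ChemSpider implementation: the record shape (a dict with "name" and "inchikey"), the function names, and the choice of the InChIKey as the deduplication key are all hypothetical, and the ethanol key is added only as sample data.

```python
def validate(records):
    """Keep records that carry a plausibly formatted 27-character InChIKey."""
    for rec in records:
        key = rec.get("inchikey") or ""
        if len(key) == 27 and key[14] == "-" and key[25] == "-":
            yield rec


def deduplicate(records):
    """Keep the first record seen for each InChIKey (assumed dedup key)."""
    seen = set()
    for rec in records:
        if rec["inchikey"] not in seen:
            seen.add(rec["inchikey"])
            yield rec


def deposit(records):
    """Run the two sketched stages and return the non-redundant records."""
    return list(deduplicate(validate(records)))


incoming = [
    {"name": "morphine", "inchikey": "BQJCRHHNABKAKU-KBQPJGBKSA-N"},
    {"name": "Morphine (dup)", "inchikey": "BQJCRHHNABKAKU-KBQPJGBKSA-N"},
    {"name": "ethanol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"name": "broken record", "inchikey": None},
]
print([rec["name"] for rec in deposit(incoming)])
```

The remaining stages on the slide (standardization, filtering, componentization, mapping) would slot into the same chain, but their behavior is not described in the deck and is left out here.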


ChemSpider Data Model


Organometallics


Mixtures or unknown stereo


Accelrys Enhanced Stereo


MOL V3000


Enhanced stereo and InChI…

Unfortunately not supported

Is it important?

Now real-world examples…


FDA Substance Registration System


Stoichiometric and non-stoichiometric mixtures

[Figure: substance with Moiety 1 and Moiety 2]

[Figure: substance with Moieties 1 to 4]

[Figure: substance with Moiety 1 and Moiety 2 (undefined)]

[Figure: substance with Moiety 1 and Moiety 2, with labels (A) and (B)]


D-glucose


SRS standardization approach

Substance description

Standardization module

Moieties generator

Normalization

InChI[Key] generator

Hash function f(InChIKeys, moieties)

Unique ID

Standard description
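
One possible reading of the hash function f(InChIKeys, moieties) is sketched below. The slide does not say how the SRS function is defined, so everything here is an assumption for illustration: the function name `substance_id`, the representation of a moiety as an (InChIKey, amount) pair, the canonical string format, and the use of SHA-256. The water key is sample data, not taken from the deck.

```python
import hashlib


def substance_id(moieties):
    """Derive an order-independent ID for a multi-moiety substance.

    moieties: iterable of (inchikey, amount) pairs, where amount is a
    string such as "1", "2" or "unknown". Hypothetical format, not the
    FDA SRS definition.
    """
    # Sort so that the same set of moieties always gives the same payload.
    canonical = sorted("%s:%s" % (key, amount) for key, amount in moieties)
    payload = "|".join(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


morphine = "BQJCRHHNABKAKU-KBQPJGBKSA-N"
water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"

hydrate = substance_id([(morphine, "1"), (water, "1")])
same = substance_id([(water, "1"), (morphine, "1")])
other = substance_id([(morphine, "1"), (water, "unknown")])
print(hydrate == same, hydrate == other)
```

Including the amount in the payload is what lets a stoichiometric and a non-stoichiometric mixture of the same moieties receive different IDs, which plain InChI cannot express.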


SRS TBD

Markush

Polymers

Proteins

Inorganics

Materials


OpenPHACTS

Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project

To reduce the barriers to drug discovery in industry, academia and for small businesses

To build an open platform, integrating chemistry and biology data from public domain resources

Semantic web platform

Open Standards, Open Data and Open Source


OpenPHACTS specifics

Active/inactive ingredient

Parent/child

Sample/substance

Misreferences (!!!)


ChemSpider Reactions


ChemSpider Reaction Challenges

Deduplication

Identification

Deposition


Conclusions

InChI is The Identifier

InChI has its limitations

InChI is a work in progress

InChI deficiencies can be hot-fixed


Acknowledgements

RSC Cheminformatics group

FDA SRS group

OpenPHACTS consortium

Software: InChI, GGA Software


Thank you

Email: [email protected]

Blog: www.chemspider.com/blog

SLIDES:

http://www.slideshare.net/valerytkachenko16