
Building blocks for automated elucidation of metabolites ...acscinf.org/docs/meetings/237nm/presentations/237nm80.pdf · Building blocks for automated elucidation of metabolites:


Christoph Steinbeck European Bioinformatics Institute (EBI) Slide  1                 07:05:37

Building blocks for automated elucidation of metabolites:

Machine learning methods for NMR prediction

Stefan Kuhn (1), Björn Egert (2), Steffen Neumann (2), Christoph Steinbeck (1)

(1) European Bioinformatics Institute (EBI), Chemoinformatics and Metabolism Team, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, United Kingdom

(2) Research Group for Molecular Informatics, Cologne University Bioinformatics Center (CUBIC), Zuelpicher Str. 47, D-50674 Cologne, Germany, [email protected]


Metabolomics @ CUBIC

• Experiment:

  • Fast quenching of metabolism

  • Cell lysis and extraction

  • Derivatisation

  • Detection via GC/MS

[Figure: GC chromatogram, signal intensity versus retention time t [min]; labelled peaks: trehalose, glutamate, lactate]

• Ca. 1000 compounds visible in GC

• 400 derivatives can be reproducibly quantified

• 240 compounds identified


Metabolomics @ CUBIC

Procedure:

Extraction of bacterial cells with methanol

Derivatisation

Separation of compounds by gas chromatography

Analysis by mass spectrometry after electron impact ionization

[Figure: instrument schematic labelled "Gas chromatography (GC)", "Mass spectrometer" and "Mass spectrometry (MS)"; mass spectrum with labelled peaks at 73.07, 156.11, 245.19 and 347.20]


De novo Elucidation of Biomarkers and Metabolites: Computer-Assisted Structure Elucidation (CASE)


The Chemistry Development Kit (CDK)

• Java library for chemoinformatics

• Open Source, LGPL (permits commercial use)

• >50 developers, core team of 10-20 people

• >50 academic and industrial projects world-wide


CDK Functionality

Input/Output

• I/O (CML, MDL Molfile, SDF, PDB)
• SMILES
• InChI

Visualization

• Structure-Diagram-Layout (SDG)
• 2D Rendering
• 3D Rendering

Modelling

• 3D Model-Builder
• Atom-Typing
• Force-Field
• Representation of Biomolecular Structures

Chemical Graphs

• Isomorphism detection
• Maximum-Common-Substructure searches
• SMARTS and substructure searches
• Ring searches
• Aromaticity detection

Library Enumeration

• Deterministic isomer generator
• Stochastic structure generators via
  - Simulated Annealing
  - Genetic Algorithms

Properties

• Fingerprinting
• > 70 QSAR descriptors
• QSAR model building
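To make the "Chemical Graphs" entries a little more concrete, here is a minimal sketch of one graph operation, counting independent rings in a molecular graph. It is plain Python and does not use or imitate the CDK API; the function name `ring_count` and the bond-list representation are assumptions chosen for illustration only.

```python
from collections import defaultdict


def ring_count(bonds):
    """Return the number of independent rings, E - V + C (cyclomatic number).

    `bonds` is a list of (atom_index, atom_index) pairs; each bond is
    assumed to connect two distinct atoms.
    """
    # Deduplicate so that (a, b) and (b, a) count as one bond.
    edges = {frozenset(bond) for bond in bonds}
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Count connected components with a depth-first search.
    seen = set()
    components = 0
    for start in list(adjacency):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            atom = stack.pop()
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    return len(edges) - len(adjacency) + components


# Heavy-atom skeletons only (hydrogens omitted).
butane = [(0, 1), (1, 2), (2, 3)]
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
# Ten-membered perimeter plus the bond between the two bridgehead atoms.
naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]

for name, bonds in [("butane", butane), ("cyclohexane", cyclohexane),
                    ("naphthalene", naphthalene)]:
    print(name, ring_count(bonds))
```

The cyclomatic number only says how many independent rings exist; a real ring search, such as the one a chemoinformatics library provides, also returns which atoms belong to each ring.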


Characterizing Biomarkers and Metabolites

NMRShiftDB (http://www.nmrshiftdb.org)

[1] Steinbeck, C.; Kuhn, S.; Krause, S. J. Chem. Inf. Comput. Sci. 2003, 43, 1733-1739.

[2] Steinbeck, C.; Kuhn, S. Phytochemistry 2004, 65, 2711-2717.

21500

25000

Open Access, Open Submission, Open Source


2D NMR Data for CASE

Steinbeck, C. Computer-Assisted Structure Elucidation. In Handbook on Chemoinformatics; Gasteiger, J., Ed.; Wiley-VCH: Weinheim, 2003; Vol. 2; pp. 1378-1406.


CASE with Simulated Annealing

[Figure: structure of polycarpol (C30H48O2)]

Steinbeck, C. Journal of Chemical Information & Computer Sciences 2001, 41, 1500-1507.

Fitness Evaluation (Scoring)

$S_{\text{total}} = S_{\text{NMR-HMBC}} + S_{\text{NMR-HHCOSY}} + S_{\text{NMR-Shift}} + S_{\text{Symmetry}} + S_{\text{MassSpec}} + \dots + S_{\text{Features}}$
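The additive score above can be illustrated with a minimal simulated annealing sketch. Everything here is a toy stand-in: the candidate is a tuple of bits rather than a molecular structure, and the two scorers (`score_first_half`, `score_second_half`), the mutation `flip_one_bit`, and the cooling schedule are hypothetical choices made for illustration, not the scoring terms or moves of the published method.

```python
import math
import random


def total_score(candidate, scorers):
    """Sum the individual score terms, as in S_total = S_1 + S_2 + ..."""
    return sum(scorer(candidate) for scorer in scorers)


def simulated_annealing(start, mutate, scorers, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Maximise total_score with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_score = start, total_score(start, scorers)
    best, best_score = current, current_score
    for step in range(steps):
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        proposal = mutate(current, rng)
        proposal_score = total_score(proposal, scorers)
        delta = proposal_score - current_score
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / T), which shrinks as T decreases.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Toy problem (hypothetical): recover a hidden bit pattern.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)


def score_first_half(candidate):
    return sum(c == t for c, t in zip(candidate[:4], TARGET[:4]))


def score_second_half(candidate):
    return sum(c == t for c, t in zip(candidate[4:], TARGET[4:]))


def flip_one_bit(candidate, rng):
    i = rng.randrange(len(candidate))
    return candidate[:i] + (1 - candidate[i],) + candidate[i + 1:]


SCORERS = [score_first_half, score_second_half]
start = (0,) * 8
best, best_score = simulated_annealing(start, flip_one_bit, SCORERS)
print(best, best_score)
```

Keeping each term as a separate function means a new source of evidence can be added to the list of scorers without touching the search loop.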


How far do we get with 1D NMR?


Deterministic Structure Generators work quite nicely for small molecules, even with very simple fitness functions

• For molecules of around 10 heavy atoms, we have been able to find the correct solution based solely on 13C shift prediction and comparison with the measured spectrum.
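A simple fitness function of this kind can be sketched as follows: each generated candidate gets a predicted 13C shift list, and candidates are ranked by how closely that list matches the measured one. The shift values, the candidate names, and the choice to pair shifts by sorted order (rather than by atom assignment) are assumptions made for illustration.

```python
def shift_deviation(predicted, measured):
    """Mean absolute deviation (ppm) between two shift lists.

    Shifts are paired by sorted order, a simplification that ignores
    atom assignments.
    """
    if len(predicted) != len(measured):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(predicted), sorted(measured))
    return sum(abs(p - m) for p, m in pairs) / len(measured)


def rank_candidates(candidates, measured):
    """Return candidate names ordered from best to worst match."""
    return sorted(candidates,
                  key=lambda name: shift_deviation(candidates[name], measured))


# Hypothetical measured spectrum and predicted shifts (ppm).
measured = [18.1, 57.8]
candidates = {
    "candidate_A": [18.0, 58.0],
    "candidate_B": [59.0, 59.0],
}

ranking = rank_candidates(candidates, measured)
print(ranking)
```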


1D Proton NMR Prediction

Methods trained based on CDK descriptors (random order)

• J48

• HOSE codes

• Support Vector Machines

• M5'

• PRISM

• naïve Bayes

• Linear Regression

• K-Means Clustering
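As a minimal illustration of one listed method, linear regression, the sketch below fits a proton shift against a single numeric descriptor by ordinary least squares. The descriptor values and shifts are invented, and using one descriptor instead of the full descriptor set is a deliberate simplification; the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a shift (ppm) from a descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: one descriptor value per proton
# and the corresponding measured shift in ppm.
descriptor = [0.0, 1.0, 2.0, 3.0]
shift_ppm = [1.0, 3.1, 4.9, 7.0]

model = fit_line(descriptor, shift_ppm)
print(model, predict(model, 2.0))
```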


Descriptors (416 / 100%)

• Spatial (105 / 25.24%)

• Physicochemical (242 / 57.93%)

• Topological (66 / 15.86%)

• Exp. Conditions (3 / 0.72%): Temperature, Frequency, Solvent

[Figure: descriptor tree. Leaf labels, in the order they appear: RDF GH, GD [9]; Van der Waals [11]; Valence Electrons [11]; Electronegativity [9]; Sigma, Pi; Period [11]; Hybridization [11]; RDF GS [9]; Distance [11]; Heavy Atom; Hydrogen; Min, Avg; RDF GHtopol [9]; Picontact [11]; BondsToAtom [11]; Charge [9]; Sigma, Pi]

330 descriptors in total
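The "RDF" entries presumably refer to radial distribution function descriptors. As a rough sketch of the general idea, and not of the CDK implementation, such a descriptor can be written as $g(r) = \sum_i p_i \, e^{-B (r - r_i)^2}$, where $r_i$ is the distance from the proton to atom $i$ and $p_i$ is an atomic property such as a partial charge. The function name, the smoothing parameter `smoothing` (B), and the coordinates are all assumptions.

```python
import math


def rdf_descriptor(center, atoms, radii, smoothing=20.0):
    """Sample g(r) = sum_i p_i * exp(-B * (r - r_i)^2) at each radius.

    center: (x, y, z) of the proton of interest
    atoms:  list of ((x, y, z), property) for the surrounding atoms
    radii:  distances at which the function is sampled
    """
    values = []
    for r in radii:
        total = 0.0
        for position, prop in atoms:
            distance = math.dist(center, position)
            total += prop * math.exp(-smoothing * (r - distance) ** 2)
        values.append(total)
    return values


# Hypothetical environment: one neighbour 1.0 away with property 2.0.
proton = (0.0, 0.0, 0.0)
neighbours = [((1.0, 0.0, 0.0), 2.0)]
values = rdf_descriptor(proton, neighbours, radii=[1.0, 2.0])
print(values)
```

Each sampled radius yields one number, so a handful of radii turns the 3D environment of a proton into a fixed-length vector that a learning method can consume.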


Random Forest, real vs predicted, 18672 protons
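A real-versus-predicted comparison like this one is usually summarised with a few agreement measures. The sketch below computes the mean absolute error, the root mean square error, and the Pearson correlation for paired shift lists; the numbers are invented and are not results from the talk.

```python
import math


def regression_metrics(real, predicted):
    """Return MAE, RMSE and Pearson r for paired real/predicted values.

    Pearson r is undefined if either list has zero variance.
    """
    n = len(real)
    errors = [p - r for r, p in zip(real, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_r = sum(real) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(real, predicted))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pearson_r = cov / math.sqrt(var_r * var_p)

    return {"mae": mae, "rmse": rmse, "pearson_r": pearson_r}


# Hypothetical proton shifts in ppm.
real = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
metrics = regression_metrics(real, predicted)
print(metrics)
```

A constant offset, as in this toy data, leaves the correlation perfect while the error terms stay non-zero, which is why the error measures are worth reporting alongside the correlation.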


Kuhn, S.; Egert, B.; Neumann, S.; Steinbeck, C. BMC Bioinformatics 2008, 9(1), 400.


Acknowledgement

Stefan Kuhn

Steffen Neumann

Björn Egert

Egon Willighagen 

All Collaborators at 

Cologne University Bioinformatics Center (CUBIC), 

EBI

and the CDK team

Prof. Peter Murray-Rust (Unilever Center for Molecular Informatics, Cambridge, UK)

Dr. William Hull, Dr. Willi von der Lieth

(DKFZ, Heidelberg)

Dr. Kämpchen

(Universität Marburg)

Dr. Heinz Kolshorn

(Universität Mainz)

DFG, BMBF, DAAD

Roche Diagnostics, Penzberg

Orion Pharma, Finland