Julio E. Peironcely @peyron

Juliopeironcely.com

PhD student at Leiden University and TNO

Understanding and Classifying Metabolite Space and Metabolite-Likeness, PLoS ONE (in press)

Metabolomics

The quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.

Julio E. Peironcely

Metabolomics

Biological question

Sample preparation

Experimental design

Data acquisition

Data pre-processing

Biological interpretation

Data analysis

Samples

Raw data

List of peaks/biomolecules

Relevant biomolecules/connectivities & models

Metabolites

Sampling

Protocol

What do metabolites look like?

HMDB 8K

ZINC 21M

Metabolites vs. non-metabolites

Water solubility

Molecular weight (MW)

C atoms

Structural complexity

Polar surface area (PSA)

PCA

Not so different

Decision Tree

Lots of candidate structures

Elemental Composition

Elemental Composition

Structure Generation

Elemental Composition

Structure Generation

Molecules

We are looking for metabolites

Elemental Composition

Structure Generation

Molecules

Metabolite Likeness

Elemental Composition

Structure Generation

Molecules

Metabolite Likeness

Metabolites
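The flow on this slide (candidate molecules go in, a metabolite-likeness score is applied, likely metabolites come out) can be sketched as a simple filter. This is a minimal illustration only: the function name, the candidate labels, the scores, and the 0.5 threshold are all hypothetical and do not come from the slides.

```python
from typing import Callable, Iterable, List


def filter_metabolite_like(
    candidates: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep the candidates whose metabolite-likeness score reaches the threshold."""
    return [c for c in candidates if score(c) >= threshold]


# Hypothetical scores standing in for the output of a trained classifier.
toy_scores = {"candidate_A": 0.93, "candidate_B": 0.12, "candidate_C": 0.58}

kept = filter_metabolite_like(toy_scores, toy_scores.__getitem__, threshold=0.5)
print(kept)
```

In practice the `score` callable would wrap whichever trained model is used; the threshold trades off how many generated structures are discarded against the risk of dropping a true metabolite.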

Metabolite-likeness

HMDB 8K

ZINC 21M

Atom Counts

Physicochemical desc.

MDL Public Keys

FCFP_4

ECFP_4

Support Vector Machines (SVM)

Random Forest (RF)

Naïve Bayes (NB)

Representation + Classification
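As an illustration of the simplest representation listed above, atom counts, the sketch below turns a molecular formula into per-element counts and a molecular weight. It assumes plain formulas without parentheses, charges, or isotopes, and the helper names and the small mass table are illustrative choices, not the tooling used in the talk.

```python
import re
from collections import Counter

# Standard atomic masses (rounded) for a few elements common in metabolites.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}


def atom_counts(formula: str) -> Counter:
    """Count atoms per element in a simple formula such as 'C4H6O5'."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def molecular_weight(formula: str) -> float:
    """Sum the atomic masses weighted by the atom counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in atom_counts(formula).items())


malic_acid = "C4H6O5"  # malic acid, one of the examples named later in the slides
print(atom_counts(malic_acid), round(molecular_weight(malic_acid), 3))
```

A vector of such counts is a very coarse description of a molecule; the fingerprint representations (MDL Public Keys, FCFP_4, ECFP_4) encode substructures and need a cheminformatics toolkit to compute.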

Metabolite-likeness

HMDB 8K

ZINC 21M

Standardization

Diversity Selection

Atom Counts

Physicochemical desc.

MDL Public Keys

FCFP_4

ECFP_4

Metabolite-likeness

Training Set 532 + 532

HMDB 8K

ZINC 21M

Standardization

Diversity Selection

Test Set 6.4K + 6.4K

Atom Counts

Physicochemical desc.

MDL Public Keys

FCFP_4

ECFP_4
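The slides do not say which diversity-selection algorithm was used to pick the training molecules. One common choice is MaxMin picking on Tanimoto distance, sketched here under that assumption. Fingerprints are modelled as sets of "on" bit indices, and the toy fingerprints below are made up for illustration.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps: Sequence[FrozenSet[int]], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly pick the item farthest from everything picked so far."""
    picked = [first]
    remaining = set(range(len(fps))) - {first}
    # Distance from each unpicked item to its nearest picked item.
    min_dist = {i: 1.0 - tanimoto(fps[i], fps[first]) for i in remaining}
    while remaining and len(picked) < n_pick:
        nxt = max(sorted(remaining), key=min_dist.__getitem__)
        picked.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fps[i], fps[nxt]))
    return picked


# Toy fingerprints (hypothetical bit indices).
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}), frozenset({1, 2, 3, 4})]
print(maxmin_pick(fps, 3))
```

Starting from item 0, the most dissimilar fingerprint (item 2) is picked next, then item 1, so near-duplicates such as item 3 are left for last.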

Metabolite-likeness

Training Set 532 + 532

HMDB 8K

ZINC 21M

Standardization

Diversity Selection

Test Set 6.4K + 6.4K

5-fold CV

SVM RF BC

Atom Counts

Physicochemical desc.

MDL Public Keys

FCFP_4

ECFP_4
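To make the 5-fold cross-validation step concrete, here is a minimal sketch of how a training set of 532 + 532 = 1064 molecules could be split into five folds. The function name and the fixed seed are assumptions, and the split is not stratified by class, which a real setup might want.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(532 + 532, k=5))
for training, validation in splits:
    print(len(training), len(validation))
```

Each molecule is used for validation exactly once and for training in the other four rounds.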

Metabolite-likeness

Training Set 532 + 532

HMDB 8K

ZINC 21M

Standardization

Diversity Selection

Test Set 6.4K + 6.4K

5-fold CV

SVM RF BC

Metabolite likeness

3 classifiers × 5 representations
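The grid of models evaluated here is every pairing of a classifier with a representation. A short sketch of enumerating that grid, using the labels from the slides as plain strings:

```python
from itertools import product

classifiers = ["SVM", "RF", "BC"]
representations = [
    "Atom Counts",
    "Physicochemical desc.",
    "MDL Public Keys",
    "FCFP_4",
    "ECFP_4",
]

# Every (classifier, representation) pair is one model to train and evaluate.
models = list(product(classifiers, representations))
print(len(models))  # 15
```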

Metabolite-likeness

Training Set 532 + 532

HMDB 8K

ZINC 21M

Standardization

Diversity Selection

Test Set 6.4K + 6.4K

5-fold CV

SVM RF BC

Metabolite likeness

Best = RF – MDL Public Keys

Sensitivity 99.84%, Specificity 87.52%, AUC 99.20%

Bad = BC – P_desc

Sensitivity 42.51%, Specificity 86.56%, AUC 61.57%
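For reference, the three reported metrics can be computed as in the sketch below, assuming metabolites are the positive class (label 1). The labels and scores are made-up toy values, not the data behind the numbers on the slide.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three metabolites (1) and three non-metabolites (0).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3), round(auc(y_true, scores), 3))
```

Sensitivity and specificity depend on the chosen score threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can pair a modest specificity with a high AUC.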

Metabolite-likeness, external validation

HMDB

ChEMBL

DrugBank

Standardization

Random Selection

External validation set

Metabolite likeness

Met-likeness + structure generation (methylhistamine) 260K

46% 71%

Met-likeness + structure generation (malic acid) 8K

100%

57% 77%

Conclusions

Prediction is good; interpretation is not

Useful in different fields

Local models needed

Acknowledgements

TNO Quality of Life

Leon Coulier

HMP, University of Alberta

David Wishart, Ying (Edison) Dong

Leiden University

Theo Reijmers, Thomas Hankemeier

University of Cambridge

Andreas Bender
