Metabolomics: a Promising ‘omics Science

By Susan Simmons

University of North Carolina Wilmington

Collaborators

- Dr. David Banks, Duke
- Dr. Chris Beecher, University of Michigan
- Dr. Xiaodong Lin, University of Cincinnati
- Dr. Young Truong, UNC
- Dr. Jackie Hughes-Oliver, NC State
- Dr. Stanley Young, NISS
- Dr. Ann Stapleton, UNCW Biology
- Dr. Robert Simmons, MD

What is Metabolomics?

The word metabolome was first used less than a decade ago (1998) and refers to all low molecular mass compounds synthesized and modified by a living cell or organism (Villas-Boas, 2007).

The complete human metabolome consists of endogenous (~1800) and exogenous metabolites (MANY!!)

Human Metabolome Project

[Figure: Fluorene degradation reference pathway, from the Kyoto Encyclopedia of Genes and Genomes (www.genome.jp/KEGG).]

[Figure: Mass Distribution of Compounds in the Human Metabolome. Axis ticks run from 0 to 1800. Labeled groups: metabolome (natively biosynthesized, monomeric), complex metabolites, xenobiome.]

History of Metabolomics

Machinery to detect metabolites has existed since the late 1960s

First paper appeared in 1971 (Robinson and Pauling)

First paper involving “metabolomics” came about in the late 1990s

Why Metabolomics can be promising

- Easy to use screening for disease
- Assist in identifying gene function
- Drug discovery
- Assessment of toxicity (especially liver toxicity) in new drugs
- Nutrigenomics and diet strategies

Genomics, Proteomics and Metabolomics

[Figure: chart by year, 1990 to 2006, with series for the search terms Genom*, Proteom* and Metabolom*.]

The emerging science of Metabolomics

[Figure: chart by year, 1998 to 2006. The only data labels that survive are 2, 2, 7 and 15.]

Metabolomics

- Genomics – 25,000 genes
- Transcriptomics – 100,000 transcripts
- Proteomics – 1,000,000 proteins
- Metabolomics – 1,800 compounds

[Diagram: protein and biochemicals (metabolites); the biochemical profile is mapped to metabolic pathways.]

Data Collection and Measurement Issues

To obtain data, a tissue sample is taken from a patient. Then:

The sample is prepped and put onto wells on a silicon plate.

Each well’s aliquot is subjected to gas and/or liquid chromatography.

After separation, the sample goes to a mass spectrometer.

MS platforms

[Workflow diagram: Sample Preparation, GC MS/ei, Data Set, Metabolyzer. Stages: Preparation, Analysis, Informatics. Other labels: LIMS, Interpretation Interface.]

Data Extraction
- peak identification
- peak alignment
- peak deconvolution

Chemical Identification
- reference databases
- ion spectra
- grouping related ions
- compound id

Quantitation

Quality Control

Data Reduction

The sample prep involves stabilizing the sample, adding spiked-in calibrants, and creating multiple aliquots (some are frozen) for QC purposes. This is roboticized.

Sources of error in this step include:

- within-subject variation
- within-tissue variation
- contamination by cleaning solvents
- calibrant uncertainty
- evaporation of volatiles

The result of this is a set of m/z ratios and timestamps for each ion, which can be viewed as a 2-D histogram in the m/z x time plane.
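As a rough illustration of the 2-D histogram view, the sketch below bins ion records into cells of the m/z x time plane. The function name `bin_ions`, the bin widths, and the four ion records are all hypothetical, chosen only to show the idea; real instrument software works with intensities and much finer bins.

```python
from collections import Counter


def bin_ions(ions, mz_width=1.0, time_width=1.0):
    """Count ions per (m/z bin, time bin) cell.

    ions is an iterable of (mz, time) pairs. The bin widths are
    illustrative defaults, not instrument settings.
    """
    counts = Counter()
    for mz, t in ions:
        cell = (int(mz // mz_width), int(t // time_width))
        counts[cell] += 1
    return counts


# Hypothetical ion records: (m/z ratio, time in seconds).
ions = [(73.2, 10.4), (73.8, 10.9), (147.1, 10.5), (73.5, 25.0)]
hist = bin_ions(ions, mz_width=1.0, time_width=5.0)
```

Each key of `hist` is a cell index in the m/z x time plane and each value is the number of ions that landed in that cell.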

One now estimates the amount of each metabolite. This entails normalization, which also introduces error.

The caveats pointed out in Baggerly et al. (Proteomics, 2003) apply:

- Baseline correction
- Alignment
- Estimating the quantity of specific metabolites


GC Data

Let z be the vector of raw data, and let x be the vector of estimates. Then the measurement equation is:

$$G(\mathbf{z}) = \mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\varepsilon},$$

where µ is the vector of unknown true values and ε is decomposable into separate components.

For metabolite i, the estimate X_i is:

$$X_i = g_i(\mathbf{z}) = \ln \sum_j w_{ij} \iint \left[ s_m(\mathbf{z}) - c(m,t) \right] \, dm \, dt.$$

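The law of propagation of error (essentially the delta method) says that the variance of X = g(z) is approximately

$$\operatorname{Var}[X] \approx \sum_{i=1}^{n} \left(\frac{\partial g}{\partial z_i}\right)^2 \operatorname{Var}[z_i] + 2 \sum_{i<k} \frac{\partial g}{\partial z_i}\,\frac{\partial g}{\partial z_k}\,\operatorname{Cov}[z_i, z_k],$$

where each pair of distinct components is counted once.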

The weights depend upon the values of the spiked in calibrants, so this gets complicated.
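A minimal numerical sketch of the propagation-of-error approximation may help here. It estimates the partial derivatives by central finite differences and then sums the variance terms plus twice the covariance terms over distinct pairs. The function `delta_method_variance`, the toy measurement function `g` (a log of a weighted sum with made-up weights 0.5 and 2.0), and the covariance matrix are assumptions for illustration only; they are not the weights or data from the talk.

```python
import math


def delta_method_variance(g, z, cov, h=1e-6):
    """Approximate Var[g(z)] from the covariance matrix of z.

    Partial derivatives are estimated by central finite differences.
    """
    n = len(z)
    grad = []
    for i in range(n):
        up = list(z)
        dn = list(z)
        up[i] += h
        dn[i] -= h
        grad.append((g(up) - g(dn)) / (2 * h))
    # Variance terms.
    var = sum(grad[i] ** 2 * cov[i][i] for i in range(n))
    # Covariance terms: each distinct pair once, with a factor of 2.
    var += 2 * sum(grad[i] * grad[k] * cov[i][k]
                   for i in range(n) for k in range(i + 1, n))
    return var


def g(z):
    """Toy measurement function: log of a weighted sum (made-up weights)."""
    return math.log(0.5 * z[0] + 2.0 * z[1])


z = [4.0, 3.0]                       # hypothetical raw values
cov = [[0.04, 0.01], [0.01, 0.09]]   # hypothetical covariance matrix
approx_var = delta_method_variance(g, z, cov)
```

For this toy `g` the derivatives are 0.0625 and 0.25, so the approximation works out to about 0.0061. When the weights themselves depend on noisy calibrants, they would have to be included in `z`, which is the complication noted above.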

Cross-platform experiments are also crucial for medical use. This leads to key comparison designs. Here the same sample (or aliquots of a standard solution or sample) is sent to multiple labs. Each lab produces its own spectrogram.

It is impossible to decide which lab is best, but one can estimate how to adjust for interlab differences.

The Mandel bundle-of-lines model is what we suggest for interlaboratory comparisons. This assumes:

$$X_{ik} = \alpha_i + \beta_i \theta_k + \varepsilon_{ik},$$

where X_ik is the estimate at lab i for metabolite k, θ_k is the unknown true quantity of metabolite k, and

$$\varepsilon_{ik} \sim N(0, \sigma_{ik}^2).$$

To solve the equations given values from the labs, one must impose constraints. A Bayesian can put priors on the laboratory coefficients and the error variance.
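One common way to make the model identifiable, sketched below, is to require the lab intercepts to average 0 and the lab slopes to average 1. Under that assumed constraint the column means estimate θ_k, and each lab's row is then regressed on those means by ordinary least squares. This is a non-Bayesian sketch with equal error variances; the function name `fit_bundle_of_lines` and the noise-free toy data are hypothetical.

```python
def fit_bundle_of_lines(X):
    """Fit X[i][k] = alpha_i + beta_i * theta_k by least squares.

    Assumed identifying constraints: mean(alpha) = 0 and mean(beta) = 1.
    X[i][k] is the estimate from lab i for metabolite k.
    """
    n_labs, n_met = len(X), len(X[0])
    # Under the constraints, the mean over labs estimates theta_k.
    theta = [sum(X[i][k] for i in range(n_labs)) / n_labs
             for k in range(n_met)]
    theta_bar = sum(theta) / n_met
    sxx = sum((t - theta_bar) ** 2 for t in theta)
    alpha, beta = [], []
    for row in X:
        row_bar = sum(row) / n_met
        slope = sum((x - row_bar) * (t - theta_bar)
                    for x, t in zip(row, theta)) / sxx
        beta.append(slope)
        alpha.append(row_bar - slope * theta_bar)
    return alpha, beta, theta


# Noise-free toy data from three hypothetical labs and four metabolites.
true_alpha = [0.5, -0.2, -0.3]
true_beta = [1.1, 0.9, 1.0]
true_theta = [1.0, 2.0, 4.0, 7.0]
X = [[a + b * t for t in true_theta]
     for a, b in zip(true_alpha, true_beta)]
alpha, beta, theta = fit_bundle_of_lines(X)
```

With noise-free data that satisfies the constraints, the fit recovers the generating coefficients. With real data the estimates only describe how each lab differs from the consensus, which matches the point that no lab can be declared best.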

Metabolomics needs a multivariate version, with models for the rates at which compounds volatilize.


Tissue Differences

[Figure panels by cancer type: CNS cancer, leukemia, ovarian cancer, breast cancer, melanoma, prostate cancer, colon cancer, non small cell lung cancer, renal cancer.]

Statistical issues

- Many missing values!!!
- Outliers
- Metabolite abundances are not normally distributed
- n<p
- Correlated metabolites

Statistical Issues

- PCA or ICA
- Partial Least Squares
- Clustering
- Random Forest, SVM
- rSVD

Statistical issues

Dealing with missing values

- Replacing missing values by 0’s is not necessarily a good idea. Not truly 0.
- Minimum, half-min, uniform(0, minimum)
- Random forest imputation
- Observing conditional distribution (Dr. Young Truong at UNC)
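The three simple replacement rules listed above (minimum, half-min, uniform(0, minimum)) can be sketched as follows for a single metabolite. The function name `impute_below_detection`, the use of `None` to mark a missing value, and the example abundances are assumptions for illustration. The sketch also assumes at least one observed, positive value per metabolite.

```python
import random


def impute_below_detection(values, method="half-min", rng=None):
    """Fill None entries in one metabolite's abundance list.

    method is "min", "half-min" or "uniform" (a draw from
    uniform(0, observed minimum)).
    """
    observed = [v for v in values if v is not None]
    lowest = min(observed)
    rng = rng or random.Random(0)

    def fill():
        if method == "min":
            return lowest
        if method == "half-min":
            return lowest / 2
        if method == "uniform":
            return rng.uniform(0, lowest)
        raise ValueError("unknown method: " + method)

    return [fill() if v is None else v for v in values]


# Hypothetical abundances; None marks a value below detectability.
abundances = [12.0, None, 8.0, 30.5, None]
half_min = impute_below_detection(abundances, "half-min")
uniform = impute_below_detection(abundances, "uniform", random.Random(42))
```

The uniform rule gives each missing entry its own draw, so imputed values do not pile up at a single point the way they do under the minimum or half-min rules.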

Statistical Issues

Prediction and Classification

- Partial least squares
- Random Forest
- SVM
- Neural networks

Statistical Issues

Identifying relationships

- MDS
- Clustering
- rSVD (PowerMV from NISS)

ALS metabolomic data set

We had abundance data on 317 metabolites from 63 subjects. Of these, 32 were healthy, 22 had ALS but were not on medication, and 9 had ALS and were taking medication.

The goal was to classify the two ALS groups and the healthy group.

Here p>n. Also, some abundances were below detectability.

Using the Breiman-Cutler code for Random Forests, the out-of-bag error rate was 7.94% (5 of the 63 subjects misclassified); 29 of the 31 ALS patients and 29 of the 32 healthy patients were correctly classified.

20 of the 317 metabolites were important in the classification, and three were dominant.

RF can detect outliers via proximity scores. Four such outliers were found.

ALS Metabolomic data set

Several support vector machine approaches were tried on this data:

- Linear SVM
- Polynomial SVM
- Gaussian SVM
- L1 SVM (Bradley and Mangasarian, 1998)
- SCAD SVM (Fan and Li, 2000)

The SCAD SVM had the best leave-one-out (LOO) error rate, 14.3%.
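For readers unfamiliar with the leave-one-out estimate, the sketch below shows how such an error rate is computed: each subject is held out in turn, the classifier is trained on the rest, and the held-out subject is predicted. A nearest-centroid classifier stands in for the SVMs here purely to keep the example short; it is not the method used in the talk, and the six two-feature "subjects" are invented. On 63 subjects, a rate of 14.3% corresponds to 9 misclassified.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], x))

    return min(sums, key=dist2)


def loo_error(X, y, predict=nearest_centroid_predict):
    """Leave-one-out error: hold out each subject once and predict it."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


# Invented, well-separated toy data (two features per subject).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
y = ["healthy"] * 3 + ["ALS"] * 3
rate = loo_error(X, y)
```

Any classifier with the same `predict(train_X, train_y, x)` signature can be passed to `loo_error` in place of the stand-in.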

ALS Metabolomic data set

Robust SVD (Liu et al., 2003) is used to simultaneously cluster patients (rows) and metabolites (columns). Given the patient by metabolite matrix X, one writes

$$X_{ik} = r_i c_k + \varepsilon_{ik},$$

where r_i and c_k are row and column effects. Then one can sort the array by the effect magnitudes.

To compute an rSVD, use alternating L1 regression, without an intercept, to estimate the row and column effects. First fit the row effects as a function of the column effects, and then reverse. Robustness stems from using L1 regression rather than OLS.

Doing similar work on the residuals gives the second singular value solution.
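The alternating scheme can be sketched in a few lines. For a single predictor and no intercept, the L1 regression slope is a weighted median of the ratios y_k / x_k with weights |x_k|, which is what `l1_slope` computes. The function names, the starting values, the fixed iteration count, and the toy matrix with one gross outlier are assumptions for illustration; this is not the Liu et al. implementation, and it does not guard against degenerate cases such as an all-zero effect vector.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    running = 0.0
    for v, w in sorted(zip(values, weights)):
        running += w
        if running >= half:
            return v


def l1_slope(y, x):
    """Minimise sum |y_k - b * x_k| over b (L1 regression, no intercept)."""
    ratios = [yk / xk for yk, xk in zip(y, x) if xk != 0]
    weights = [abs(xk) for xk in x if xk != 0]
    return weighted_median(ratios, weights)


def robust_rank_one(X, n_iter=20):
    """Alternate L1 fits of row effects r and column effects c."""
    n_rows, n_cols = len(X), len(X[0])
    c = [1.0] * n_cols  # arbitrary starting column effects
    for _ in range(n_iter):
        r = [l1_slope(X[i], c) for i in range(n_rows)]
        c = [l1_slope([X[i][k] for i in range(n_rows)], r)
             for k in range(n_cols)]
    return r, c


# Toy rank-one matrix (rows 1, 2, 3 times columns 2, 1, 4)
# with one gross outlier placed in cell (0, 0).
X = [[100.0, 1.0, 4.0],
     [4.0, 2.0, 8.0],
     [6.0, 3.0, 12.0]]
r, c = robust_rank_one(X)
residuals = [[X[i][k] - r[i] * c[k] for k in range(3)] for i in range(3)]
```

Only the products r_i c_k are determined, not the scale of r and c separately. In this toy case the fit reproduces every clean cell and leaves the outlier as a large residual; repeating the procedure on `residuals` would give the second component.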

NCI data set

- NCI 60 cell lines
- 9 cancer types: breast, CNS, colon, melanoma, renal, leukemia, prostate, ovarian, lung
- GC-LS
- Melanoma vs CNS (8 cell lines for melanoma and 6 cell lines for CNS)

Variable Importance using RF

Component 1 versus 2

Useful websites

- Deconvolution of peaks: AMDIS software (http://chemdata.nist.gov/massspc/amdis; NIST, Gaithersburg, USA)
- Human Metabolome database (www.hmdb.ca)
- KEGG (www.genome.jp/kegg)
- PowerMV (http://www.niss.org/PowerMV/)
- Many, many others

Concluding Remarks

- Many interesting statistical issues still need to be addressed.
- Measurement issues and interlaboratory differences need to be properly addressed.
- Statistical issues in analyzing metabolomic data still remain an interesting challenge.
- Metabolomics is an important part of understanding systems biology.
