Strategies for conducting metabolomics analyses at scale Clary B. Clish, Ph.D. Senior Director, Metabolomics Institute Scientist 19 June 2019


Page 1

Strategies for conducting metabolomics analyses at scale

Clary B. Clish, Ph.D.
Senior Director, Metabolomics
Institute Scientist

19 June 2019

Page 2

Rationale for doing metabolomics:

Metabolomics: the systematic analysis of metabolites in biological specimens

• carbohydrates
• amino acids
• nucleotides
• organic acids
• acylcarnitines
• lipids
(> × 10^4)

Proximal reporters of disease:
• diabetes
• CKD
• CVD
• inflammation

[Figure labels: Microbiome; Diet; Lifestyle/environment]

adapted from Gerszten & Wang, Nature 2008

Page 3

Metabolomics is a significant analytical challenge

• the actual number of metabolites/small molecules that occur in biological samples is unknown
• metabolite concentrations range from a few molecules per cell to mM, but the best analytical techniques have linear dynamic ranges of 4-5 orders of magnitude
• physical properties differ widely among metabolites
  – polarity: range from very polar to very nonpolar
  – chemical stability: labile to very stable
• multiple analytical methods are needed to obtain “full” coverage of the metabolome
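A rough worked example of the dynamic-range point, as a sketch only. The cell volume of 1 pL is an assumption made for illustration and does not come from the slides. Under that assumption, one molecule per cell is on the order of picomolar, so reaching 1 mM spans close to nine orders of magnitude, well beyond a 4-5 order linear range.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def molar_from_copies(copies_per_cell: float, cell_volume_l: float) -> float:
    """Convert a per-cell copy number into a molar concentration."""
    return copies_per_cell / (AVOGADRO * cell_volume_l)

def orders_of_magnitude(low: float, high: float) -> float:
    """Number of decades separating two positive concentrations."""
    return math.log10(high / low)

# Assumed cell volume of 1 picolitre (illustrative only, not from the slides).
CELL_VOLUME_L = 1e-12
low = molar_from_copies(1, CELL_VOLUME_L)  # one molecule per cell
high = 1e-3                                # 1 mM
span = orders_of_magnitude(low, high)
print(f"{low:.2e} M to {high:.0e} M spans {span:.1f} orders of magnitude")
```

A different assumed cell volume shifts the lower bound, but the gap to a 4-5 order linear range remains large for any plausible value.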

Page 4

LC-MS-based metabolomics platform (Broad Institute)

Workflow: Biological Samples → Sample Preparation → Chromatography → Mass Spectrometry → Metabolite Profiles

Sample preparation (Biofluid/Homogenate; aliquot volumes: 10 µL, 30 µL, 10 µL, 30 µL):
• Free fatty acids, bile acids, lipid mediators: MeOH extracts
• Amines & cationic metabolites: MeOH/ACN extracts
• Sugars, organic acids, purines, pyrimidines, etc.: MeOH extracts
• Lipids: IPA extracts

Chromatography (diagram axis: polarity):
• NH2-HILIC
• RP/C18
• Atlantis-HILIC
• RP/C8
• Atlantis-HILIC: Trp metabolites
• C18: Eicosanoids/LM

Nontargeted MS (hundreds of confirmed knowns & thousands of yet-to-be-confirmed/unknowns):
• + mode HRAM MS: Exactive Plus, Q Exactive
• + mode HRAM MS: Exactive Plus, Q Exactive
• - mode HRAM MS: Q Exactive
• - mode HRAM MS: Q Exactive Plus

Targeted MS (sensitive analyses of select compounds):
• MRM profiling: Agilent 4695 QQQ
• MRM profiling: 5500 QTRAP

Page 5

Hybrid analyses: metabolites of known identity & unknowns

[Figure: ion chromatograms (“targeted” peaks) for niacinamide, kynurenic acid, taurine, leucine, isoleucine, and carnitine, shown alongside the full scan dataset (nontargeted); one feature is annotated RT = 5.41 min, m/z = 190.0498]
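The idea of pulling a “targeted” ion chromatogram out of a nontargeted full-scan dataset can be sketched as follows. This is a minimal illustration, not the platform's software: the data layout, the 5 ppm tolerance, and the toy peak values are assumptions, while the target m/z (190.0498) and RT (5.41 min) are the values annotated on the slide.

```python
from typing import List, Tuple

# One centroided peak from a full-scan run: (retention time in min, m/z, intensity).
Peak = Tuple[float, float, float]

def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a +/- ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

def extract_ion_chromatogram(peaks: List[Peak], target_mz: float,
                             ppm: float = 5.0) -> List[Tuple[float, float]]:
    """Sum intensity per retention time for peaks inside the m/z window."""
    low, high = ppm_window(target_mz, ppm)
    trace = {}
    for rt, mz, intensity in peaks:
        if low <= mz <= high:
            trace[rt] = trace.get(rt, 0.0) + intensity
    return sorted(trace.items())

# Toy full-scan data (made-up values); target m/z and RT come from the slide.
scan_peaks = [
    (5.39, 190.0497, 1.0e4),
    (5.41, 190.0499, 8.0e4),
    (5.41, 190.0600, 5.0e4),  # about 54 ppm away, excluded
    (5.43, 190.0498, 2.0e4),
    (6.10, 132.1019, 9.0e4),  # unrelated ion
]
xic = extract_ion_chromatogram(scan_peaks, target_mz=190.0498, ppm=5.0)
apex_rt, apex_intensity = max(xic, key=lambda point: point[1])
print(xic, apex_rt)
```

The same full-scan data can be re-queried later for any other m/z, which is what makes it possible to revisit unknowns after acquisition.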

Page 6

Mitochondrial disorders
• Markers of mitochondrial disease
• Mitochondrial dysfunction
Shaham PNAS 2010; 107:1571-5
Chen Cell Rep 2014; 7:27-34
Gohil J Biol Chem 2013; 288:35387-95
Bau Elife 2016; 5. pii: e10575
Delaney PNAS 2017; 114:8402-8407

Type 2 diabetes
• Metabolic predictors of future T2D in FHS, MDC, MCDS, NHS, DPP
• Metabolic phenotyping of SLC16A11 haplotypes associated with T2D
• Intervention response in DPP
• Influence of diet on metabolic profiles and T2D in PREDIMED
Wang Nat Med 2011; 17:448-53
Rhee J Clin Invest 2011; 121:1402–1411
Wang J Clin Invest 2013; 123:4309-4317
SIGMA Consortium. Nature 2014; 506:97-101
Magnusson Diabetes 2015; 64:3010-6
Walford Diabetes 2016; 65:1424-33
O’Sullivan J Clin Invest 2017; 127:4394–4402

Renal disease
• CKD progression
• Prediction of CV mortality
• Novel markers of uremia
Kalim S et al. J Am Heart Assoc 2013; 2(6)
Rhee E et al. J Am Soc Nephrol 2010; 21:1041
Rhee E et al. Am J Nephrol 2016; 43:366-74
Tran MT et al. Nature 2016; 531:528-32

Cancer metabolism
• Metabolic dependencies of cancer cells
Birsoy Nature 2014; 508:108-12
Israelsen Cell 2013; 155:397-409
Jain M Science 2012; 336:1040-4
Wang Cell 2014; 158:1309-23
Davidson Cell Metab 2016; 23:517-28
Kryukov Science 2016; 351:1214-8
Spinelli Science 2017; 358:941-946
Zou Nat Commun 2019; 10:1617
Li Nat Med 2019; 25:850-860

Cancer
• Early indicators of pancreatic cancer
• Risk factors for breast cancer
• Dietary and hormonal determinants of cancer
• Predictors of prostate cancer
Mayers Nat Med 2014; 20:1193-8
Danai Nature 2018; 558:600-604

Cardiovascular disease
• Predictors of CHD in DPP
• Predictors of CHD in WHI
• Influence of diet on metabolic profiles and CVD in PREDIMED
Guasch-Ferré Am J Clin Nutr 2016; 103:1408-16
Ruiz-Canela Clin Chem 2016; 62:582-92
Lewis GD J Am Coll Cardiol 2016; 67:174-89
Wang Circulation 2017; 135:2028-2040
Paynter Circulation 2018; 137:841-853
Zhao Circulation 2019; 139:2003-2011

Microbiome & disease
• Microbiome in IBD
• Gut microbiome & diabetes
• Bile acid profiles associated with C. diff
Meelu Inflamm Bowel Dis 2014; 20:1139-46
Kostic Cell Host Microbe 2015; 17:260-73
Allegretti Aliment Pharmacol Ther 2016; 43:1142
Fujisaka J Clin Invest 2016; 126:4430-4443
Ni Sci Transl Med 2017; 9(416)
Fujisaka Cell Rep 2018; 22:3072-3086
Franzosa Nat Microbiol 2019; 4:293-305
Lloyd-Price Nature 2019; 569:655-662

Infection & immunity
• Metabolic signaling and metabolism in immune cells
• Influence of infection on metabolism and vice versa
Tannahill Nature 2013; 496:238-42
Mascanfroni Nat Med 2015; 21:638-46
Wang Cell 2015; 163:1413-27
Matheson Cell Host Microbe 2015; 18:409-23
Graham Nat Commun 2015; 6:7838
Palsson-McDermott Cell Metab 2015; 21:65-80
Rothhammer Nat Med 2016; 22:586-97
Yang Nature 2017; 548:602-606
Werling Cell 2019; 177:315-325

Page 7

Plasma metabolic predictors of T2D in the Framingham Heart Study

Robert Gerszten, Thomas Wang

• Original cohort, N = 5209: 1948 to present
• Offspring study, N = 5124: 1971 to present
• Gen 3 study, N = 4200: 2002 to present

Exam 5 (1991-95): 3000 people underwent OGTT (75 g d-glucose in 300 mL)

12 year follow-up for T2D

Page 8

Metabolites dysregulated 4-12 years before T2D diagnosis

• Nested case-control study: baseline samples from 189 future T2D cases and 189 matched controls
• Matching based on fasting glucose, age, sex, BMI, and hypertension status
• Targeted metabolite profiling of ~250 metabolites

Wang TJ et al. Nat Med 2011; 17:448-453

Odds ratio for future diabetes: Plasma isoleucine, phenylalanine, and tyrosine

                Discovery (FHS)       Validation (Malmö)
                12 year follow-up     13 year follow-up
                (n=378)               (n=326)
1st quartile    1.0 (referent)        1.0 (referent)
2nd quartile    2.08 (0.97-4.46)      3.48 (1.68-7.23)
3rd quartile    2.59 (1.09-6.15)      2.82 (1.25-6.34)
4th quartile    3.93 (1.54-10.04)     5.99 (2.34-15.34)
P for trend     0.006                 0.0009
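To make the "odds ratio (95% CI)" entries concrete, here is a minimal sketch of an unadjusted odds ratio with a Woolf (log-scale) confidence interval. This is a simpler technique than the matched analysis used in the cited study, and the counts below are hypothetical, not study data.

```python
import math
from typing import Tuple

def odds_ratio_ci(exposed_cases: int, exposed_controls: int,
                  referent_cases: int, referent_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (exposed_cases * referent_controls) / (exposed_controls * referent_cases)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / referent_cases + 1 / referent_controls)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# Hypothetical counts for the top quartile versus the 1st (referent) quartile.
or_q4, low_q4, high_q4 = odds_ratio_ci(60, 35, 35, 60)
print(f"OR = {or_q4:.2f} (95% CI {low_q4:.2f}-{high_q4:.2f})")
```

An interval that excludes 1.0, as in the 3rd and 4th quartile rows of the table, indicates an association unlikely to be explained by sampling variation alone.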

[Figure: three panels plotting cases/controls against acyl chain carbon number and acyl chain double bonds]

Triglycerides
2-aminoadipate
BCAA/AA

Rhee EP et al. J Clin Invest 2011; 121:1402-11
Wang TJ et al. J Clin Invest 2013; 123:4309-17

Page 9

Metabolites associated with insulin resistance (HOMA-IR) in Gen 3 Framingham Heart Study participants

Robert Gerszten, John O’Sullivan, Jordan Morningstar

• 1000 participants
• Nontargeted HILIC-pos method (knowns + unknown peaks)
• ~5000 peaks were observed in >80% of individuals
• ~500 peaks associated with key metabolic traits
• ~200 peaks associated with hepatic fat (age and sex adjusted)

[Figure panels: Metabolites associated with HOMA-IR; Unknowns associated with HOMA-IR]
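The ">80% of individuals" criterion above amounts to a detection-rate filter on the peak table. A minimal sketch, assuming a hypothetical table layout (peak ID mapped to per-participant intensities, with `None` for non-detects); the slides do not describe how the filter was implemented.

```python
from typing import Dict, List, Optional

def detection_rate(intensities: List[Optional[float]]) -> float:
    """Fraction of samples in which a peak was detected (non-missing, > 0)."""
    detected = sum(1 for value in intensities if value is not None and value > 0)
    return detected / len(intensities)

def filter_by_detection(peak_table: Dict[str, List[Optional[float]]],
                        min_rate: float = 0.8) -> List[str]:
    """Keep peak IDs observed in more than min_rate of individuals."""
    return [peak_id for peak_id, values in peak_table.items()
            if detection_rate(values) > min_rate]

# Toy table: peak ID -> intensity per participant (None = not detected).
peak_table = {
    "known_A":   [1.2e5, 1.1e5, 9.8e4, 1.3e5, 1.0e5],  # 5/5 detected
    "unknown_B": [4.0e3, None, 3.5e3, 4.2e3, 3.9e3],   # 4/5, exactly 80%
    "unknown_C": [None, None, 2.0e3, None, 2.1e3],     # 2/5 detected
}
kept = filter_by_detection(peak_table)
```

The comparison is strict, so a peak seen in exactly 80% of participants is dropped; whether the boundary is inclusive is a design choice the slide does not settle.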

Page 10

Cmpd #5836 (m/z 202.1185) is associated with hepatic fat in FHS

~200 peaks associated with hepatic fat (age and sex adjusted)

Phenotype  covariates                                                n    beta    p value
LPR        AGE1:sex                                                  464  -0.197  2.28E-24
LPR        AGE1:sex:bmi1                                             464  -0.175  4.81E-16
LPR        AGE1:sex:smoke1:alc1                                      463  -0.201  6.22E-25
LPR        AGE1:sex:smoke1:alc1:HDL1:log(tg1):gluc1:diab:HTN1        457  -0.186  1.49E-16
LPR        AGE1:sex:smoke1:alc1:HDL1:log(tg1):gluc1:diab:HTN1:bmi1   457  -0.174  1.71E-13
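The covariate strings in the table appear to list the adjustment set for each model, with beta being the coefficient on the metabolite peak. As a sketch of what "adjusted for age and sex" means in practice, the example below fits ordinary least squares with covariates using only the standard library. The data are synthetic and the solver is a generic one; neither reflects the study's actual pipeline.

```python
from typing import List

def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

def ols_coefficients(columns: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations; intercept comes first."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)

# Synthetic data (not FHS data): outcome built as 2 - 0.2*peak + 0.05*age + 0.5*sex.
peak = [0.1, -1.2, 0.8, 1.5, -0.3, 0.4, -0.9, 1.1]
age = [45.0, 52.0, 61.0, 38.0, 57.0, 49.0, 66.0, 41.0]
sex = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
outcome = [2 - 0.2 * m + 0.05 * a + 0.5 * s for m, a, s in zip(peak, age, sex)]
intercept, beta_peak, beta_age, beta_sex = ols_coefficients([peak, age, sex], outcome)
```

Adding a covariate such as BMI to `columns` changes `beta_peak` only to the extent that the covariate is correlated with both the peak and the outcome, which is the pattern the table's successive rows probe.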

Page 11

Unknown Metabolite: m/z 202.1185

Phenotype: Liver Fat (CT Scan), P = 6.22E-24

Gene: AGXT2 (alanine-glyoxylate aminotransferase 2), P = 3.79E-9

Compound names shown in the figure:
• 2-Hydroxy-3-methylbutyric acid
• 4-Hydroxyisovaleric acid
• 3-Hydroxyisovaleric acid
• 3-Hydroxyvaleric acid
• 4-Hydroxyvaleric acid
• 2-Hydroxy-2-methylbutyric acid
• Diethyl carbonate
• 3-Hydroxy-2-methyl-[S-(R,R)]-butanoic acid
• 2-Hydroxyvaleric acid
• L-Threonine
• L-Allothreonine
• Hydroxyethyl glycine
• 4-Amino-3-hydroxybutyrate
• L-Homoserine
• D-Alanyl-D-alanine
• Alanyl-Alanine
• 4-Acetamido-2-aminobutanoic acid
• 1-Methylhistidine
• 3-Methylhistidine
• Ethyl lactate
• 2-Methyl-3-hydroxybutyric acid
• 3-Hydroxy-2-methyl-[R-(R,S)]-butanoic acid
• Erythronilic acid
• 2-Ethylhydracrylic acid

O’Sullivan, Morningstar et al. J Clin Invest 2017; 127:4394–4402

Page 12

AGXT2: A multifunctional enzyme

Rodionov RN et al. Trends Pharmacol Sci 2014; 35:575-82

DMGV: α-keto-δ-(NG,NG-dimethylguanidino)valeric acid

Theoretical m/z: 202.1185

Page 13

Plasma DMGV levels are elevated in biopsy-proven NASH and are modulated following weight loss surgery

Plasma DMGV: Biopsy-proven NASH cohort

Plasma DMGV: Roux-en-Y gastric bypass patients (n = 39)

(n = 36) (n = 36)

O’Sullivan, Morningstar et al. J Clin Invest 2017; 127:4394–4402

Page 14

DMGV predicts incident T2D in the Malmö Diet and Cancer Study (MDC) and the Jackson Heart Study (JHS)

                     Model 1: adjusted for age, sex         Model 2: adjusted for age, sex, BMI, glucose
                     MDC                JHS                 MDC                JHS
OR Quartile 1        1.0 (referent)     1.0 (referent)      1.0 (referent)     1.0 (referent)
OR Quartile 2        1.09 (0.58–2.04)   1.15 (0.6–2.1)      1.15 (0.61–2.19)   1.30 (0.8–2.2)
OR Quartile 3        1.49 (0.79–2.81)   1.87 (1.09–3.2)     1.59 (0.81–3.13)   1.51 (0.9–2.5)
OR Quartile 4        2.71 (1.37–5.38)   2.6 (1.54–4.39)     2.81 (1.34–5.87)   1.79 (1.0–3.04)
P value for trend    0.003              0.00002             0.004              0.026

MDC: 196 incident cases, 126 controls; mean follow-up time = 12.8 years
JHS: 133 incident cases, 465 controls; mean follow-up time = 7.5 years

O’Sullivan, Morningstar et al. J Clin Invest 2017; 127:4394–4402

Page 15

Addressing projects of scale

• metabolomics data are acquired in batches (e.g. ~1000 samples/LC column)
• ability to serially concatenate data from batches as they become available potentiates linear scalability
• strategies required for:
  - QC within batches
  - standardizing data to compensate for drift in instrument sensitivity within a batch
  - scaling data between batches
  - matching unknowns between batches: “alignment”
• small variations in measured masses and retention times between batches complicate matching unknowns
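Matching unknowns between batches can be sketched as a tolerance search in m/z and retention time. The tolerances (5 ppm, 0.1 min), the feature values, and the rule of leaving ambiguous cases unmatched are all assumptions for illustration; the slides do not specify them.

```python
from typing import List, Optional, Tuple

Feature = Tuple[float, float]  # (m/z, retention time in minutes)

def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Absolute mass difference expressed in parts per million of mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6

def match_features(batch_a: List[Feature], batch_b: List[Feature],
                   max_ppm: float = 5.0, max_rt_shift: float = 0.1
                   ) -> List[Tuple[int, Optional[int]]]:
    """For each feature in batch_a, return the index of its unique match in batch_b.

    A match must fall inside both tolerances; features with zero or several
    candidates are reported as None (ambiguous cases are left unmatched).
    """
    matches = []
    for i, (mz_a, rt_a) in enumerate(batch_a):
        candidates = [j for j, (mz_b, rt_b) in enumerate(batch_b)
                      if ppm_difference(mz_a, mz_b) <= max_ppm
                      and abs(rt_a - rt_b) <= max_rt_shift]
        matches.append((i, candidates[0] if len(candidates) == 1 else None))
    return matches

# Toy feature lists (made-up values) with small mass and RT shifts between batches.
batch_1 = [(202.1185, 7.52), (190.0498, 5.41), (300.2000, 3.10)]
batch_2 = [(190.0501, 5.45), (202.1188, 7.49), (300.2003, 3.12), (300.1998, 3.15)]
pairs = match_features(batch_1, batch_2)
```

Here the third feature has two plausible partners and stays unmatched, which illustrates why small shifts in mass and retention time make alignment of unknowns harder than alignment of confirmed knowns.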

Page 16

QC & data standardization

Reference mixtures analyzed before and after to assure system performance

Internal standard(s) added in first step of sample extraction
- monitored during analyses
- may be used to standardize data

Pooled study sample: analyzed every 20 study samples
- used to standardize data across datasets

Second pooled reference sample, analyzed every 20 study samples
- used to assess: overall reproducibility & impact of standardization procedures
- we typically use the pooled study sample

LC-MS Sample Queue (up to ~1000 study samples per LC column):
. . . PREFA PREFB Study samples (20) PREFA PREFB Study samples (20) PREFA PREFB Study samples (20) PREFA PREFB

“PREFA” used to remove temporal drift within batches and standardize data across batches using nearest-neighbor normalization

“PREFB” used to monitor coefficients of variation for each metabolite during and across the run
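One plausible reading of nearest-neighbor normalization with the pooled reference is sketched below: each injection is scaled by the pooled-reference injection closest to it in the queue, relative to the median of all reference injections. The exact formula used by the platform is not given in the slides, so the scaling rule and the toy numbers here are assumptions.

```python
from statistics import median
from typing import List

def nearest_neighbor_normalize(run_order: List[int], intensity: List[float],
                               is_reference: List[bool]) -> List[float]:
    """Scale each injection by its nearest pooled-reference injection in queue order.

    Each value is multiplied by (median of all reference values) / (value of the
    nearest reference), so a reference injection itself maps onto the median.
    """
    ref_points = [(pos, val) for pos, val, ref
                  in zip(run_order, intensity, is_reference) if ref]
    target = median(val for _, val in ref_points)
    normalized = []
    for pos, val in zip(run_order, intensity):
        _, nearest_val = min(ref_points, key=lambda point: abs(point[0] - pos))
        normalized.append(val * target / nearest_val)
    return normalized

# Toy queue for one metabolite: reference, 4 study samples, reference, 4 more, reference.
# The signal drops in steps to mimic sensitivity drift (made-up numbers).
order = list(range(11))
flags = [True, False, False, False, False, True, False, False, False, False, True]
raw = [100.0, 100.0, 100.0, 95.0, 95.0, 95.0, 95.0, 95.0, 90.0, 90.0, 90.0]
standardized = nearest_neighbor_normalize(order, raw, flags)
```

Because the reference is the same pooled material in every batch, anchoring each batch to it also puts separate batches on a common scale, which is the cross-batch role the slide assigns to PREFA.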

Page 17

Metabolomics workflow enabled by custom software tools

Cloud-based web server

Standardization app
• standardizes data to internal standards and/or pooled reference samples
• data visualization
• One batch at a time or multiple aligned batches

Alignment app
• Aligns features between current batch and net alignment of previous batches
• Uses nonparametric regression based on unambiguously matched peaks to adjust m/z and RT
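The retention-time adjustment can be illustrated with a Gaussian kernel smoother (Nadaraya-Watson), which is one form of nonparametric regression. The slides do not say which method the alignment app uses, so this is a stand-in; the anchor values and bandwidth are made up. The same approach would apply to m/z shifts.

```python
import math
from typing import List, Tuple

def kernel_smooth(x_anchor: List[float], y_anchor: List[float],
                  x_query: float, bandwidth: float = 1.0) -> float:
    """Nadaraya-Watson estimate at x_query using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_anchor]
    return sum(w * y for w, y in zip(weights, y_anchor)) / sum(weights)

def correct_retention_times(anchors: List[Tuple[float, float]],
                            rts_to_correct: List[float],
                            bandwidth: float = 1.0) -> List[float]:
    """Map RTs from the current batch onto the reference RT scale.

    anchors holds (rt_in_current_batch, rt_in_reference) pairs for peaks that
    were matched unambiguously; the smoother models the shift between them.
    """
    current = [cur for cur, _ in anchors]
    shifts = [ref - cur for cur, ref in anchors]
    return [rt + kernel_smooth(current, shifts, rt, bandwidth)
            for rt in rts_to_correct]

# Made-up anchors: the current batch elutes progressively later than the reference.
anchors = [(1.00, 0.98), (3.00, 2.95), (5.00, 4.92), (7.00, 6.89), (9.00, 8.86)]
corrected = correct_retention_times(anchors, [2.0, 6.0], bandwidth=1.0)
```

Fitting the shift only on unambiguous matches keeps mismatched pairs from distorting the correction, and the corrected values can then be used to resolve features that were ambiguous before adjustment.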

Page 18

Risk of incident breast & colon cancer: samples from 8000 study participants using one nontargeted method

Heather Eliassen, Walter Willett, Meir Stampfer, Fran Grodstein

NHS (3800 samples, 4 columns)
            Knowns   Unknowns
Column 1    204      11055
Column 2    205      11447
Column 3    204      11412
Column 4    204      11006

NHS2 (2600 samples, 3 columns)
            Knowns   Unknowns
Column 1    172      8958
Column 2    172      10664
Column 3    172      13163

NHS/HPFS (1600 samples, 2 columns)
            Knowns   Unknowns
Column 1    199      10619
Column 2    199      9364

Page 19

[Figure: coefficient of variation (%) versus peak area (log scale), RAW and STANDARDIZED panels, for NHS (3800 samples, 4 columns), NHS2 (2600 samples, 3 columns), and NHS/HPFS (1600 samples, 2 columns)]

CVs                Raw      Stand.
Knowns (199)       24.4%    3.8%
Unknowns (4141)    56.8%    17.0%

CVs                Raw      Stand.
Knowns (249)       21.8%    4.0%
Unknowns (4763)    46.9%    15.9%

CVs                Raw      Stand.
Knowns (199)       14.6%    3.9%
Unknowns (6084)    39.8%    12.2%

Page 20

Pooled reference samples: fully aligned NHS breast cancer, NHS2 breast cancer, & NHS/HPFS colon cancer

[Figure: coefficient of variation (%) versus peak area]

                   Median CV   CV < 10%        CV < 20%
Knowns (212)       3.9%        187 (88.2%)     205 (96.7%)
Unknowns (2816)    12.2%       1201 (42.6%)    1816 (64.5%)
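The summary statistics in this table can be reproduced from repeated pooled-reference injections with a few lines of code. This is a minimal sketch with invented peak areas; only the 187-of-212 figure in the last line comes from the slide.

```python
from statistics import mean, median, stdev
from typing import Dict, List

def cv_percent(values: List[float]) -> float:
    """Coefficient of variation: sample standard deviation over mean, in percent."""
    return stdev(values) / mean(values) * 100.0

def summarize_cvs(reference_injections: Dict[str, List[float]]) -> Dict[str, float]:
    """Median CV plus the share of features under 10% and 20% CV."""
    cvs = [cv_percent(values) for values in reference_injections.values()]
    return {
        "median_cv": median(cvs),
        "pct_below_10": 100.0 * sum(cv < 10 for cv in cvs) / len(cvs),
        "pct_below_20": 100.0 * sum(cv < 20 for cv in cvs) / len(cvs),
    }

# Toy peak areas across four pooled-reference injections (invented values).
toy = {
    "feature_1": [100.0, 102.0, 98.0, 100.0],  # tight
    "feature_2": [100.0, 120.0, 85.0, 95.0],   # moderate
    "feature_3": [100.0, 160.0, 50.0, 90.0],   # noisy
}
summary = summarize_cvs(toy)
# The percentages on the slide follow the same arithmetic, e.g. 187 of 212 knowns:
share_knowns_below_10 = round(100.0 * 187 / 212, 1)
```

Reporting the median alongside threshold counts is useful because CV distributions for unknowns tend to have a long tail of low-abundance, noisy features.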

Page 21

Concluding remarks

• working with smaller cohorts, we have been able to identify metabolic profiles that predict incident disease
• effect sizes among metabolic predictors tend to be small (e.g. 10% difference in means), and complete analysis of the UK Biobank promises to provide unprecedented statistical power
• scalable metabolomics workflows capable of analyzing thousands of plasma samples are presently available
• the modular nature of LC-MS-based metabolomics platforms enables an a la carte approach to method selection
• nontargeted metabolite profiling methods that measure both metabolites of known identity and unknowns enable serendipity
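To see why small effect sizes call for large cohorts, the sketch below estimates the per-group sample size needed to detect a 10% difference in means with a two-sample z-test (normal approximation). The 50% between-person CV and the Bonferroni-style correction for ~5000 peaks are assumptions chosen for illustration, not figures from the talk.

```python
import math
from statistics import NormalDist

def samples_per_group(relative_difference: float, cv: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n for a two-sided, two-sample z-test on means (normal approximation).

    relative_difference and cv are both expressed as fractions of the mean.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    effect = relative_difference / cv  # standardized difference
    return math.ceil(2 * ((z_alpha + z_power) / effect) ** 2)

# 10% difference in means (from the slide); 50% between-person CV is an assumption.
n_nominal = samples_per_group(0.10, 0.50)
# Bonferroni-style threshold for ~5000 peaks tested (also an assumption).
n_corrected = samples_per_group(0.10, 0.50, alpha=0.05 / 5000)
```

Under these assumptions a single metabolite needs a few hundred participants per group, and correcting for thousands of tested peaks raises that requirement several-fold, which is where biobank-scale cohorts help.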