7/29/2019 Lecture 3 Case Studies
Metabolomic Data Analysis
Case Studies
Dmitry Grapov, PhD
Case Studies
1. Data Exploration and Analysis Planning
Lung Cancer
2. Multifactorial Design Mouse Cerebellum
3. Time Course
OGTT Metabolomics
Analysis Planning
DOD Lung Cancer Plasma (CARET)
Summary
Analysis of plasma primary metabolites to identify circulating markers associated with lung cancer histology type.
Methods
Exploratory data analysis using principal components analysis (PCA)
Analysis of covariance (ANCOVA)
Orthogonal partial least squares discriminant analysis (OPLS-DA)
Hierarchical cluster analysis (HCA) and multidimensional scaling (MDS)
Lung Cancer: Exploratory Analysis
Purpose
Overview of the data variance structure
Methods
Singular value decomposition (SVD) on autoscaled data
PC1 and PC2 (14% variance explained) display 2 clusters of points
Cluster structure could not be
explained by histology or any
other metadata
Cluster structure is best
explained by instrumental
acquisition date
Black - 110629 to 110701
Red - 110702 to 110705
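The exploratory step above (autoscaling followed by SVD) can be sketched as follows. This is a minimal illustration that assumes NumPy is available; the function names `autoscale` and `pca_svd` and the simulated batch shift are hypothetical and are not taken from the lecture or its data.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_svd(X, n_components=2):
    """Return sample scores, variable loadings and variance explained."""
    Z = autoscale(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic data (assumption, not the study data): 40 samples, 10 variables,
# with a simulated batch shift applied to the second half of the samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
batch = np.repeat([0, 1], 20)
X[batch == 1, :5] += 3.0

scores, loadings, explained = pca_svd(X)
# Coloring the PC1/PC2 scores by a metadata column such as acquisition date
# is one way to check whether that column explains the cluster structure.
```

In this toy case the two simulated batches separate along PC1, which mirrors how an acquisition-date effect can dominate the first components.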
Lung Cancer: Analysis Planning
Purpose
Identify significant changes in metabolites while adjusting for the noted batch effect, gender and smoking status covariates.
Methods
Shifted logarithm (natural) transformed data
ANCOVA: batch + gender + smoking
False Discovery Rate correction and estimation
PCA used to overview covariate
adjusted data structure
Cluster structure in the adjusted data suggests
that there is another unexplained covariate
OPLS-DA was used to evaluate covariate adjustments and
hypothesis testing strategies
Modeling histology (control in green)
Modeling control/cancer and histology
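Two of the preprocessing and correction steps named on this slide can be sketched in plain Python. The slide does not say which shift or which FDR procedure was used, so the shift of 1 and the Benjamini-Hochberg procedure below are assumptions, and the p-values are made up for illustration.

```python
import math

def shifted_log(values, shift=1.0):
    """Natural log of (value + shift); the shift keeps zeros finite."""
    return [math.log(v + shift) for v in values]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative p-values only (not results from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)
```

Comparing how many features pass a threshold before and after adjustment gives the kind of "significant" versus "post FDR" counts reported in the lecture.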
Lung Cancer: ANCOVA
Summary
Optimal testing strategy was identified as: using covariate-adjusted data (~ batch + gender + smoking) to test for differences between control and cancer (adenocarcinoma, NSCLC and squamous)
OPLS-DA overview of optimized modeling strategy
Identified 24 (8%) significantly changed species (3 post FDR)
Lung Cancer: Correlation Analysis
Purpose
Identify relationships between known and unknown metabolic features.
Methods
Hierarchical cluster analysis (Euclidean distances from Spearman's correlations, linked by Ward's method)
Summary
Top features could be grouped into 8 major correlated clusters
Top changed unknown metabolites could be linked to named species
223566: tryptophan
225405: 1/ beta-alanine
274174: methionine, glucuronic acid
228377: tryptophan
362112: tryptophan
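The distance construction described in the methods (Euclidean distances computed from Spearman correlation profiles) can be sketched in plain Python. The feature names and values below are hypothetical, and the Ward linkage step itself is left to a clustering library; this sketch only covers the correlation and distance calculation.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def correlation_distances(features):
    """Euclidean distance between the correlation profiles of each feature pair."""
    names = list(features)
    corr = {a: [spearman(features[a], features[b]) for b in names] for a in names}
    dist = {(a, b): math.dist(corr[a], corr[b]) for a in names for b in names}
    return corr, dist

# Hypothetical intensities for one named and two other features.
features = {
    "tryptophan": [1, 2, 3, 4, 5],
    "unknown_A": [2, 4, 5, 9, 12],
    "methionine": [5, 3, 4, 1, 2],
}
corr, dist = correlation_distances(features)
```

An unknown feature that sits close to a named metabolite in this distance space is a candidate for the kind of unknown-to-named linkage listed above.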
Lung Cancer
Conclusions
Metabolic data contained batch effects, which could be explained in part by data acquisition date
Univariate analyses were limited by the effects of outliers
Multivariate modeling was used to identify 64 features (21%) which best explain differences in plasma metabolites between patients with and without lung cancer
Hydroxylamine, aspartic acid, and tryptophan displayed patterns of change consistent with differences in patient cancer histology
Correlation analysis was used to link many significant changes in unknowns to tryptophan
Multifactorial Design
Mouse Cerebellum Metabolomics
Summary
Analysis of mice carrying a mutation in the ERCC8 gene. Cockayne Syndrome B is a rare autosomal recessive congenital disorder related to premature aging. Mutant animals display altered glycolytic and mitochondrial metabolism, which benefits from a high-fat diet.
Study Design
2 genotypes (WT, CSB; n=20)
4 diets per genotype (SD, Resv, CR, HFD; n=5)
Analysis
principal components analysis (PCA)
two-way analysis of variance (ANOVA)
orthogonal partial least squares discriminant analysis (OPLS-DA)
network mapping
http://en.wikipedia.org/wiki/Cockayne_syndrome
Mouse Cerebellum: PCA
Method
Conducted on autoscaled data
using SVD.
Findings
Identified 6 possible outliers all
of which are in the WT genotype
Mouse Cerebellum: Outliers
Methods
Use PLS-DA to determine whether the outlier samples persist when maximizing the difference between WT and CSB animals.
Findings
Noted outliers in WT should be
removed or analyzed separately
PCA
PLS-DA
Mouse Cerebellum: ANOVA
Methods
shifted log transformed data
two-way ANOVA (genotype, diet)
Findings
Identification of significant changes in metabolites due to genotype,
diet (treatment) and interaction between genotype and diet
Figure panels: genotype effect, treatment effect, interaction effect
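The two-way ANOVA with interaction can be illustrated for a single metabolite with a small balanced design. This is a sketch under stated assumptions: the design is balanced, the values are invented, and only F statistics are returned (p-values would need an F distribution from a statistics library).

```python
def two_way_anova(data):
    """F statistics for a balanced two-factor design with replication.

    data[a][b] is the list of replicate values for level a of the first
    factor (e.g. genotype) and level b of the second factor (e.g. diet).
    """
    a_levels = list(data)
    b_levels = list(data[a_levels[0]])
    n_a, n_b = len(a_levels), len(b_levels)
    n_rep = len(data[a_levels[0]][b_levels[0]])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    cell = {(a, b): mean(data[a][b]) for a in a_levels for b in b_levels}
    grand = mean(cell.values())
    a_mean = {a: mean(cell[a, b] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell[a, b] for a in a_levels) for b in b_levels}

    # Sums of squares for main effects, interaction and residual error.
    ss_a = n_b * n_rep * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n_a * n_rep * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n_rep * sum((cell[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
                        for a in a_levels for b in b_levels)
    ss_err = sum((x - cell[a, b]) ** 2
                 for a in a_levels for b in b_levels for x in data[a][b])

    df_a, df_b = n_a - 1, n_b - 1
    df_ab, df_err = df_a * df_b, n_a * n_b * (n_rep - 1)
    ms_err = ss_err / df_err
    return {
        "factor_a": (ss_a / df_a) / ms_err,
        "factor_b": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / df_ab) / ms_err,
        "df_error": df_err,
    }

# Invented values for one metabolite: 2 genotypes x 2 diets x 3 replicates.
data = {
    "WT": {"SD": [1, 2, 3], "HFD": [3, 4, 5]},
    "CSB": {"SD": [5, 6, 7], "HFD": [7, 8, 9]},
}
result = two_way_anova(data)
```

In this toy example both main effects are present and the interaction term is zero, because the diet shift is identical in both genotypes.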
Mouse Cerebellum: Multivariate Modeling
Methods
autoscaled data
classification of sample genotype (OSC-PLS-DA/OPLS-DA)
OSC-PLS-DA/OPLS-DA validation
Mouse Cerebellum: Multivariate Modeling
Methods
autoscaled data
classification of sample genotype and diet (OPLS-DA)
evaluation of Y construction (separate and combined)
Figure panels: multiple Y, single Y
Mouse Cerebellum: Multivariate Modeling
Methods
autoscaled data
classification of diet (treatment) effects independently in each genotype
Figure panels: WT, CSB
Mouse Cerebellum: Network Analysis
Methods
generate biochemical and chemical similarity network
map statistical and OPLS-DA model results to network
Analyze
genotype network
Treatment networks in WT and CSB separately
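One way to picture the chemical similarity part of the network is as an edge list built from pairwise Tanimoto similarity. Everything specific below is an assumption for illustration: the bit sets are toy stand-ins for real structural fingerprints, the 0.7 cutoff is arbitrary, and the function names are hypothetical. The direction labels only echo the increases noted later in the lecture.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_edges(fingerprints, threshold=0.7):
    """Connect metabolite pairs whose similarity meets the threshold."""
    edges = []
    for m1, m2 in combinations(sorted(fingerprints), 2):
        s = tanimoto(fingerprints[m1], fingerprints[m2])
        if s >= threshold:
            edges.append((m1, m2, s))
    return edges

def map_results(edges, stats):
    """Attach a per-metabolite statistic to every node present in the network."""
    nodes = {name for edge in edges for name in edge[:2]}
    return {name: stats.get(name) for name in sorted(nodes)}

# Toy bit sets (assumption): not real structural fingerprints.
fingerprints = {
    "methionine": {1, 2, 3, 4},
    "homoserine": {1, 2, 3, 5},
    "serine": {1, 2, 3},
    "citrulline": {7, 8, 9},
}
edges = similarity_edges(fingerprints)
node_stats = map_results(edges, {"methionine": "up", "homoserine": "up", "serine": "up"})
```

Mapping test statistics or model loadings onto the nodes in this way is what allows genotype and treatment effects to be read in their biochemical context.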
Mouse Cerebellum: Genotype Network
Mouse Cerebellum: WT Treatment Network
Mouse Cerebellum: CSB Treatment Network
Mouse Cerebellum
Conclusions
Major differences between CSB and WT:
elevation of 2-hydroxyglutaric acid in CSB
2-hydroxyglutaric aciduria is either autosomal recessive or autosomal dominant
perturbations in methionine and (potentially) single-carbon metabolism
Increases in the related species methionine, homoserine and serine and a decrease in adenosine-5'-phosphate may point to decreases in S-adenosyl methionine (SAM-e) synthesis. Reduction in SAM-e could have detrimental effects on single-carbon metabolism and methylation reactions, which through a systemic reduction in choline would impact phosphatidylcholine synthesis.
Independent of genotype, treatment effects can be classified on a continuum of metabolic change from CR > HFD > Resv > SD.
Treatment-related changes in citrulline were modified by genotype (strong genotype/treatment interaction).
Similar changes due to treatment in both genotypes (e.g. 1,5-anhydroglucitol) may be an outcome of diet composition and not biology.
http://en.wikipedia.org/wiki/2-Hydroxyglutaric_aciduria
Time Course
Oral Glucose Tolerance Test Metabolomics
Summary
Analysis of changes in plasma primary metabolites during an oral glucose tolerance test (OGTT) before and after a 14 week diet and exercise intervention.
Study Design
Overweight women (12-15, obese, sedentary, glucose 100-128 mg/dL)
Pre and post intervention
Clinical panel: insulin, glucose, lipids
Primary metabolites at 0, 30, 60, 90, 120 minutes
Analysis
principal components analysis (PCA)
two-way analysis of variance (ANOVA)
orthogonal partial least squares discriminant analysis (OPLS-DA)
network mapping
OGTT: Data Properties
Excursion
Baseline and Area Under the Curve (AUC)
Time Course: Options
Baseline adjusted vs AUC
Raw (top) vs baseline adjusted (bottom)
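The difference between raw and baseline-adjusted summaries of a time course can be shown with a short calculation. The sketch below assumes the trapezoidal rule for the AUC and uses the sampling times from the study design; the glucose-like values are invented.

```python
def baseline_adjust(values):
    """Subtract the first (t = 0) measurement from every time point."""
    return [v - values[0] for v in values]

def trapezoid_auc(times, values):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return area

times = [0, 30, 60, 90, 120]             # minutes, as in the study design
response = [100, 160, 140, 120, 100]     # invented values for one subject

raw_auc = trapezoid_auc(times, response)
adjusted = baseline_adjust(response)
adjusted_auc = trapezoid_auc(times, adjusted)
# raw_auc -> 15600.0, adjusted_auc -> 3600.0
```

The raw AUC is dominated by the baseline level (100 x 120 = 12000 of the 15600 here), while the baseline-adjusted AUC isolates the excursion, which is why the two options can lead to different conclusions.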
OGTT: Data Analysis
Identification of OGTT effects
  significant metabolomic excursions (one-sample t-test on AUC): pre, post or both
  intervention-adjusted PLS model
  OGTT biochemical/chemical similarity network
Identification of treatment effects
  Univariate statistics
    Two-way ANOVA (time and intervention)
    Mixed effects modeling (intervention as the main effect and individual subjects as random effects)
  PLS-DA modeling and feature selection of changes in
    Baseline (t = 0)
    AUC
    Combined baseline and AUC
Analysis of correlations
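The one-sample t-test on AUC named in this outline asks whether the mean excursion across subjects differs from zero. A minimal sketch of the test statistic follows; the AUC values are invented, and converting the statistic to a p-value would require a t distribution from a statistics library.

```python
import math

def one_sample_t(values, mu=0.0):
    """Return the t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    t = (mean - mu) / math.sqrt(variance / n)
    return t, n - 1

# Invented baseline-adjusted AUC values for one metabolite across subjects.
aucs = [2.0, 4.0, 6.0, 8.0]
t_stat, df = one_sample_t(aucs)
```

Running this per metabolite, separately for the pre and post visits, gives the "pre, post or both" classification of significant excursions.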
OGTT: effects on primary metabolism
PCA
PLS-DA (intervention adjusted data modeling time)
OGTT: effects network
OGTT: Treatment Effects
PLS-DA
OGTT: Treatment Effects
Learning from the positions of the sample scores
OGTT: Treatment Effects
Feature Selection on Loadings
Variable Loadings
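Feature selection on loadings can be illustrated with a deliberately simplified model: a single-component PLS fit on centered data, with variables ranked by absolute loading. This is not the PLS-DA or OPLS-DA workflow used in the lecture (no scaling, validation or orthogonal signal correction); the data, feature names and function names are hypothetical.

```python
import math

def pls1_first_component(X, y):
    """Weights, scores and loadings of the first PLS component (centered data)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector is proportional to the covariance of each variable with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(t * t for t in scores)
    loadings = [sum(Xc[i][j] * scores[i] for i in range(n)) / tt for j in range(p)]
    return w, scores, loadings

def select_features(names, loadings, top=1):
    """Keep the variables with the largest absolute loadings."""
    ranked = sorted(zip(names, loadings), key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in ranked[:top]]

# Invented data: 4 samples, 3 features, class label 0 (pre) or 1 (post).
names = ["feature_1", "feature_2", "feature_3"]
X = [[1, 5, 2], [2, 5, 3], [8, 5, 2], [9, 5, 3]]
y = [0, 0, 1, 1]

w, scores, loadings = pls1_first_component(X, y)
selected = select_features(names, loadings, top=1)
```

Here only the first feature changes with the class label, so it carries the largest loading and is the one selected.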
OGTT: Linking biology with our experiment
OGTT: Analysis of Correlations
Conclusion
Each data analysis is unique
Which method should be used is defined by how the data looks and the goal of the analysis
Different analysis techniques are used to get independent perspectives of the data
Combination of similar evidence from different techniques is used to define the robust explanation of the experiment