Stratified subsampling for effective
removal of batch effects
in metabolomics
Application to
endocrine disruptors screening
Julien Boccard
School of Pharmaceutical Sciences
University of Geneva, University of Lausanne
Endocrine Disruption
Endocrine disruption is related to many pathologies such as
infertility, diabetes, obesity and cancer (breast, prostate, endometrial,
ovary, cervical, testis, bladder, renal, thyroid or osteosarcoma, etc.)
Need for efficient monitoring to provide an
opportunity to diagnose exposure/disease at
early stages
The screening of potential Endocrine Disrupting Chemicals
is a major concern for regulatory agencies
U.S. Environmental Protection Agency (EPA) and Organization for Economic
Co-operation and Development (OECD) fund R&D programs
Steroid Profiling in Adrenal Cell Model
Development of an efficient and robust analytical protocol for
monitoring steroid metabolites in H295R cell culture supernatant
Sample preparation: Protein precipitation, Solid-Phase Extraction
UHPLC analysis coupled to QTOF high resolution Mass Spectrometry
H295R Steroidogenesis Assay, OECD 2011
H295R cell line (derived from a human adrenocortical carcinoma)
OECD model to study steroidogenesis perturbations
H295R cells express genes encoding
most of the key enzymes of steroidogenesis
BUT
Test designed to assess variations of
testosterone and estradiol
due to chemical exposure
Untargeted MS Acquisition
[Figure: compound classes covered: organic acids, lipids, acylcarnitines, nucleosides & derivatives, ?]
UHPLC-QTOF/MSE → full m/z range acquisition (100-1’000)
About 10’000 detected features …
Focus on Steroid Metabolites
Steroids
Database ID - Pathways
Literature, scientific knowledge
Web resources
About 250 reference steroids …
Experimental Dataset
Exposure to 7 different conditions (6 toxicants – 1 control)
Acetyltributylcitrate (ACT), forskolin (FOR), linuron (LIN), octocrylene (OCT),
octylmethoxycinnamate (OMC), torcetrapib (TOR), dimethylsulfoxide (DMSO)
Non-cytotoxic concentrations
Two or three replicates
Three biological batches to estimate repeatability
49 samples with >100 annotated steroid metabolites
High variability between batches
Very strong batch effect
Metabolic alterations due to exposure
are masked
Principal Component Analysis
[Figure: PCA score plot, t1 (42.8%) vs. t2 (27.1%); samples cluster as Batch 1, Batch 2 and Batch 3]
Batch Effects Removal
Total variation = between-group variation + within-group variation
ASCA
ANOVA-PCA
ANOVA-PLS
ANOVA-TP
AComDim
AMOPLS
How to account for the study design in a multivariate context?
Associate ANOVA decomposition with projection methods
Explicitly consider the batch as an experimental factor
Quality control (QC) samples for batch correction
X = Xμ + Xα + Xβ + Xαβ + XRes
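The decomposition above can be illustrated with a small standard-library sketch. It assumes a balanced two-factor design (exposure as α, batch as β) and uses plain level means; the toy data, the function names `anova_decompose`, `mean_rows` and `sum_sq`, and the two-variable matrix are all hypothetical and not taken from the study.

```python
from collections import defaultdict

def mean_rows(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sum_sq(M):
    """Sum of squares of all entries of a matrix given as a list of rows."""
    return sum(v * v for row in M for v in row)

def anova_decompose(X, a, b):
    """Split X into mean, A, B, AB and residual matrices (balanced design assumed)."""
    grand = mean_rows(X)
    by_a, by_b, by_ab = defaultdict(list), defaultdict(list), defaultdict(list)
    for row, la, lb in zip(X, a, b):
        by_a[la].append(row)
        by_b[lb].append(row)
        by_ab[(la, lb)].append(row)
    mean_a = {k: mean_rows(v) for k, v in by_a.items()}
    mean_b = {k: mean_rows(v) for k, v in by_b.items()}
    mean_ab = {k: mean_rows(v) for k, v in by_ab.items()}

    X_mu, X_a, X_b, X_ab, X_res = [], [], [], [], []
    for row, la, lb in zip(X, a, b):
        cell = mean_ab[(la, lb)]
        ea = [m - g for m, g in zip(mean_a[la], grand)]        # main effect A
        eb = [m - g for m, g in zip(mean_b[lb], grand)]        # main effect B
        eab = [c - g - u - v for c, g, u, v in zip(cell, grand, ea, eb)]  # interaction
        res = [x - c for x, c in zip(row, cell)]               # within-cell residual
        X_mu.append(list(grand))
        X_a.append(ea)
        X_b.append(eb)
        X_ab.append(eab)
        X_res.append(res)
    return X_mu, X_a, X_b, X_ab, X_res

# Hypothetical toy data: 2 exposures x 2 batches x 2 replicates, 2 variables.
exposure = ["ctl", "ctl", "ctl", "ctl", "tox", "tox", "tox", "tox"]
batch = ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"]
X = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.1], [3.2, 3.9],
     [2.0, 2.5], [2.2, 2.7], [4.5, 5.0], [4.3, 5.2]]

X_mu, X_a, X_b, X_ab, X_res = anova_decompose(X, exposure, batch)
centered = [[x - m for x, m in zip(r, rm)] for r, rm in zip(X, X_mu)]
total = sum_sq(centered)
for name, part in [("exposure", X_a), ("batch", X_b),
                   ("interaction", X_ab), ("residuals", X_res)]:
    print(name, round(100 * sum_sq(part) / total, 1), "%")
```

With equal cell sizes the five matrices add back to X and their sums of squares add up to the total, which is what makes a percentage per factor meaningful.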
ANOVA Multiblock OPLS workflow
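Experimental matrix X (n x k)
ANOVA decomposition: $X = X_A + X_B + X_{AB} + X_{Res}$, each term (n x k)
Experimental submatrices: $X_A + X_{Res}$, $X_B + X_{Res}$, $X_{AB} + X_{Res}$, $X_{Res}$
Joint analysis of the submatrices by multiblock OPLS
Prediction of level barycentres (Y) based on experimental submatrices
$X = T_{p\alpha} P_{p\alpha}^{T} + T_{p\beta} P_{p\beta}^{T} + T_{p\alpha\beta} P_{p\alpha\beta}^{T} + T_{o} P_{o}^{T} + E$
$Y = T_{p\alpha} Q_{p\alpha}^{T} + T_{p\beta} Q_{p\beta}^{T} + T_{p\alpha\beta} Q_{p\alpha\beta}^{T} + F$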
Boccard et al., Analytica Chimica Acta (2016), 920, 18-28.
Unbalanced Designs
What to do with groups (factor levels) of unequal sizes?
General linear model approach offers an unbiased decomposition
(Thiel et al., 2017)
BUT submatrices are still non-orthogonal
Resampling using the smallest size of exchangeable units
(lowest number of observations associated with a level or combination of levels)
Balanced groups are mandatory for variance decomposition
Percentage of explained variation (sum of squares)
Orthogonal (uncorrelated additive) submatrices
X = XMean + XExposure + XBatch + XInteraction + XRes
Stratified Subsampling
Before subsampling (49 samples):
Exposure Factor (7 levels): n = 8, 7, 7, 6, 7, 7, 7
Batch Factor (3 levels: Batch 1, Batch 2, Batch 3): n = 18, 16, 15
Stratified subsampling (103 subsets)
After subsampling (42 samples per subset):
Exposure Factor (OCT, OMC, ATC, LIN, TOR, FOR, DMSO): n = 6 per level
Batch Factor (Batch 1, Batch 2, Batch 3): n = 14 per batch
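The resampling idea (keep, in every exposure × batch cell, as many samples as the smallest cell contains) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `stratified_subsets` is hypothetical, and the cell layout below is invented only so that the marginal counts match those on the slide (8, 7, 7, 6, 7, 7, 7 per exposure level and 18, 16, 15 per batch).

```python
import random
from collections import Counter, defaultdict

def stratified_subsets(exposure, batch, n_subsets, seed=0):
    """Draw balanced subsets: the same number of samples from every exposure x batch cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for idx, cell in enumerate(zip(exposure, batch)):
        cells[cell].append(idx)
    size = min(len(members) for members in cells.values())  # smallest cell size
    subsets = []
    for _ in range(n_subsets):
        chosen = []
        for members in cells.values():
            chosen.extend(rng.sample(members, size))  # sampling without replacement
        subsets.append(sorted(chosen))
    return subsets

# Hypothetical layout: 2 replicates per cell, 3 in the cells listed in `extra`.
levels = ["OCT", "OMC", "ATC", "LIN", "TOR", "FOR", "DMSO"]
extra = {("OCT", 1), ("OCT", 2), ("OMC", 1), ("ATC", 1),
         ("TOR", 1), ("FOR", 2), ("DMSO", 3)}
exposure, batch = [], []
for lv in levels:
    for bt in (1, 2, 3):
        reps = 3 if (lv, bt) in extra else 2
        exposure += [lv] * reps
        batch += [bt] * reps

subsets = stratified_subsets(exposure, batch, n_subsets=5)
first = subsets[0]
print(len(exposure), "samples in total,", len(first), "per subset")
print(Counter(exposure[i] for i in first))
print(Counter(batch[i] for i in first))
```

Each subset is balanced for both factors, so an ANOVA decomposition applied to it yields orthogonal submatrices; repeating the draw gives a population of models rather than a single one.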
Model Population
[Figure: model population built from the subsets, with Scores and Loadings panels; histograms with axes Classes (x) and Counts (y, 0 to 80)]
Quantitative evaluation of the experimental factors
Exposure 16.7%
Batch 61%
Interaction 15.8%
Residuals 6.5%
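The slide reports only the values. In ANOVA-based decompositions such percentages are commonly obtained as the ratio of each submatrix's sum of squares to the total sum of squares of the mean-centred data; the expression below assumes that convention (with the Frobenius norm) and simply checks that the four reported contributions add up to 100%.

```latex
\%\,\mathrm{Var}(\text{effect}) = 100 \times
  \frac{\lVert X_{\text{effect}} \rVert_F^{2}}{\lVert X - X_{\text{Mean}} \rVert_F^{2}},
\qquad
16.7 + 61.0 + 15.8 + 6.5 = 100.0
```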
AMOPLS Effects Interpretation - Batch
Batch Main Effect
[Figure: Batch - Scores tp1 vs. tp2; samples grouped as Batch 1, Batch 2, Batch 3]
[Figure: Batch - Loadings pp1 vs. pp2]
SCORES: Clear groupings according to batches; major source of variation
LOADINGS: Massive overall differences; culture medium variability
Batch × Exposure Interaction
[Figure: Interaction - Scores tp4 vs. tp6; samples labelled FOR, TOR, OCT, ACT, DMSO, LIN, OMC]
[Figure: Interaction - Loadings pp4 vs. pp6]
AMOPLS Effects Interpretation - Exposure
Exposure Main Effect
[Figure: Toxicants - Scores tp3 vs. tp5; samples labelled FOR, DMSO, TOR, ACT, OCT, OMC, LIN]
[Figure: Loadings pp3 vs. pp5]
[Figure: Toxicants - Scores tp7 vs. tp10; samples labelled ACT, OCT, LIN, OMC, DMSO, TOR, FOR]
[Figure: Loadings pp7 vs. pp10]
SCORES: Clusters of samples related to chemical exposure; homogeneous groups
LOADINGS: Higher or lower abundances according to exposure; specific patterns
Xenobiotics Mapping
Hierarchical Cluster Analysis
• Euclidean distances based on AMOPLS scores
• Ward aggregation method
[Figure: dendrogram of the samples with their AMOPLS scores (tp3, tp5, tp7, tp10) on a colour scale from negative (-0.25) to positive (0.25) score; clusters: ATC, OCT, LIN & OMC, DMSO, TOR, FOR]
Cluster annotations:
Similar anti-androgenic signatures
Known steroidogenesis inducer
Control
Induction of aldosterone and cortisol
Anti-androgenic and anti-estrogenic
Increased corticosteroid production
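As a rough illustration of the clustering step named on this slide, the sketch below runs agglomerative clustering with Ward's criterion on a handful of score vectors, using squared Euclidean distances and the Lance-Williams update. The function name `ward_clusters` and the six toy points are hypothetical; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used instead.

```python
def ward_clusters(points, n_clusters):
    """Merge clusters with Ward's criterion until n_clusters remain.

    points: list of score vectors (one per sample).
    Returns the clusters as sorted lists of sample indices.
    """
    members = {i: [i] for i in range(len(points))}
    # Squared Euclidean distances between all pairs of single samples.
    dist = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    next_id = len(points)
    while len(members) > n_clusters:
        i, j = min(dist, key=dist.get)          # closest pair of clusters
        d_ij = dist.pop((i, j))
        ni, nj = len(members[i]), len(members[j])
        merged = members.pop(i) + members.pop(j)
        for k in members:                        # Lance-Williams update (Ward)
            nk = len(members[k])
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = ((ni + nk) * d_ik + (nj + nk) * d_jk
                                  - nk * d_ij) / (ni + nj + nk)
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in members.values())

# Hypothetical two-dimensional scores forming two well-separated groups.
scores = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
groups = ward_clusters(scores, n_clusters=2)
print(groups)
```

Ward's criterion merges, at each step, the pair of clusters whose fusion increases the within-cluster sum of squares the least, which tends to produce compact groups such as the per-compound clusters listed above.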
AMOPLS Effect-specific VIP value
$VIP_j = \sqrt{ p \, \dfrac{\sum_{a=1}^{A} SS_a \left( w_{aj} / \lVert w_a \rVert \right)^{2}}{\sum_{a=1}^{A} SS_a} }$
Selection of the most relevant steroids
Highlight altered enzymes for
mechanistic interpretation
Focus further analytical developments
for absolute quantification
[Figure: Exposure Main Effect VIP, bar chart with VIP axis from 0.00 to 2.50]
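The VIP expression on this slide follows the usual variable-importance-in-projection form, which can be sketched in a few lines. The sketch assumes p variables, A components, one weight vector per component and one sum of squares SS_a per component; the function name `vip` and the toy numbers are hypothetical, and how AMOPLS restricts the sum to the components of a single effect is not shown on the slide and is not reproduced here.

```python
import math

def vip(weights, ss):
    """Variable importance in projection.

    weights: list of A weight vectors, each of length p.
    ss: list of A sums of squares explained by each component.
    """
    p = len(weights[0])
    total = sum(ss)
    values = []
    for j in range(p):
        acc = 0.0
        for w, s in zip(weights, ss):
            norm_sq = sum(v * v for v in w)     # squared norm of the weight vector
            acc += s * (w[j] ** 2) / norm_sq    # SS_a * (w_aj / ||w_a||)^2
        values.append(math.sqrt(p * acc / total))
    return values

# Hypothetical example: 3 variables, 2 components.
weights = [[0.8, 0.6, 0.0],
           [0.0, 0.6, 0.8]]
ss = [3.0, 1.0]
vip_values = vip(weights, ss)
print([round(v, 3) for v in vip_values])
```

By construction the squared VIP values sum to p, so their mean square is 1, which is why a value above 1 is a commonly used threshold for flagging a variable as relevant.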
Conclusions
Applicable to any ANOVA-based strategy (ASCA, ANOVA-PCA, ...)
Quantitative evaluation of the experimental factors
Mapping of xenobiotics according to
their steroidomic signatures
Removing batch effect using specific components
Focus on the most relevant steroid
metabolites and enzymes
Stratified subsampling allowed proper variance decomposition and
modeling of the different sources of variation using AMOPLS
Acknowledgements
Prof. Serge Rudaz, Dr. Fabienne Jeanneret, Dr. David Tonoli
Prof. Alex Odermatt, Dr. Petra Strajhar