29/05/2017 v 1.0.0
LC-MS
Pre-processing (xcms)
W4M Core Team
SECTION 1
Acquisition files upload and pre-processing with xcms: extraction, alignment and retention time drift correction.
LC-MS Data
What is provided by the mass spectrometers…
What we want for data analysis
Documentation:
https://bioconductor.org/packages/release/bioc/html/xcms.html
Forums:
https://groups.google.com/forum/#!forum/xcms
http://metabolomics-forum.com
R-based software, free
Many parameters to tune
No graphical interface
An R script has to be written
Extraction with XCMS
• Extraction: ions are extracted in each sample independently.
• Grouping (alignment): each ion is aligned across all samples.
• Retention time correction (optional)
• Fill peaks: missing data are replaced with a baseline value.
• Statistics and visualisation (optional)
• CAMERA: annotation of adducts, neutral losses, fragments and isotopes.
DATA
Data
• Raw data:
– mzXML, mzML, mzData and netCDF
• sampleMetadata
Add some information for further steps:
HU_neg_011_b2   bio
HU_neg_014_b2   bio
Blank04         blank
Blank05         blank
…               …
The same sampleMetadata with additional columns:
samples         class   sampleType  subset  full  injectionOrder  batch  age
HU_neg_011_b2   bio     sample      0       1     44              ne1    19
HU_neg_014_b2   bio     sample      0       1     57              ne2    22
Blank04         blank   blank       0       1     16              ne2    NA
Blank05         blank   blank       0       1     29              ne1    NA
…               …       …           …       …     …               …      …
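As an illustration of how such a sampleMetadata table can be read outside Galaxy, here is a minimal Python sketch. It assumes the file is tab-separated (the deck mentions a sampleMetadata.tsv) and that the first column is named "samples" as in the example above. The function names are hypothetical and not part of W4M or xcms.

```python
import csv
import io

# Tab-separated content mirroring the four example rows shown above.
SAMPLE_METADATA_TSV = (
    "samples\tclass\tsampleType\tsubset\tfull\tinjectionOrder\tbatch\tage\n"
    "HU_neg_011_b2\tbio\tsample\t0\t1\t44\tne1\t19\n"
    "HU_neg_014_b2\tbio\tsample\t0\t1\t57\tne2\t22\n"
    "Blank04\tblank\tblank\t0\t1\t16\tne2\tNA\n"
    "Blank05\tblank\tblank\t0\t1\t29\tne1\tNA\n"
)


def read_sample_metadata(text):
    """Parse tab-separated sampleMetadata text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def class_by_sample(rows):
    """Map each sample name to its class (assumed column names: samples, class)."""
    return {row["samples"]: row["class"] for row in rows}


rows = read_sample_metadata(SAMPLE_METADATA_TSV)
classes = class_by_sample(rows)
print(classes)
```

The class column is the one that sets the group of each file when files are uploaded individually.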
Two strategies
The "old" system
-> files are nested in folders for their groups within a zip file
+ The folders set the group of the files for xcms.group
+ Only one import and one step
- xcmsSet is limited to 6 CPUs
- The files aren't integrated into the history and can't be visualized (one day)
The "brand new" system
-> files are uploaded individually and processed in parallel
- The xcmsSet outputs have to be merged before using group
- A sampleMetadata file must be used to set the group
(but you need one for some further steps anyway)
+ One xcmsSet job is launched for each input file, so the step is highly parallelizable
+ The files are completely integrated in Galaxy and may one day be visualized
+ Better transparency
Dataset Collection
• A dataset collection groups N datasets into 1 wrapper (the collection)
• Depending on the tool, the datasets nested in a collection are processed
– In one step
– In parallel
[Figure: a tool applied to a dataset collection, with one xcmsSet job per nested dataset]
[Screenshots: building a dataset collection, then selecting the dataset collection as the xcmsSet input and running the tool]
mzXML raw file
[Screenshot: mzXML in a text editor, with 1 scan highlighted]
mzXML raw file information
Real-life example: m/z 187
File HU_neg_091.mzXML
scan #  RetTime (sec)  basePeakMz        int       TIC       %TIC  delta ppm  delta dalton
724     374.013        187.006423950195  1.11E+07  2.66E+07  42%
725     374.511        187.006896972656  3.26E+07  5.25E+07  62%   2.5        0.000473
726     374.996        187.007186889648  5.14E+07  7.89E+07  65%   1.6        0.000290
727     375.478        187.007324218750  6.19E+07  9.28E+07  67%   0.7        0.000137
728     375.955        187.007125854492  7.13E+07  1.05E+08  68%   1.1        0.000198
729     376.432        187.006988525391  7.34E+07  1.08E+08  68%   0.7        0.000137
730     376.906        187.006942749023  7.62E+07  1.10E+08  69%   0.2        0.000046
731     377.380        187.006942749023  6.98E+07  1.05E+08  67%   0.0        0.000000
732     377.861        187.006942749023  5.94E+07  9.00E+07  66%   0.0        0.000000
733     378.330        187.006713867188  5.79E+07  8.89E+07  65%   1.2        0.000229
734     378.805        187.006942749023  5.06E+07  7.77E+07  65%   1.2        0.000229
735     379.283        187.006484985352  4.33E+07  6.89E+07  63%   2.4        0.000458
736     379.762        187.006622314453  3.87E+07  6.19E+07  62%   0.7        0.000137
737     380.241        187.006576538086  3.14E+07  5.36E+07  58%   0.2        0.000046
738     380.720        187.006347656250  2.49E+07  4.96E+07  50%   1.2        0.000229
739     381.204        187.006439208984  1.98E+07  5.02E+07  39%   0.5        0.000092
740     381.684        187.006515502930  1.25E+07  3.56E+07  35%   0.4        0.000076
741     382.179        187.006393432617  1.11E+07  3.86E+07  29%   0.7        0.000122
RT range (sec): 8.166
m/z median: 187.006805419922
[Figure: scan-to-scan m/z deviation (ppm) for scans 724 to 741; m/z axis from 187.0060 to 187.0080]
[Figure: intensity versus retention time (373 to 383 sec); peak width = 8.2]
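The "delta ppm" column of the table can be reproduced from consecutive basePeakMz values. The sketch below assumes the deviation is taken relative to the previous scan's m/z; the function name is hypothetical, and only the first four scans of the table are used.

```python
# basePeakMz of scans 724 to 727, copied from the table above.
base_peak_mz = [
    187.006423950195,
    187.006896972656,
    187.007186889648,
    187.007324218750,
]


def scan_to_scan_ppm(mz_values):
    """Absolute m/z deviation between consecutive scans, in parts per million.

    Assumption: the deviation is expressed relative to the previous scan.
    """
    return [abs(b - a) / a * 1e6 for a, b in zip(mz_values, mz_values[1:])]


deviations = [round(d, 1) for d in scan_to_scan_ppm(base_peak_mz)]
print(deviations)  # [2.5, 1.6, 0.7], matching scans 725 to 727 in the table
```

Deviations of this size are what the "ppm" parameter of the extraction step has to accommodate.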
xcms Extraction: centWave algorithm
The algorithm aims to detect "mass traces" or "regions of interest" (ROI), defined as regions where the m/z deviation across consecutive scans stays below a defined value.
This deviation must be lower than the value of the "ppm" parameter.
The value (in ppm) has to be set according to the accuracy of the mass spectrometer.
The ROI intensities are then used to define the chromatographic peak with a continuous wavelet transform algorithm.
peakwidth (min, max) has to be set for this step:
20,50 for HPLC
5,12 for UPLC
Tautenhahn R. BMC Bioinformatics 2008
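To make the ROI idea concrete, here is a deliberately simplified Python sketch. It is not the centWave implementation: it assumes one m/z value per scan, compares each new value with the running mean of the current trace, and keeps traces of at least min_scans scans. The function names and the min_scans parameter are hypothetical.

```python
def ppm_difference(mz, reference):
    """Absolute difference between two m/z values, in ppm of the reference."""
    return abs(mz - reference) / reference * 1e6


def find_rois(mz_by_scan, ppm, min_scans=3):
    """Split a sequence of per-scan m/z values into regions of interest.

    Simplification (assumption): one centroid per scan; a trace is extended
    while the new m/z stays within `ppm` of the mean m/z of the trace.
    """
    rois, current = [], []
    for mz in mz_by_scan:
        if current and ppm_difference(mz, sum(current) / len(current)) > ppm:
            if len(current) >= min_scans:
                rois.append(current)
            current = []
        current.append(mz)
    if len(current) >= min_scans:
        rois.append(current)
    return rois


# Four scans around m/z 187.007 followed by three scans around m/z 250.100.
trace = [187.006424, 187.006897, 187.007187, 187.007324,
         250.1000, 250.1003, 250.1001]
rois = find_rois(trace, ppm=5)
print([len(r) for r in rois])
```

With ppm=5 the two mass traces are kept whole; setting ppm below the real scan-to-scan deviation of the instrument would fragment them, which is why the value has to match the spectrometer accuracy.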
xcms Extraction algorithms
• matchedFilter is designed for low-resolution MS data, in centroid or profile mode
• centWave is designed for high-resolution MS data in centroid mode
[Figure: centWave chromatographic peak detection. Tautenhahn R., BMC Bioinformatics, 2008]
centWave basic parameters
centWave noise and coeluted peaks
xcms centWave parameters
xcms forum: how to choose peakwidth?
"The main purpose of the peakwidth parameter is to roughly estimate the peak width range, this parameter is not a threshold. The wavelets used for peak detection are calculated from this parameter. If you use HPLC and your peaks are normally 20 - 60 s wide (base peak width), just go with that, i.e. peakwidth=c(20,60). centWave will still detect peaks that are 15 s or 80 s wide!
Important: Do not choose the minimum peak width too small, it will not increase sensitivity, but cause peaks to be split."
Example: peak width ~ 45 s
Using peakwidth = c(20,60), the peak will be split into three peaks, each detected as a separate ~10 s wide peak (since they are separated by a local minimum).
Using peakwidth = c(20,120) will keep the peak intact.
xcms extraction output
When a zip file is used, a sampleMetadata.tsv file is created at this step. It must contain all the information needed for further analyses: batch correction and statistical analyses.
This file must be downloaded so that this information can be added, and then uploaded again.
extraction parameters summary
xcms step: Extraction (xcmsSet)
parameter | related to | description | examples
ppm | m/z | Fluctuation of the m/z value (ppm) from scan to scan. Depends on the mass spectrometer resolution. | 5…
peakwidth | retention time | Range of chromatographic peak width (seconds). | UPLC 5,20; HPLC 10,40
mzdiff | m/z and retention time | Minimum difference of m/z for peaks with overlapping retention times (coeluting peaks). Must be negative to allow overlap. | -0.001 or 0.05
prefilter | intensity | A peak must be present in n scans with an intensity greater than k. | n=3, k=1000
snthresh | intensity | Signal/noise ratio threshold. | 3…
noise | intensity | Each centroid must be greater than the "noise" value. |
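The prefilter rule in the summary can be read as a simple test on the intensities of a mass trace. The sketch below follows the wording of the slide (at least n scans with an intensity greater than k); the function name is hypothetical and the exact comparison used inside xcms may differ.

```python
def passes_prefilter(intensities, n=3, k=1000):
    """Return True when at least n scans have an intensity greater than k.

    Assumption: this mirrors the slide's description of prefilter, with the
    example values n=3 and k=1000 as defaults.
    """
    return sum(1 for value in intensities if value > k) >= n


strong_trace = [800, 1500, 4200, 3900, 900]   # three scans above 1000
weak_trace = [800, 1500, 950, 990, 400]       # only one scan above 1000

print(passes_prefilter(strong_trace), passes_prefilter(weak_trace))
```

Raising k or n discards weak traces earlier, which speeds up extraction at the cost of low-intensity peaks.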
"Grouping" step
Independent peak lists (one per sample):
pool1B1                     pool1B2                     pool1B3
mz        rt    int         mz        rt    int         mz        rt    int
196.0905  66.6  7810936     196.0910  66.7  11733921    196.0902  66.6  7933325
158.1180  67.4  71736       342.0310  69.0  74594       158.1173  67.4  82969
342.0308  67.6  202268      267.0581  65.5  260877      342.0308  21.3  2581
267.0581  65.5  282039      283.0318  65.2  424631      283.0320  65.3  357448
Group ions by m/z, then group by retention time (- = no peak in this sample):
pool1B1                     pool1B2                     pool1B3
mz        rt    int         mz        rt    int         mz        rt    int
196.0905  66.6  7810936     196.0910  66.7  11733921    196.0902  66.6  7933325
158.1180  67.4  71736       -         -     -           158.1173  67.4  82969
-         -     -           -         -     -           342.0308  21.3  2581
342.0308  67.6  202268      342.0310  69.0  74594       -         -     -
267.0581  65.5  282039      267.0581  65.5  260877      -         -     -
-         -     -           283.0318  65.2  424631      283.0320  65.3  357448
Resulting matrix:
mz        rt    pool1B1   pool1B2    pool1B3
196.0905  66.6  7810936   11733921   7933325
158.1176  67.4  71736     -          82969
342.0308  21.3  -         -          2581
342.0309  68.3  202268    74594      -
267.0581  65.5  282039    260877     -
283.0319  65.2  -         424631     357448
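As a check on how a row of the resulting matrix is obtained, the following Python sketch rebuilds the m/z 342.0309 row from its two grouped peaks. It assumes the feature m/z and retention time are the medians of the grouped peaks, which agrees with the numbers shown; the function name and the tuple layout are hypothetical.

```python
from statistics import median


def feature_row(peaks, samples):
    """Summarise grouped peaks as one matrix row.

    peaks: list of (sample, mz, rt, intensity) tuples forming one group.
    Assumption: feature m/z and RT are the medians over the grouped peaks;
    samples without a peak get None.
    """
    intensity = {sample: value for sample, _, _, value in peaks}
    return {
        "mz": round(median([mz for _, mz, _, _ in peaks]), 4),
        "rt": round(median([rt for _, _, rt, _ in peaks]), 1),
        "intensities": [intensity.get(sample) for sample in samples],
    }


samples = ["pool1B1", "pool1B2", "pool1B3"]
group = [
    ("pool1B1", 342.0308, 67.6, 202268),
    ("pool1B2", 342.0310, 69.0, 74594),
]
row = feature_row(group, samples)
print(row)
```

The peak at m/z 342.0308 and 21.3 s in pool1B3 is not part of this group because its retention time is far from the other two.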
xcms alignment: group
• First, the mass domain is binned. The size of the bins is defined by the mzwid parameter.
• Then, for each m/z bin, all ions of all samples are taken into account over all retention times.
• A kernel density estimator is used to detect retention time regions with a high density of ions.
[Figure: m/z bins of width mzwid]
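The binning step can be sketched in a few lines of Python. This is a simplification under stated assumptions: bins are non-overlapping and of fixed width mzwid, whereas the actual xcms implementation may handle bin boundaries differently. The function name and the mzwid value of 0.01 are hypothetical choices for the example.

```python
from collections import defaultdict


def bin_by_mz(peaks, mzwid):
    """Assign each (sample, mz, rt) peak to a fixed-width m/z bin.

    Assumption: simple non-overlapping bins indexed by floor(mz / mzwid).
    """
    bins = defaultdict(list)
    for sample, mz, rt in peaks:
        bins[int(mz // mzwid)].append((sample, mz, rt))
    return dict(bins)


# Peaks taken from the grouping example earlier in the deck.
peaks = [
    ("pool1B1", 196.0905, 66.6),
    ("pool1B2", 196.0910, 66.7),
    ("pool1B3", 196.0902, 66.6),
    ("pool1B1", 342.0308, 67.6),
    ("pool1B2", 342.0310, 69.0),
    ("pool1B3", 342.0308, 21.3),
]
bins = bin_by_mz(peaks, mzwid=0.01)
for index, members in sorted(bins.items()):
    print(index, [(sample, rt) for sample, _, rt in members])
```

All three m/z 342.03 peaks land in the same bin even though one elutes at 21.3 s; separating it is the job of the retention time density step that follows.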
xcms alignment: group
• A Gaussian model groups together peaks with similar retention times.
• The inclusiveness of ions in a group is defined by the standard deviation of the Gaussian model (bandwidth), which corresponds to the bw parameter of xcms.
• This parameter can be interpreted as a retention time window.
• Vertical dashed lines indicate that the feature is valid and will be retained in the data matrix.
• To be valid, the number of peaks in a group must be greater than a given percentage of the total number of samples. This threshold is defined by the minfrac parameter.
[Figure: with bw = 30 sec, two groups of peaks are merged (problem)]
• Decreasing bw separates these 2 groups.
• The resulting m/z and retention time of the feature are the medians of the m/z and RT of all ions grouped together as a single feature.
[Figure: with bw = 10 sec, the two groups are separated (problem solved)]
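The effect of bw can be illustrated with a small kernel density sketch in Python. It is only an approximation of the idea, not the xcms code: it sums Gaussian kernels of standard deviation bw over invented retention times and counts the local maxima of the resulting curve, each maximum standing for one candidate group. The retention times and function names are hypothetical.

```python
import math


def gaussian_density(rts, bw, grid):
    """Sum of Gaussian kernels (standard deviation bw) centred on each RT."""
    return [sum(math.exp(-0.5 * ((g - rt) / bw) ** 2) for rt in rts)
            for g in grid]


def count_rt_groups(rts, bw, step=0.5):
    """Count local maxima of the density; each one is a candidate group."""
    lo, hi = min(rts) - 3 * bw, max(rts) + 3 * bw
    n = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n)]
    dens = gaussian_density(rts, bw, grid)
    return sum(1 for i in range(1, n - 1)
               if dens[i - 1] < dens[i] >= dens[i + 1])


# Invented example: two sets of peaks eluting about 30 s apart.
rts = [59.0, 60.0, 61.0, 62.0, 89.0, 90.0, 91.0, 92.0]
groups_bw30 = count_rt_groups(rts, bw=30)
groups_bw10 = count_rt_groups(rts, bw=10)
print(groups_bw30, groups_bw10)
```

With these values, bw = 30 yields a single density maximum (the two sets are merged) while bw = 10 yields two, which is the behaviour described in the slides.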
minfrac parameter for group
minfrac = minimum fraction of samples, in at least one class, in which a peak must be detected for the peaks to be considered a group.
Example: 4 samples in each group, minfrac = 0.5.
[Figures: peaks plotted by RT and m/z]
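The minfrac rule can be written as a short test. The Python sketch below assumes the rule as stated on the slide (the fraction must be reached in at least one class); the function name and dictionary layout are hypothetical.

```python
def is_valid_group(detected_per_class, class_sizes, minfrac):
    """Return True when at least one class reaches the minfrac threshold.

    detected_per_class: number of samples with a peak, per class.
    class_sizes: total number of samples, per class.
    Assumption: a class qualifies when detected >= minfrac * class size.
    """
    return any(
        detected_per_class.get(name, 0) >= minfrac * size
        for name, size in class_sizes.items()
    )


class_sizes = {"bio": 4, "blank": 4}   # 4 samples in each group
kept = is_valid_group({"bio": 2, "blank": 0}, class_sizes, minfrac=0.5)
dropped = is_valid_group({"bio": 1, "blank": 1}, class_sizes, minfrac=0.5)
print(kept, dropped)
```

With 4 samples per class and minfrac = 0.5, a feature seen in 2 samples of one class is kept, while a feature seen once in each class is dropped.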
group interface
xcms group output
mzwid defines the m/z intervals; bw defines the width of the Gaussian curve.
When mzwid and bw are too large, two distinct m/z values are merged into one group.
Decreasing the bandwidth (bw) separates the two distinct m/z values.
grouping parameters summary
xcms step: Alignment (group)
parameter | related to | definition | examples
mzwid | m/z | Size of the m/z slices (bins): range of m/z to be included in a group. Depends on the mass spectrometer accuracy. |
bw | retention time | Standard deviation of the Gaussian metapeak that groups peaks together. |
minfrac | samples | To be valid, a group must be found in at least minfrac × the number of samples of a subfolder of data files. minfrac=0.5 corresponds to 50%. | n=10, minfrac=0.5: found in at least 5
max | number of ions | Maximum number of groups detected in a single m/z slice. | 10 or 50
xcms workflow: retcor
xcms retcor output
Because retcor improves the retention times, it must be followed by a second group step.
Modifying the degree of smoothing: span = 0.8
Parameters for retcor
Example with 4 samples in each group: missing = 1; extra = 1.
[Figures: peak groups plotted by RT and m/z, illustrating missing = 1 and extra = 1]
xcms retcor obiwarp
retcor parameters summary
xcms step: Retention time correction (retcor)
parameter | related to | description | examples
smooth method | retention time | Regression model used to model the time deviation among samples (linear or loess). | linear or loess
span | | Degree of smoothing of the loess model. | 0.2 to 1
extra | samples | Number of "extra" peaks allowed when defining reference peaks (or well-behaved peaks) for modeling the time deviation. Number of peaks > number of samples. | default=1
missing | samples | Number of samples without reference peaks. If blank samples are used, missing = number of blanks. | number of blank samples
plottype | retention time | Defines the graphical visualisation of the effect of the model on the retention time correction. | deviation
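The roles of missing and extra can be sketched as a filter that selects the "well-behaved" peak groups used as reference for the correction. The Python below is an interpretation of the table, not the xcms code: it assumes a group qualifies when at most `missing` samples have no peak and at most `extra` peaks exceed one per sample. The function name is hypothetical.

```python
def is_well_behaved(peaks_per_sample, missing=1, extra=1):
    """Decide whether a peak group can serve as a retention time reference.

    peaks_per_sample: number of peaks of the group found in each sample.
    Assumptions: `missing` caps the number of samples without a peak and
    `extra` caps the number of peaks beyond one per sample.
    """
    n_missing = sum(1 for n in peaks_per_sample if n == 0)
    n_extra = sum(n - 1 for n in peaks_per_sample if n > 1)
    return n_missing <= missing and n_extra <= extra


# Four samples, as in the example slide.
one_missing = is_well_behaved([1, 1, 1, 0])   # one sample without a peak
two_missing = is_well_behaved([1, 0, 0, 1])   # two samples without a peak
one_extra = is_well_behaved([2, 1, 1, 1])     # one extra peak
two_extra = is_well_behaved([3, 1, 1, 1])     # two extra peaks
print(one_missing, two_missing, one_extra, two_extra)
```

This also shows why missing is raised to the number of blanks when blank samples are included: blanks are not expected to contain the reference peaks.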
Second grouping
Because retcor reduces the retention time drift among samples, a new grouping is mandatory to take advantage of this correction.
The bw parameter can thus be set to a smaller value than in the first group step.
xcms fillPeaks
Filling method: "chrom" for LC-MS, "MSW" for direct injection.
29/05/2017 v 1.0.0
MS data processing
Report creation and Annotations
Yann GUITTON