29/05/2017 v 1.0.0

LC-MS

Pre-processing (xcms)

W4M Core Team

SECTION 1

Acquisition files upload and pre-processing with xcms: extraction, alignment and retention time drift correction.

LC-MS Data

What is provided by the mass spectrometers…

What we want for data analysis

Web for documentation:

https://bioconductor.org/packages/release/bioc/html/xcms.html

Forums:

https://groups.google.com/forum/#!forum/xcms

http://metabolomics-forum.com

R-based software

Free

A lot of parameters to tune

No graphical interface

Need to write an R script

Extraction with XCMS, annotation with CAMERA

• Extraction: ions are extracted in each sample independently.

• Grouping (alignment): each ion is aligned across all samples.

• Retention time correction (optional).

• Fill peaks: missing data are replaced with a baseline value.

• Statistics and visualisation (optional).

• CAMERA: annotation of adducts, neutral losses and isotopes.

[Workflow figure: extraction with XCMS, followed by CAMERA annotation of adducts, fragments and isotopes]

DATA

Data

• Raw data:

– mzXML, mzML, mzData and NetCDF

• sampleMetadata

Add some information for further steps

samples          class
HU_neg_011_b2    bio
HU_neg_014_b2    bio
Blank04          blank
Blank05          blank
…                …


samples          class   sampleType   subset   full   injectionOrder   batch   age
HU_neg_011_b2    bio     sample       0        1      44               ne1     19
HU_neg_014_b2    bio     sample       0        1      57               ne2     22
Blank04          blank   blank        0        1      16               ne2     NA
Blank05          blank   blank        0        1      29               ne1     NA
…                …       …            …        …      …                …       …

Two strategies


The "old" system

-> files are placed in one folder per group, within a zip file

+ The folders set the group of the files for xcms.group

+ Only one import and one step

- xcmsSet is limited to 6 CPUs

- The files aren't integrated into the history and can't be visualized (one day)


The "brand new" system

-> files are uploaded individually and processed in parallel

- The xcmsSet outputs have to be merged before using group

- A sampleMetadata file must be used to set the group (but you need one for some further steps anyway)

+ One xcmsSet job is launched for each input file, so this step is highly parallelizable

+ The files are completely integrated in Galaxy and may one day be visualized

+ Better transparency

Dataset Collection

• A dataset collection groups N datasets into a single collection

• Depending on the tool, the nested datasets of a collection are processed

– In one step

– In parallel

[Figures: Galaxy screenshots showing a dataset collection used as input to the xcmsSet tool, and how to run xcmsSet on it]

mzXML raw file

mzXML in a text editor: 1 scan

mzXML raw file information

File: HU_neg_091.mzXML

scan #  RetTime (sec)  basePeakMz        int       TIC       %TIC  delta ppm  delta dalton
724     374.013        187.006423950195  1.11E+07  2.66E+07  42%
725     374.511        187.006896972656  3.26E+07  5.25E+07  62%   2.5        0.000473
726     374.996        187.007186889648  5.14E+07  7.89E+07  65%   1.6        0.000290
727     375.478        187.007324218750  6.19E+07  9.28E+07  67%   0.7        0.000137
728     375.955        187.007125854492  7.13E+07  1.05E+08  68%   1.1        0.000198
729     376.432        187.006988525391  7.34E+07  1.08E+08  68%   0.7        0.000137
730     376.906        187.006942749023  7.62E+07  1.10E+08  69%   0.2        0.000046
731     377.380        187.006942749023  6.98E+07  1.05E+08  67%   0.0        0.000000
732     377.861        187.006942749023  5.94E+07  9.00E+07  66%   0.0        0.000000
733     378.330        187.006713867188  5.79E+07  8.89E+07  65%   1.2        0.000229
734     378.805        187.006942749023  5.06E+07  7.77E+07  65%   1.2        0.000229
735     379.283        187.006484985352  4.33E+07  6.89E+07  63%   2.4        0.000458
736     379.762        187.006622314453  3.87E+07  6.19E+07  62%   0.7        0.000137
737     380.241        187.006576538086  3.14E+07  5.36E+07  58%   0.2        0.000046
738     380.720        187.006347656250  2.49E+07  4.96E+07  50%   1.2        0.000229
739     381.204        187.006439208984  1.98E+07  5.02E+07  39%   0.5        0.000092
740     381.684        187.006515502930  1.25E+07  3.56E+07  35%   0.4        0.000076
741     382.179        187.006393432617  1.11E+07  3.86E+07  29%   0.7        0.000122

RT range (sec): 8.166
m/z median: 187.006805419922
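The "delta ppm" column can be reproduced with a few lines of Python. This is a minimal sketch, assuming the deviation is the absolute m/z difference between two consecutive scans divided by the m/z of the earlier scan; the function name `ppm_deviation` is illustrative and not part of xcms.

```python
# Base-peak m/z values for scans 724 to 727, copied from the table.
base_peak_mz = [
    187.006423950195,
    187.006896972656,
    187.007186889648,
    187.007324218750,
]


def ppm_deviation(previous_mz, current_mz):
    """Absolute m/z difference between two scans, in parts per million.

    Assumption: the earlier scan is used as the reference mass.
    """
    return abs(current_mz - previous_mz) / previous_mz * 1e6


# One deviation per pair of consecutive scans (725 vs 724, 726 vs 725, ...).
deltas_ppm = [
    ppm_deviation(previous, current)
    for previous, current in zip(base_peak_mz, base_peak_mz[1:])
]
rounded = [round(delta, 1) for delta in deltas_ppm]
print(rounded)
```

For these scans the deviations stay well below 5 ppm, which is the kind of observation that helps when choosing the ppm parameter.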

[Figure: Real-life example, m/z 187. Scan-to-scan m/z deviation: base-peak m/z (187.0060 to 187.0080) plotted against scan number (723 to 742), each point labelled with its deviation in ppm]


[Figure: Real-life example, m/z 187. Intensity (0 to 9.00E+07) plotted against retention time (373 to 383 sec); peak width = 8.2 sec]

xcms Extraction: centWave algorithm

The algorithm aims to detect "mass traces", or "regions of interest" (ROI), defined as regions where the m/z deviation between consecutive scans stays below a set limit.

This deviation must be lower than the value of the "ppm" parameter. The value (in ppm) has to be set according to the accuracy of the mass spectrometer.

The intensities within each ROI are then used to define the chromatographic peak with a continuous wavelet transform algorithm.

peakwidth (min, max), in seconds, has to be set for this step:

20,50 for HPLC

5,12 for UPLC

Tautenhahn R., BMC Bioinformatics, 2008
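The ROI idea can be illustrated with a short Python sketch. It is a simplified stand-in, not the xcms implementation: a centroid extends an open ROI when its m/z lies within `ppm` of the ROI's mean m/z, and an ROI that is not extended in a scan is closed. The function name `find_rois`, the `min_scans` filter and the centroid at m/z 250.1 are assumptions made for the example; the other values come from scans 724 to 726 of the table.

```python
def find_rois(scans, ppm, min_scans):
    """Group centroids into regions of interest (ROIs).

    scans: list of scans, each a list of (mz, intensity) centroids.
    Returns the ROIs that span at least `min_scans` consecutive scans.
    """
    open_rois, closed_rois = [], []
    for scan in scans:
        used = set()  # indices of centroids already assigned in this scan
        still_open = []
        for roi in open_rois:
            mean_mz = sum(mz for mz, _ in roi) / len(roi)
            tolerance = mean_mz * ppm * 1e-6
            candidates = [
                (abs(mz - mean_mz), index)
                for index, (mz, _) in enumerate(scan)
                if index not in used and abs(mz - mean_mz) <= tolerance
            ]
            if candidates:
                _, best = min(candidates)  # closest centroid wins
                used.add(best)
                roi.append(scan[best])
                still_open.append(roi)
            else:
                closed_rois.append(roi)  # trace interrupted: close the ROI
        # Every centroid left unassigned starts a new ROI.
        still_open.extend(
            [centroid] for index, centroid in enumerate(scan) if index not in used
        )
        open_rois = still_open
    closed_rois.extend(open_rois)
    return [roi for roi in closed_rois if len(roi) >= min_scans]


scans = [
    [(187.006423950195, 1.11e7), (250.1000, 5.0e3)],  # 250.1 is a made-up noise centroid
    [(187.006896972656, 3.26e7)],
    [(187.007186889648, 5.14e7)],
]
rois = find_rois(scans, ppm=5, min_scans=3)
print(len(rois), [mz for mz, _ in rois[0]])
```

Only the m/z 187 trace survives: it is present in three consecutive scans within 5 ppm, while the isolated centroid is discarded.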

xcms Extraction algorithms

• matchedFilter is dedicated to centroid or profile low-resolution MS data

• centWave is dedicated to centroid high-resolution MS data

[Figure: centWave chromatographic peak detection. Tautenhahn R., BMC Bioinformatics, 2008]

centwave basic parameters

centwave noise and coeluted peaks

xcms centwave parameters

xcms forum: How to choose peakwidth?

"The main purpose of the peakwidth parameter is to roughly estimate the peak width range, this parameter is not a threshold. The wavelets used for peak detection are calculated from this parameter. If you use HPLC and your peaks are normally 20 - 60 s wide (base peak width), just go with that, i.e. peakwidth=c(20,60). centWave will still detect peaks that are 15 s or 80 s wide! Important: Do not choose the minimum peak width too small, it will not increase sensitivity, but cause peaks to be split."

Example: peak width ~ 45 s

Using peakwidth = c(20,60), the peak will be split into three peaks, each detected as a separate peak ~10 s wide (since they are separated by a local minimum).

Using peakwidth = c(20,120) will keep the peak intact.

xcms extraction output

When a zip file is used, a sampleMetadata.tsv file is created at this step. It must contain all the information needed for further analyses: batch correction and statistical analyses.

This file must be downloaded so that this information can be added, and then uploaded again.


extraction parameters summary

xcms step: Extraction (xcmsSet)

parameter   related to               description                                                                                                                 examples
ppm         m/z                      Fluctuation of the m/z value (ppm) from scan to scan. Depends on the mass spectrometer resolution.                          5…
peakwidth   retention time           Range of chromatographic peak width (seconds).                                                                              UPLC 5,20; HPLC 10,40
mzdiff      m/z and retention time   Minimum m/z difference for peaks with overlapping retention times (coeluting peaks). Must be negative to allow overlap.     -0.001 or 0.05
prefilter   intensity                A peak must be present in n scans with an intensity greater than k.                                                         n=3, k=1000
snthresh    intensity                Signal-to-noise ratio threshold.                                                                                            3…
noise       intensity                Each centroid must be greater than the "noise" value.                                                                       -
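The prefilter rule from the summary above can be written as a one-line check. This is a minimal sketch of the rule as stated on the slide (present in n scans with an intensity greater than k), not the xcms source code; the function name and the two intensity lists are hypothetical.

```python
def passes_prefilter(intensities, n, k):
    """True when at least `n` scans of a mass trace exceed intensity `k`."""
    return sum(1 for value in intensities if value > k) >= n


# Hypothetical mass traces, one intensity per scan.
strong_trace = [800, 1500, 2300, 1900, 600]
weak_trace = [800, 1500, 900, 400]

print(passes_prefilter(strong_trace, n=3, k=1000))
print(passes_prefilter(weak_trace, n=3, k=1000))
```

With n=3 and k=1000, the first trace is kept (three scans above 1000) and the second is dropped before peak detection.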

"Grouping" step

Independent peak lists

pool1B1                       pool1B2                       pool1B3
mz        rt    int           mz        rt    int           mz        rt    int
196.0905  66.6  7810936       196.0910  66.7  11733921      196.0902  66.6  7933325
158.1180  67.4  71736         342.0310  69.0  74594         158.1173  67.4  82969
342.0308  67.6  202268        267.0581  65.5  260877        342.0308  21.3  2581
267.0581  65.5  282039        283.0318  65.2  424631        283.0320  65.3  357448

Group ions by m/z, then by retention time (- : no peak in this sample)

pool1B1                       pool1B2                       pool1B3
mz        rt    int           mz        rt    int           mz        rt    int
196.0905  66.6  7810936       196.0910  66.7  11733921      196.0902  66.6  7933325
158.1180  67.4  71736         -         -     -             158.1173  67.4  82969
-         -     -             -         -     -             342.0308  21.3  2581
342.0308  67.6  202268        342.0310  69.0  74594         -         -     -
267.0581  65.5  282039        267.0581  65.5  260877        -         -     -
-         -     -             283.0318  65.2  424631        283.0320  65.3  357448

Resulting matrix

mz        rt    pool1B1   pool1B2    pool1B3
196.0905  66.6  7810936   11733921   7933325
158.1176  67.4  71736     -          82969
342.0308  21.3  -         -          2581
342.0309  68.3  202268    74594      -
267.0581  65.5  282039    260877     -
283.0319  65.2  -         424631     357448

xcms alignment group

• First, the mass domain is binned. The size of the bins is defined by the mzwid parameter.

• Then, for each m/z bin, all ions of all samples are taken into account, whatever their retention time.

• A kernel density estimator is used to detect retention time regions with a high density of ions.

mzwid

xcms alignment group

• A Gaussian model groups together peaks with similar retention times.

• How inclusive a group is depends on the standard deviation of the Gaussian model (bandwidth), which is set by the xcms bw parameter.

• This parameter can be interpreted as a retention time window.

• Vertical dashed lines indicate that the feature is valid and will be retained in the data matrix.

• To be valid, the number of peaks in a group must exceed a given percentage of the total number of samples. This threshold is defined by the minfrac parameter.
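The effect of bw can be sketched in Python with a plain Gaussian kernel density over the retention times of one m/z bin. This is a simplified illustration under stated assumptions, not the xcms code: the density is evaluated on a regular grid, its local maxima are taken as group centres, and each peak is assigned to the nearest centre. The function name, the grid step and the three retention times (taken from the m/z 342.03 peaks of the grouping example) are choices made for this sketch.

```python
import math


def group_by_rt_density(rts, bw, step=0.5):
    """Group retention times around the local maxima of a Gaussian density.

    rts: retention times (sec) of all peaks falling in one m/z bin.
    bw: standard deviation (sec) of the Gaussian kernel.
    """
    low, high = min(rts) - 3 * bw, max(rts) + 3 * bw
    size = int((high - low) / step) + 1
    grid = [low + i * step for i in range(size)]
    density = [
        sum(math.exp(-0.5 * ((point - rt) / bw) ** 2) for rt in rts)
        for point in grid
    ]
    # A grid point is a group centre when it is a local maximum of the density.
    centres = [
        grid[i]
        for i in range(1, size - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    ]
    groups = {centre: [] for centre in centres}
    for rt in rts:
        nearest = min(centres, key=lambda centre: abs(centre - rt))
        groups[nearest].append(rt)
    return [sorted(members) for members in groups.values() if members]


rts = [67.6, 69.0, 21.3]  # peaks at m/z 342.03 in three samples
print(group_by_rt_density(rts, bw=30))
print(group_by_rt_density(rts, bw=10))
```

With a wide kernel the density has a single maximum and the three peaks end up in one group; with a narrower kernel the early peak forms its own group.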

[Figure: with bw = 30 sec, two groups are merged into one (problem); with bw = 10 sec, they are separated (solved)]

• Decreasing bw separates these 2 groups.

• The resulting m/z and retention time of a feature are the medians of the m/z and RT of all ions grouped together as that feature.
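The median rule can be checked against the grouping example. The sketch below uses Python's `statistics.median`; the function name `summarise_feature` and the rounding to 4 and 1 decimals are assumptions made to match the way the resulting matrix is displayed.

```python
from statistics import median


def summarise_feature(peaks):
    """Return the (m/z, RT) of a feature as the medians over its peaks.

    peaks: list of (mz, rt) pairs that were grouped together.
    """
    feature_mz = median(mz for mz, _ in peaks)
    feature_rt = median(rt for _, rt in peaks)
    return round(feature_mz, 4), round(feature_rt, 1)


# Peaks grouped together in the grouping example (pool1B1, pool1B2, pool1B3).
print(summarise_feature([(342.0308, 67.6), (342.0310, 69.0)]))
print(summarise_feature([(196.0905, 66.6), (196.0910, 66.7), (196.0902, 66.6)]))
```

These two calls give m/z 342.0309 at 68.3 sec and m/z 196.0905 at 66.6 sec, the values listed in the resulting matrix.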

Minfrac parameter for group

minfrac = minimum fraction of the samples of at least one class in which an ion must be detected for it to be considered as a group.

[Figure: m/z versus RT plots, 4 samples in each group, minfrac = 0.5]
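As a worked example of the minfrac rule, with 4 samples per class and minfrac = 0.5 an ion must be detected in at least 2 samples of one class. The Python sketch below encodes that reading ("at least one class"); the function name and the class names are illustrative.

```python
def is_valid_group(detected, class_sizes, minfrac):
    """True when the ion is detected often enough in at least one class.

    detected: number of samples with a peak, per class.
    class_sizes: total number of samples, per class.
    """
    return any(
        detected.get(name, 0) >= minfrac * size
        for name, size in class_sizes.items()
    )


class_sizes = {"bio": 4, "blank": 4}

# Detected in 2 of the 4 "bio" samples: 2 >= 0.5 * 4, so the group is kept.
print(is_valid_group({"bio": 2, "blank": 0}, class_sizes, minfrac=0.5))
# Detected once in each class: no class reaches 2 samples, so it is dropped.
print(is_valid_group({"bio": 1, "blank": 1}, class_sizes, minfrac=0.5))
```

The second case shows why the rule is applied per class: two detections spread over two classes do not validate the group.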

group interface


xcms group output

mzwid defines the m/z intervals.

bw defines the width of the Gaussian curve.

Two distinct m/z values are merged into one group: mzwid and bw are too large.

The two distinct m/z values are separated by decreasing the bandwidth value.

grouping parameters summary

xcms step: Alignment (group)

parameter   related to        definition                                                                                                                                       examples
mzwid       m/z               Size of the m/z slices (bins). Range of m/z to be included in a group. Depends on mass spectrometer accuracy.                                     -
bw          retention time    Standard deviation of the Gaussian metapeak that groups peaks together.                                                                          -
minfrac     samples           To be valid, a group must be found in minfrac × total number of samples in each subfolder of data files. minfrac=0.5 corresponds to 50%.          n=10, minfrac=0.5: found in at least 5
max         number of ions    Maximum number of groups detected in a single m/z slice.                                                                                         10 or 50

xcms workflow retcor

xcms retcor output

Because retcor improves retention times, it must be followed by a second group step.

Modification of the degree of smoothing: span = 0.8

Parameters for retcor

[Figure: m/z versus RT plots, 4 samples in each group, illustrating missing = 1 and extra = 1]

xcms retcor obiwarp

retcor parameters summary

xcms step: Retention time correction (retcor)

parameter   related to        description                                                                                                                                              examples
smooth      retention time    Regression method used to model the time deviation among samples (linear or loess).                                                                      linear or loess
span        -                 Degree of smoothing of the loess model.                                                                                                                  0.2 to 1
extra       samples           Number of "extra" peaks allowed when defining reference peaks (or well-behaved peaks) for modeling time deviation. Number of peaks > number of samples.   default=1
missing     samples           Number of samples without reference peaks. If blank samples are used, missing = number of blanks.                                                        number of blank samples
plottype    retention time    Defines the graphical visualisation of the effect of the model on retention time correction.                                                             deviation
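The roles of missing and extra can be illustrated with a small Python check. This is one reading of the summary above, not the xcms implementation: a peak group is treated as a reference ("well-behaved") group when at most `missing` samples have no peak in it and at most `extra` peaks are in excess of one per sample. The function name and the example counts are hypothetical.

```python
def is_well_behaved(peaks_per_sample, missing, extra):
    """Decide whether a peak group can serve as a reference group.

    peaks_per_sample: number of peaks each sample contributes to the group.
    """
    samples_without_peak = sum(1 for count in peaks_per_sample if count == 0)
    extra_peaks = sum(count - 1 for count in peaks_per_sample if count > 1)
    return samples_without_peak <= missing and extra_peaks <= extra


# Four samples, as in the illustration, with missing = 1 and extra = 1.
print(is_well_behaved([1, 1, 1, 0], missing=1, extra=1))  # one sample missing
print(is_well_behaved([2, 1, 1, 1], missing=1, extra=1))  # one extra peak
print(is_well_behaved([1, 0, 0, 1], missing=1, extra=1))  # two samples missing
```

Setting missing to the number of blank samples, as the table suggests, lets groups that are absent from the blanks still be used as references.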

Second grouping

Because retcor reduces the retention time drift among samples, a new grouping step is mandatory to take advantage of this correction.

The bw parameter can thus be set to a smaller value than in the first group step.

xcms fillPeaks

Filling method:

"chrom" for LC-MS

"MSW" for direct injection

29/05/2017 v 1.0.0

MS data processing

Report creation and Annotations

Yann GUITTON
