Xiuxia Du, Ph.D. Department of Bioinformatics and Genomics
University of North Carolina at Charlotte
Introduction to Preprocessing of Untargeted Metabolomics Data
Outline• Raw untargeted LC/MS and GC/MS metabolomics data
- Profile and centroid data
- Mass vs. retention time map
- TIC
- EIC
• Principles of LC/MS and GC/MS data preprocessing
• Feature identification
- Identification of known compounds
- Identification of unknown compounds
2
3
Raw Untargeted LC/MS and GC/MS Metabolomics data
4
list of scans in raw files• MS scans in blue• MS/MS scans in red• # sequential number• @ retention time• MS level• type of spectrum
• p = profile• c = centroid• t = thresholded
• polarity of ionization• + = positive• - = negative• ? = unknown
List of mass spectra
One mass spectrum
5
One mass spectrum
6
One mass spectrum
7
Zoom in one mass spectrum
8
profile mode
Mass spectra in centroid mode
9
Mass spectra in centroid mode
10
Spectrum in centroid mode• Data files are much smaller than files in profile mode.
• We will use the centroid data for practicing data pre-processing using XCMS and MZmine 2.
11
LC-MS raw data in 3D
12
Raw data in 3D
13
3D to 2D• Direct processing of the 3D data is NOT trivial
• Instead, we examine 2D
- Mass vs. retention time
- Total ion current vs. retention time: TIC
- Ion current vs. retention time for a particular mass: EIC (Extracted Ion Chromatogram)
14
Mass vs. retention time map
15
TIC
16
EIC
17
EIC
18
19
Principles of LC/MS and GC/MS Data Preprocessing
Data preprocessing workflow
20
GC-MS
LC-MSdetect masses from
mass spectra
detect masses frommass spectra
constructEICs
constructEICs
detectchromatographic peaks
detectchromatographic peaks
deconvolution
annotation
databasesearch
databasesearch
1 2 3 4 5 6alignment /
correspondence
alignment / correspondence
Construct EICs
21
Select one EIC
22
One EIC
23
Detect EIC peaks
24
• Use wavelet transform
• Implemented in XCMS as the centWave method
Detect EIC peaks
25
mexican hat wavelet
−20 −10 0 10 20
−0.4
0.0
0.2
0.4
0.6
0.8
t
ψψs,ττ((
t))
s = 1s = 2s = 8
68
1012
1:dim(wCoefs)[1]
Scal
e
+
++
2850 2900 2950 3000 3050 3100 3150
02
46
810
Seconds
Inte
nsity
* 10
3
ChromatogramGaussian Fit
Tautenhahn, R.; Bottcher, C.; Neumann, S., Highly sensitive feature detection for high resolution LC/MS. BMC bioinformatics 2008, 9, 504.
Detected EIC peaks
26
27
LC/MS-specific Data Preprocessing
Find isotopes
28
Find isotopes
29
Find isotopes
30
31
Alignment
zoom in……
32
Alignment
33
Alignment
Peaks table after alignment
34
35
GC/MS-specific Data Preprocessing
GC-EI-MS
36
• Example: EI fragmentation of methanol
EI fragmentation
37
[CH3OH]•+ �! CH3O+ +H•
[CH3OH]•+ �! CH2O+ +H2
[CH3OH]•+ �! CH+3 + •OH
Deconvolution
38
39
40
41
ADAP-GC 2.0
ADAP-GC 2.0: Deconvolution of Coeluting Metabolites from GC/TOF-MS Data for Metabolomics Studies. Analytical chemistry 2012, 84 (15), 6619-29.
42
Feature identification
Feature identification• Apply statistics and machine learning to detect
discriminating peaks
• Identify discriminating peaks
43
Identification of known compounds• Screening search for compound ID based on LC-MS
data
- Searching monoisotopic mass and isotopic distribution against compound databases
• Library match for compound identification from both LC-MS/MS and GC-MS spectra
44
HMDB
45
HMDB
46
MS/MS or GC-MS spectra matching• Library match for compound identification from both
LC-MS/MS and GC-MS spectra
47
Identification of unknown compounds• MS-FINDER
• CSI:FingerID
• CFM-ID
• MetFrag
• MIDAS
• MAGMA
48
MetFrag
49
More on identification
50
More on identification
51
More on identification
52
More on identification
53
More on identification• Information we have for identification of compounds
based on MS/MS
- M+H
- Experimental isotopic identification
- MS/MS
54
55
More on identification
56
More on identification
57
More on identification
58
More on identification
59
More on identification
60
More on identification
61
More on identification
62
More on identification
63
More on identification
64
More on identification
65
More on identification
• Compare isotopic distributions
66
theoretical experimental
More on identification
67
More on identification
• CompareMS/MS
68
library
experimental
More on identification
69
Thank you!