High Resolution GC-MS Application: Metabolomics
Vladimir Tolstikov, PhD
Eli Lilly and Company
Sample Harvest and Storage
Biological Metadata
Sample Extraction Extraction Metadata
Sample Preparation RI internal standards, Derivatization
Sample Analysis QC, randomization
Standard Operational Procedure
Raw Data
Chromatography Metadata Mass Spectrometry Metadata
Metabolite Peak Annotation
Data normalization, background subtraction, detection limit
Analytical Protocols
Processed Data Collection and Organization Statistical Analysis Pathway Analysis
Experiment Submission
Volatiles Alcohols Organic acids
Essential oils Amino acids Organic amines
Esters Catecholamines Nucleosides
Perfumes Fatty acids Nucleotides
Terpenes Phenolics Oligosaccharides
Carotenoids Prostaglandins Peptides
Flavonoids Steroids Co-factors
Perfumes Sugar phosphates Polar Lipids
LC/MS GC/MS
PEGASUS GC-HRT accurate mass TOF Gerstel ALEX/CIS MultiPurpose Autosampler
Triple TOF 5600 accurate mass Triple quad 5500
Lilly Metabolomics Platform
Lilly Metabolomics Platform Data Analysis and Visualization
• Statistical analysis: an array of commonly used statistical and machine learning methods:
• univariate: fold change analysis, t-tests, volcano plot, one-way ANOVA, and correlation analysis;
• multivariate: principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and PCA-DA;
• clustering: dendrogram, heatmap, K-means, and self-organizing map (SOM);
• supervised classification: random forests and support vector machine (SVM).
• Functional enrichment analysis: based on several libraries containing ~6300 groups of biologically meaningful metabolite sets, collected primarily from human studies;
• Metabolic pathway analysis: pathway enrichment analysis and pathway topology analysis, with visualization, for human metabolic pathways (a total collection of 1173 pathways);
• Pathway analysis tools: MetPA, Ingenuity, GeneGo
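As an illustration of the univariate step, the two quantities behind a volcano plot (log2 fold change and a t statistic) can be sketched in a few lines. This is a minimal sketch, not the platform's implementation: the function names and the intensity values are hypothetical, and Welch's unequal-variance t-test is assumed because the slides do not say which t-test variant is used.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 of the ratio of group means (x-axis of a volcano plot)."""
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(case) / len(case)        # squared standard error, cases
    vb = variance(control) / len(control)  # squared standard error, controls
    t = (mean(case) - mean(control)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(case) - 1) + vb ** 2 / (len(control) - 1)
    )
    return t, df


# Hypothetical peak intensities for one metabolite (not data from the study).
case = [8.0, 9.0, 7.0, 8.0]
control = [4.0, 5.0, 3.0, 4.0]

lfc = log2_fold_change(case, control)
t_stat, dof = welch_t(case, control)
print(f"log2FC = {lfc:.2f}, t = {t_stat:.2f}, df = {dof:.1f}")
```

Converting the t statistic to a p-value (the y-axis of a volcano plot) needs the t distribution, which the Python standard library does not provide; a statistics package would normally supply it.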
Human urine GC/MS profiling
Throughput Quality
[Figure: two AIC chromatograms; x-axis: Time (min:sec), 12:30 to 45:50; y-axis: intensity, 0 to 8.0e7 for the first trace and 0 to 1.2e6 for the second]
Mouse CSF
Sample volume: 2 µL
Methoxyamine, MSTFA with 2% TMSCl
1 µL splitless, CIS C4 injector
Detector: EI, 70 eV
>60% probability score
>3000 peaks deconvoluted
>1200 names assigned
~ 75 metabolites identified
Metabolomics study requirements for
GC/MS instruments
GC-HRT
1 Sensitivity √
2 Fast acquisition √
3 Robustness √
4 Reproducibility √
Unique features
1 Routine stable high resolution √
2 Routine stable high mass accuracy √
3 True peak deconvolution √
4 Elemental composition assignment √
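To make the link between mass accuracy and elemental composition assignment concrete, the sketch below computes the theoretical m/z of a candidate ion and the error of a measured m/z in parts per million. It is an illustrative sketch only: the function names and the measured value are hypothetical, and the trimethylsilyl cation (C3H9Si+, nominal m/z 73) is chosen as an example because it is a common fragment of TMS derivatives.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "Si": 27.9769265,
}
ELECTRON = 0.00054858  # electron mass (u)


def ion_mz(composition, charge=1):
    """Theoretical m/z of a positive ion with the given elemental composition."""
    neutral = sum(MONO[element] * count for element, count in composition.items())
    return (neutral - charge * ELECTRON) / charge


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Trimethylsilyl cation, C3H9Si+.
tms = ion_mz({"C": 3, "H": 9, "Si": 1})

# Hypothetical measured value, for illustration only.
measured = 73.0470
print(f"theoretical m/z = {tms:.4f}, error = {ppm_error(measured, tms):.1f} ppm")
```

A candidate formula is usually accepted only if its error falls inside the instrument's mass-accuracy tolerance, which is why routine high mass accuracy narrows the list of possible compositions.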
High Resolution, High Mass Accuracy: YES or NO ID
Case study Pancreatic Cancer
• PDAC patients: 119 (Group 1)
• Healthy volunteers: 55 (Group 2)
• Benign cyst: 41 (Group 2a)
• Chronic pancreatitis: 32 (Group 3)
• Other cancers: 19 (Group 4)
Unpaired samples. Blood plasma analysis.
GC/TOF/MS: 70 polar metabolites
LC/MS/MS (MRM) panel: Eicosanoids, LPA, SP1, SPA1, Bile acids, PC; 30 non-polar metabolites
Study performed at the UC Davis Genome Center, Davis, CA, USA
Cohort Study Design
PLS-DA Random Forest
Cross-platform data integration:
• Metabolomics data obtained from the current study: 95 metabolites
• Transcriptomics data retrieved from the Pancreatic Expression Database: 255 genes
[Figure panels: "Experimental Data" and "Prediction"; functions shown: Carbohydrate Metabolism, Energy Production, Small Molecule Biochemistry]
Small molecule biomarkers
Current study
Univariate Classic ROC analysis for selected metabolite ratios
100 cross-validations (CV) were performed, and the results were averaged to generate the plot using threshold averaging.
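For readers unfamiliar with ROC analysis, the area under the curve for a single marker can be computed directly as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below shows only that basic calculation, with hypothetical ratio values; it does not reproduce the cross-validation or threshold averaging used for the plot.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical metabolite ratios (not data from the study).
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(cases, controls)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a marker with no discriminating power, and 1.0 to perfect separation of the two groups.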
Multivariate ROC analysis (PLS-DA)
The prediction model was composed of 15 features. 21 random samples from each group were allocated as hold-out data for validation.
Group 0: PDAC patients; Group 1: controls
Red circles: predicted scores for hold-out samples
Numbers: samples classified to the wrong group
The average accuracy based on 100 cross-validations is 0.907. The accuracy for hold-out data prediction is 0.905 (38/42).
Performance Measure: Area under ROC curve Permutation Times: 100
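The reported numbers can be checked with simple arithmetic, and the permutation setting bounds the smallest p-value that can be reported. The sketch below is an assumption-laden illustration: the function names are hypothetical, the null AUC values are invented placeholders, and the "(exceed + 1) / (n + 1)" estimator is one common convention that the slides do not confirm.

```python
def accuracy(correct, total):
    """Fraction of samples classified correctly."""
    return correct / total


def empirical_p_value(observed, permuted_scores):
    """Permutation p-value using the (exceed + 1) / (n + 1) convention."""
    exceed = sum(1 for score in permuted_scores if score >= observed)
    return (exceed + 1) / (len(permuted_scores) + 1)


# Hold-out set: 21 samples from each of two groups = 42, of which 38 were correct.
holdout = accuracy(38, 21 * 2)

# Placeholder null distribution of 100 permuted AUCs (invented, for illustration).
null_aucs = [0.40 + 0.002 * i for i in range(100)]
p_value = empirical_p_value(0.965, null_aucs)

print(f"hold-out accuracy = {holdout:.3f}, permutation p = {p_value:.4f}")
```

With 100 permutations and no permuted score reaching the observed one, this convention gives p = 1/101, so a smaller p-value would require more permutations.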
Multivariate ROC analysis (PLS-DA)
AUC, sensitivity, specificity, and accuracy were 0.965, 95.0%, 95.0%, and 90.0%, respectively, according to the training set data.
• Screening with a panel of biomarkers might be effective because pancreatic adenocarcinoma has vast genetic heterogeneity: no single biomarker is strongly correlated with its diagnosis across the population of people who develop the disease.
• A statistical model shows that many so-called weak biomarkers, each with 95 percent specificity for the disease, have on average only 32 percent sensitivity.
• By increasing the number of weak biomarkers in the panel, it would be possible to achieve the required 99 percent sensitivity.
• There is hope for developing a panel that would have greater than 99 percent accuracy.
American Association for Cancer Research, Press Release 2012
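The panel argument can be made concrete with a back-of-the-envelope calculation. The sketch below is not the statistical model referred to in the press release: it assumes, purely for illustration, that the markers are statistically independent and that a patient is called positive if any single marker is positive. The function names are hypothetical.

```python
import math


def markers_needed(per_marker_sensitivity, target_sensitivity):
    """Smallest number of independent markers whose 'any positive' rule
    reaches the target sensitivity."""
    miss_rate = 1.0 - per_marker_sensitivity
    return math.ceil(math.log(1.0 - target_sensitivity) / math.log(miss_rate))


def panel_performance(n_markers, sensitivity, specificity):
    """Sensitivity and specificity of an 'any positive' panel of independent markers."""
    panel_sensitivity = 1.0 - (1.0 - sensitivity) ** n_markers
    panel_specificity = specificity ** n_markers
    return panel_sensitivity, panel_specificity


n = markers_needed(0.32, 0.99)
panel_sens, panel_spec = panel_performance(n, 0.32, 0.95)
print(f"markers = {n}, sensitivity = {panel_sens:.3f}, specificity = {panel_spec:.3f}")
```

Under these assumptions about a dozen weak markers reach 99 percent sensitivity, but the simple "any positive" rule lets specificity fall to roughly 54 percent. That trade-off is one reason a combined multivariate model, rather than a simple rule, would be needed for a panel to approach high overall accuracy.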
Acknowledgments
Prof. Shiro Urayama, MD, UC Davis, Department of Gastroenterology and Hepatology, Davis, CA, USA
Dr. Jean-Noel Billaud, PhD, INGENUITY SYSTEMS, Redwood City, CA, USA
Dr. Wei Zou, PhD, Kindra Brooks, BS, UC Davis, Genome Center, Davis, CA, USA