Validation and Portability of Unbiased, Label-free Proteomics
Joe Lucas¹,², Will Thompson¹, Laura Dubois¹, Keyur Patel¹, Arthur Moseley¹
¹Duke University School of Medicine, Durham, NC
²Quintiles, Morrisville, NC
AMIA Disclosure
• The authors declare that there are no relevant financial relationships relating to the disease state or drug treatment discussed in this presentation
Biomarker Discovery, Verification, Applications in Multiple Labs
Figure: biomarker pipeline, plotted as Number of Analytes against Number of Samples.
• Biomarker Discovery: 10,000s of analytes, 10s of samples
• Biomarker Verification: 100-1,000 analytes, 100-1,000 samples
• Biomarker Validation: 10s of analytes, 1,000s of samples
Platform labels: Open Platform LC/MS; LC/MS/MS (MRM); Antibody-based Assays
D. Butler Nature. 2008;453:840–2.
Basic Research
– Small-scale studies
– Highly collaborative; flexible deliverables
– New technologies can be tested
– Hypothesis generation is the key goal
Clinical Proteomics
– Large clinical-based studies
– Deliverables & timelines are well defined
– Requires very robust technology that can be run across hundreds of samples & in multiple labs
– QC metrics must be tightly controlled
Towards the Development of a Comprehensive Experimental Design for Clinical Proteomics
• Ensure a clear and realistic definition of the goals of the experiment
• Have available a comprehensive set of hardware and software tools
• Rigorously use quantitatively reproducible analytical methods
• Ideally, start with a mechanistic understanding of the biology of the disease
– Ensure you are looking in the right place at the right time
• Ensure availability of a large cohort of well curated clinical samples
– Initial selection of matched sub-cohort for discovery experiments
– Extension from biomarker discovery to biomarker verification in “all comers” trials
• Ensure professional use of statistical tools suitable for high dimensional data analysis
Exemplar Biomarker Discovery & Verification Project
Biomarkers to Predict Outcomes of Hepatitis C Patient Treatment in Serum of Treatment Naive Patients
Jeanette McCarthy, Keyur Patel, Joe Lucas and John McHutchison
Figure: outcomes of hepatitis C infection. Labels: Spontaneous clearance (~25%); Chronic infection; 20% cirrhosis; 3-5% cancer; Eligible for Treatment; Responders; Non-responders (>50%); Hepatic Fibrosis, Steatosis, Insulin resistance, Dyslipidemia; Increased risk of diabetes; Unknown consequences.
Biomarker Discovery Paradigm Challenge: Hepatitis C Cohorts for Discovery and Verification
Ensure Availability of a Large Cohort of Well Curated Clinical Samples
Duke Hepatology Biorepository - 3,169 patients
Discovery Cohort - small discovery experiment - well matched cohort from Biorepository - n = 55 patients - ‘omic LC/MS/MS
Verification Cohort 1 - well matched cohort from Biorepository - n = 41 patients
Verification Cohort 2 - pediatric patients - “all-comers” trial - n = 50 patients
Verification / Validation Cohort 3 - “all-comers” trial (Australia) - n = 243 patients
Platform labels: Open Platform LC/MS; LC/MS/MS (MRM); LC/MS/MS (MRM)
• Protein and Peptide Separations
– Four Waters Nanoscale UPLC
– Two Waters Nanoscale UPLC/UPLC
– One Acquity UPLC
• ‘Omic Qualitative and Quantitative Biomarker Discovery
– Five high resolution, accurate mass, tandem mass spectrometers
    • One hybrid quadrupole / time-of-flight system: Waters Q-Tof Ultima
    • Three hybrid quadrupole / ion-mobility / time-of-flight tandem mass spectrometers: Waters Synapt G1 HDMS, Waters Synapt G2 HDMS with ETD
    • One hybrid LTQ / Orbitrap system: Thermo LTQ-Orbitrap
• Targeted Peptide and Protein Quantitation
– One triple quadrupole tandem mass spectrometer
    • Waters Xevo TQ-S
Duke ‘Omic Hardware Toolkit
Photo labels: Waters UPLCs (four 1D systems, two 2D systems); Waters Synapt G1 HDMS; Waters Synapt G2 HDMS (x2); Waters Q-Tof Ultima; Thermo LTQ-Orbitrap (HHMI owned); Waters Xevo Triple Quad-S; Advion NanoMate; OFFGEL pI Fractionation; GELFREE MW Fractionation
Duke Proteomics Software Toolkit
Acquiring data is (relatively) easy, acquiring knowledge is hard
• Qualitative Analyses
– Data Dependent Acquisition Database Searching
    • Matrix Science Mascot Software, runs across 40 processor equivalents
– Automated Processing Pipeline with Mascot Daemon, Mascot Distiller, and Mascot Server
    • Dell Blade Cluster - 32 processor equivalents
– Data Independent Database Searching
    • Waters PLGS/IdentityE Software
    • Two ‘Home Brew’ Super-Computers - 1,000 X Cray-1 speed
– Data Visualization Software (data return to customers)
    • Proteome Software Scaffold
• Quantitative and Qualitative Analyses
– Rosetta Elucidator Software
    • Data processing, data statistics, data visualizations
– Dell Server R900 (largest single server at Duke)
    • 4 Quad Core Processors - 128 GB of RAM
– Rosetta Oracle DB running on dedicated server
• Pathway Analyses
– Ingenuity Pathway Analyses
• Data Storage
– 72 Terabytes of NetApps Enterprise Quality Storage
– Data mirrors for data security
– ~50 TB ‘Cold’ Data Storage on DROBOs
– Transitioning to Amazon Glacier
Photo labels: Dell R-900 Server; NetApps Data Storage; Dell Blade Server; Scaffold; Mascot; Elucidator; Oracle DB; Ingenuity Pathway Analysis
Label-Free Quantitation Strategy - flexibility to fit clinical study design
Figure: per-cohort LC/MS images (m/z vs. Retention Time) for Cohort 1, Cohort 2, Cohort 3, Cohort 4, … (n), combined into a Master Image (m/z vs. Retention Time), with an Individual Feature shown as Intensity vs. Retention Time. Processing steps labeled: Image Translation; Retention Time Alignment; Intensity Normalization.
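The intensity normalization step named in this strategy can be illustrated with a minimal sketch. The slides do not say which normalization method the pipeline uses, so the median scaling below is an assumption chosen for simplicity, and the function name `median_normalize` and the run names are hypothetical.

```python
from statistics import median


def median_normalize(runs):
    """Scale each run so its median feature intensity matches a common target.

    `runs` maps a run name to a list of feature intensities. The target is
    the median of the per-run medians. This is an illustrative stand-in,
    not the normalization used by the software named in the slides.
    """
    run_medians = {name: median(values) for name, values in runs.items()}
    target = median(run_medians.values())
    return {
        name: [v * target / run_medians[name] for v in values]
        for name, values in runs.items()
    }


# Hypothetical intensities for the same three features in three LC/MS runs.
runs = {
    "run_a": [100.0, 200.0, 300.0],
    "run_b": [200.0, 400.0, 600.0],  # same sample loaded at twice the amount
    "run_c": [50.0, 100.0, 150.0],   # same sample loaded at half the amount
}
normalized = median_normalize(runs)
```

After scaling, the three runs share the same median, so differences in sample loading no longer masquerade as differences in abundance. Retention time alignment is a separate step and is not covered by this sketch.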
“Discovery” Data Collection QC (High Quality Robust Instrumentation is only the Starting Point)
• Daily System Suitability Standard
• Surrogate Standard Spike for Each Sample
• QC Injection of Pooled Sample at Regular Interval
• Real-time and post-acquisition QC metrics
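One common post-acquisition metric for repeated injections of a pooled QC sample is the coefficient of variation (CV) of each feature. The sketch below assumes that metric and a 20% cutoff; neither is stated in the slides, and the function names and intensities are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation, in percent, of repeated measurements."""
    return 100.0 * stdev(values) / mean(values)


def flag_unstable_features(qc_intensities, max_cv=20.0):
    """Return the features whose CV across pooled-QC injections exceeds max_cv.

    `qc_intensities` maps a feature name to its intensities in each QC
    injection. The 20% default is an assumed cutoff, not one from the slides.
    """
    return sorted(
        feature
        for feature, values in qc_intensities.items()
        if percent_cv(values) > max_cv
    )


# Hypothetical intensities from four injections of the pooled QC sample.
qc = {
    "feature_1": [100.0, 102.0, 98.0, 101.0],   # stable across injections
    "feature_2": [100.0, 150.0, 60.0, 120.0],   # drifts between injections
}
unstable = flag_unstable_features(qc)
```

Features flagged this way can be reviewed or excluded before modeling, so that instrument drift is not mistaken for biology.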
Ensure Professional Use of Statistical Tools Suitable for High Dimensional Data Analyses
Sparse Latent Factor Regression - Bayesian Factor Regression Modeling
Statistical Analysis: Joe Lucas, PhD, Duke IGSP
Figure labels: 35,000 Isotope Groups; Predictive Factor “Metaproteins”; Factor Score “Expression Value”
• Regression – Leads directly to prediction
• Sparsity – Most peptides are irrelevant for prediction
• Latent Factors – Let data determine important relationships
• Resulting model for prediction:
• Initial Metaprotein Model - 650 Isotope Groups
Photo caption: Pastor Thomas Bayes
Bayesian Factor Regression Modeling Joe Lucas, “Metaprotein” Expression Analysis
Metaprotein Analysis
- Groups peptides based on identification and/or coexpression
- Casts a wide net
- Discordant peptides are avoided
- Includes coexpressing peptides from additional proteins
- Captures “Pathway” Expression
- Regression for Prediction
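The idea of summarizing a group of coexpressed peptides as a single per-sample score can be sketched in a few lines. This is not the Bayesian factor regression model described in the slides. It is a deliberately simple stand-in that averages standardized peptide intensities, and the names `zscore` and `group_score` and the peptide data are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one peptide's intensities across samples.

    Assumes the peptide is not constant across samples (nonzero spread).
    """
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def group_score(peptide_intensities):
    """Per-sample score for a peptide group: mean of standardized intensities.

    `peptide_intensities` maps a peptide name to its intensities, one per
    sample, with samples in the same order for every peptide.
    """
    standardized = [zscore(values) for values in peptide_intensities.values()]
    return [mean(column) for column in zip(*standardized)]


# Two hypothetical peptides that rise together across three samples,
# measured on very different intensity scales.
group = {
    "pep_a": [1.0, 2.0, 3.0],
    "pep_b": [10.0, 20.0, 30.0],
}
scores = group_score(group)
```

Standardizing first means a peptide with large raw intensities does not dominate the group score. A factor model additionally learns the grouping and the weights from the data, which this sketch takes as given.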
Modified Verification Strategy using ‘omic Methodology
Performance of a 261 Peptide Model - Cross Validation to Second Cohort (41 patients)
Sparse Latent Factor Regression Bayesian Factor Regression Modeling
Using only proteomic parameters:
- 21 of 26 SVR patients were correctly predicted
- 12 of 15 non-responders were correctly predicted
Predictor modeling will be improved by:
- Increasing number of identified peptides
- Improving data alignment across projects
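The prediction counts reported for the 41-patient cohort can be turned into the usual summary rates. The sketch below treats SVR as the positive class, which is an assumption about labeling rather than something the slides state. The function name `prediction_summary` is hypothetical.

```python
def prediction_summary(svr_correct, svr_total, nr_correct, nr_total):
    """Summary rates from per-class counts of correct predictions.

    SVR is treated as the positive class (an assumption), so sensitivity
    is the SVR hit rate and specificity is the non-responder hit rate.
    """
    return {
        "sensitivity": svr_correct / svr_total,
        "specificity": nr_correct / nr_total,
        "accuracy": (svr_correct + nr_correct) / (svr_total + nr_total),
    }


# Counts taken from the slide: 21 of 26 SVR and 12 of 15 non-responders.
summary = prediction_summary(svr_correct=21, svr_total=26,
                             nr_correct=12, nr_total=15)

for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

The two classes sum to 26 + 15 = 41 patients, matching the size of the second cohort, and 33 of the 41 patients are predicted correctly.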
Discovery and Initial Verification of SVR-Prediction Using Unbiased Data
Discovery Data, Build Model
Use Model to Predict SVR (Blinded)
Reproducibility of Metaprotein Biosignatures
• Build predictive model with first three cohorts
• Predict NR / SVR in “Big Pharma” measured data
– Different LC/MS/MS (LTQ-Orbi) system in different lab
– Metaprotein model maintained consistent results
Figure labels: Discovery Cohort (N = 55, Matched); Discovery Cohort Measured by Big Pharma Lab; Verification Cohort 1 (N = 41, Matched); Verification Cohort 2 (N = 50, All-Comers)
How to Create Method for Routine use in Multiple Labs
Figure: biomarker pipeline from Biomarker Discovery through Biomarker Verification to Biomarker Validation (Number of Analytes against Number of Samples), repeated from the earlier pipeline slide. D. Butler Nature. 2008;453:840–2.
Transition from Biomarker Discovery to Biomarker Verification
Nature Methods “Method of the Year”, Nature Methods, Vol. 10, No. 1, January 2013, p. 19
“What I like about targeted proteomics is that you answer the question that you are interested in,” says Michael MacCoss of the University of Washington…. Rather than trying to detect all the proteins in a mixture, as in a discovery-based approach, a targeted approach “lets us build quantitative assays to specifically answer hypothesis driven questions.”
DPCF Deployment of Targeted Proteomics - biomarker verification in clinical cohort of 124 patients
Same method for three matrices
- 147 endogenous peptides
- 147 stable label peptides
- 6 non-endogenous control peptides
- 3 transitions/peptide
- 900 MRM transitions monitored
- These numbers are not a limitation of the instrument
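The transition count on this slide follows directly from the peptide counts, as this short check shows. The variable names are illustrative only.

```python
# Peptide counts and transitions per peptide as listed on the slide.
endogenous_peptides = 147
stable_label_peptides = 147
control_peptides = 6
transitions_per_peptide = 3

total_peptides = endogenous_peptides + stable_label_peptides + control_peptides
total_transitions = total_peptides * transitions_per_peptide

print(f"{total_peptides} peptides x {transitions_per_peptide} transitions "
      f"= {total_transitions} MRM transitions")
```

Each endogenous peptide is paired with one stable label peptide, so the labeled standards account for half of the 294 non-control peptides in the method.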
HCV Translation to SRM/MRM Platform
SRM Protein: Thbg, CO2, CO5, CO5, CO5, ITIH2, ITIH2, ITIH1, HRG, HRG
Figure labels: Chariot; Peds C; HCV - 41; Initial 55
P-value = 1.6 × 10⁻⁵
Use all three cohorts to choose targets and train the model
Chariot Study
• 243 samples
• 191 standard of care
• None of the others became SVR
• 117 SVR, 32 NR
• Train model on three original cohorts, 10 SRM peptides only
Work to Date
• D. Cyr, Joseph E. Lucas, J. W. Thompson, K. Patel, P. J. Clark, A. Thompson, H. L. Tillmann, J. G. McHutchison, M. A. Moseley, J. J. McCarthy (2011) “Characterization of serum proteins associated with IL28B genotype among patients with chronic Hepatitis C”, PLoS ONE 6(7): e21854. doi:10.1371/journal.pone.0021854
• K. Patel, Joseph E. Lucas, J. W. Thompson, L. G. Dubois, H. Tillmann, A. Thompson, D. Uzarski, R. M. Califf, M. A. Moseley, G. S. Ginsburg, J. G. McHutchison and J. J. McCarthy (2011) “High Predictive Accuracy of an Unbiased Proteomic Profile for Sustained Virologic Response in Chronic Hepatitis C Patients”, Hepatology, 53(6), 1809-1818, doi:10.1002/hep.24284
Technical Work – Factor Model
• Joseph E. Lucas, J. Will Thompson, Laura G. Dubois, Jeanette McCarthy, Keyur Patel, Hans Tillmann, Alex Thompson, John McHutchison and M. Arthur Moseley (2011), “Metaprotein Expression Modeling for Label-free Quantitative Proteomics”
• Ricardo Henao, J. Will Thompson, M. Arthur Moseley, Geoffrey S. Ginsburg, Lawrence Carin, and Joseph E. Lucas (2013) “Latent protein trees”, Annals of Applied Statistics, 7(2)
Acknowledgments Duke University Proteomics Core Facility
http://www.genome.duke.edu/cores/proteomics/
Funding NIH S10 grant
Duke School of Medicine CTSA grant UL1RR024128