Validation and Portability of Unbiased, Label-free Proteomics · 2020-06-03 · Validation and Portability of Unbiased, Label-free Proteomics. Joe Lucas1,2 , Will Thompson 1, Laura

Validation and Portability of Unbiased, Label-free Proteomics

Joe Lucas1,2, Will Thompson1, Laura Dubois1, Keyur Patel1, Arthur Moseley1

1 Duke University School of Medicine, Durham, NC
2 Quintiles, Morrisville, NC

AMIA Disclosure

• The authors declare that there are no relevant financial relationships relating to the disease state or drug treatment discussed in this presentation

Biomarker Discovery, Verification, Applications in Multiple Labs

• Biomarker Discovery
– Number of Analytes: 10,000s
– Number of Samples: 10s
• Biomarker Verification
– Number of Analytes: 100-1,000
– Number of Samples: 100-1,000
• Biomarker Validation
– Number of Analytes: 10s
– Number of Samples: 1,000s

Platforms: Open Platform LC/MS (discovery); LC/MS/MS (MRM) and Antibody-based Assays (verification and validation)

D. Butler Nature. 2008;453:840–2.

Basic Research
– Small scale studies
– Highly collaborative; flexible deliverables
– New technologies can be tested
– Hypothesis generation is the key goal

Clinical Proteomics
– Large clinical-based studies
– Deliverables & timelines are well-defined
– Requires very robust technology that can be run across hundreds of samples & in multiple labs
– QC metrics must be tightly controlled

Towards the Development of a Comprehensive Experimental Design for Clinical Proteomics

• Ensure a clear and realistic definition of the goals of the experiment
• Have available a comprehensive set of hardware and software tools
• Rigorously use quantitatively reproducible analytical methods
• Ideally, start with a mechanistic understanding of the biology of the disease
– Ensure you are looking in the right place at the right time
• Ensure availability of a large cohort of well curated clinical samples
– Initial selection of matched sub-cohort for discovery experiments
– Extension from biomarker discovery to biomarker verification in “all comers” trials
• Ensure professional use of statistical tools suitable for high dimensional data analysis

Exemplar Biomarker Discovery & Verification Project

Biomarkers to Predict Outcomes of Hepatitis C Patient Treatment in Serum of Treatment Naive Patients

Jeanette McCarthy, Keyur Patel, Joe Lucas and John McHutchison

[Figure: course of hepatitis C infection. Labels: Spontaneous clearance (~25%); Chronic infection; 20% cirrhosis; 3-5% cancer; Eligible for Treatment; Responders; Non-responders (>50%); Hepatic Fibrosis; Steatosis; Insulin resistance; Dyslipidemia; Increased risk of diabetes; Unknown consequences]

Biomarker Discovery Paradigm Challenge: Hepatitis C Cohorts for Discovery and Verification

Ensure Availability of a Large Cohort of Well Curated Clinical Samples

Duke Hepatology Biorepository - 3,169 patients

• Discovery Cohort
– small discovery experiment
– well matched cohort from Biorepository
– n = 55 patients
– ‘omic LC/MS/MS
• Verification Cohort 1
– well matched cohort from Biorepository
– n = 41 patients
• Verification Cohort 2
– pediatric patients
– “all-comers” trial
– n = 50 patients
• Verification / Validation Cohort 3
– “all-comers” trial (Australia)
– n = 243 patients

Platforms: Open Platform LC/MS; LC/MS/MS (MRM); LC/MS/MS (MRM)

Duke ‘Omic Hardware Toolkit

• Protein and Peptide Separations
– Four Waters Nanoscale UPLC
– Two Waters Nanoscale UPLC/UPLC
– One Acquity UPLC
• ‘Omic Qualitative and Quantitative Biomarker Discovery
– Five high resolution, accurate mass, tandem mass spectrometers
  • One hybrid quadrupole / time-of-flight system: Waters Q-Tof Ultima
  • Three hybrid quadrupole / ion-mobility / time-of-flight tandem mass spectrometers: Waters Synapt G1 HDMS, Waters Synapt G2 HDMS with ETD
  • One hybrid LTQ / Orbitrap system: Thermo LTQ-Orbitrap
• Targeted Peptide and Protein Quantitation
– One triple quadrupole tandem mass spectrometer
  • Waters Xevo TQ-S

[Photos: Waters UPLCs (four 1D systems, two 2D systems); Waters Synapt G1 HDMS; Waters Synapt G2 HDMS (x2); Waters Q-Tof Ultima; Thermo LTQ-Orbitrap (HHMI owned); Waters Xevo Triple Quad-S; Advion NanoMate; OFFGEL pI Fractionation; GELFREE MW Fractionation]

Duke Proteomics Software Toolkit

Acquiring data is (relatively) easy, acquiring knowledge is hard

• Qualitative Analyses
– Data Dependent Acquisition Database Searching
  • Matrix Science Mascot Software, runs across 40 processor equivalents
  • Automated Processing Pipeline with Mascot Daemon, Mascot Distiller, and Mascot Server
  • Dell Blade Cluster - 32 processor equivalents
– Data Independent Database Searching
  • Waters PLGS/IdentityE Software
  • Two ‘Home Brew’ Super-Computers - 1,000 X Cray-1 speed
– Data Visualization Software (data return to customers)
  • Proteome Software Scaffold
• Quantitative and Qualitative Analyses
– Rosetta Elucidator Software
  • Data processing, data statistics, data visualizations
– Dell Server R900 (largest single server at Duke)
  • 4 Quad Core Processors - 128 GB of RAM
– Rosetta Oracle DB running on dedicated server
• Pathway Analyses
– Ingenuity Pathway Analysis
• Data Storage
– 72 Terabytes of NetApp Enterprise Quality Storage
– data mirrors for data security
– ~50 TB ‘Cold’ Data Storage on DROBOs
– Transitioning to Amazon Glacier

[Images: Dell R900 Server; NetApp Data Storage; Dell Blade Server; Scaffold; Mascot; Elucidator; Oracle DB; Ingenuity Pathway Analysis]

Label-Free Quantitation Strategy - flexibility to fit clinical study design

[Figure: LC/MS images (m/z vs. Retention Time) for Cohort 1, Cohort 2, Cohort 3, Cohort 4 … (n) pass through Image Translation, Retention Time Alignment, and Intensity Normalization to form a Master Image (m/z vs. Retention Time); an Individual Feature is shown as Intensity vs. Retention Time]
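The retention time alignment and intensity normalization steps named in the figure can be illustrated with a minimal sketch. It assumes features have already been matched between a run and a reference by a shared id, uses a single constant retention time shift, and uses a single median scale factor; production software typically uses nonlinear warping and more elaborate normalization. All function names and data here are hypothetical.

```python
from statistics import median

def align_retention_times(run, reference):
    """Shift a run's retention times by the median offset to the reference.

    Both arguments map a feature id to (retention_time_min, intensity).
    Assumption: one constant shift is enough (real tools warp nonlinearly).
    """
    shared = sorted(set(run) & set(reference))
    if not shared:
        raise ValueError("no shared features to align on")
    shift = median(reference[f][0] - run[f][0] for f in shared)
    return {f: (rt + shift, inten) for f, (rt, inten) in run.items()}

def normalize_intensities(run, reference):
    """Scale intensities so the median run-to-reference ratio becomes 1."""
    shared = sorted(f for f in set(run) & set(reference) if run[f][1] > 0)
    if not shared:
        raise ValueError("no shared features with positive intensity")
    scale = median(reference[f][1] / run[f][1] for f in shared)
    return {f: (rt, inten * scale) for f, (rt, inten) in run.items()}

# Hypothetical reference ("master") features and one run that elutes
# 0.5 min late with half the signal.
reference = {"pepA": (10.0, 1000.0), "pepB": (20.0, 4000.0), "pepC": (30.0, 2000.0)}
run = {"pepA": (10.5, 500.0), "pepB": (20.5, 2000.0), "pepC": (30.5, 1000.0)}

aligned = normalize_intensities(align_retention_times(run, reference), reference)
for feature, (rt, inten) in sorted(aligned.items()):
    print(f"{feature}: rt={rt:.2f} min, intensity={inten:.1f}")
```

Medians are used in both steps so that a few mismatched or outlying features do not dominate the correction.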

“Discovery” Data Collection QC (High Quality Robust Instrumentation is only the Starting Point)

• Daily System Suitability Standard
• Surrogate Standard Spike for Each Sample
• QC Injection of Pooled Sample at Regular Interval
• Real-time and post-acquisition QC metrics
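One common post-acquisition QC metric for repeated injections of a pooled sample is the coefficient of variation (CV) of each feature's intensity. The sketch below is an illustration only: the slides do not state which metrics or thresholds were used, and the 20% cutoff, function names, and intensities are assumptions.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample std dev / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m

def flag_unstable_features(qc_intensities, max_cv=20.0):
    """Return ids of features whose %CV across pooled-QC injections exceeds max_cv.

    max_cv=20.0 is an illustrative threshold, not a value from the slides.
    """
    return sorted(f for f, vals in qc_intensities.items() if percent_cv(vals) > max_cv)

# Hypothetical intensities of two features across four pooled-QC injections.
qc_intensities = {
    "feature_1": [100.0, 102.0, 98.0, 100.0],
    "feature_2": [100.0, 150.0, 60.0, 90.0],
}

unstable = flag_unstable_features(qc_intensities)
print("Features exceeding the CV threshold:", unstable)
```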

Ensure Professional Use of Statistical Tools Suitable for High Dimensional Data Analyses

Sparse Latent Factor Regression - Bayesian Factor Regression Modeling

Statistical Analysis: Joe Lucas, PhD, Duke IGSP

[Figure labels: 35,000 Isotope Groups; Predictive Factor “Metaproteins”; Factor Score “Expression Value”; portrait captioned “Pastor Thomas Bayes”]

• Regression – Leads directly to prediction
• Sparsity – Most peptides are irrelevant for prediction
• Latent Factors – Let data determine important relationships
• Resulting model for prediction:
• Initial Metaprotein Model - 650 Isotope Groups

Bayesian Factor Regression Modeling: “Metaprotein” Expression Analysis (Joe Lucas)

Metaprotein Analysis
- Groups peptides based on identification and/or coexpression
- Casts a wide net
- Discordant peptides are avoided
- Includes coexpressing peptides from additional proteins
- Captures “Pathway” Expression
- Regression for Prediction
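For orientation, a sparse latent factor regression is often written in the generic form below. This is a textbook-style sketch, not necessarily the model shown on the original slide, and every symbol is an assumption introduced here: x_i is the vector of isotope group intensities for sample i, f_i the vector of latent factor scores (the "metaprotein" expression values), B a loading matrix in which most entries are zero (sparsity), and y_i the outcome to predict.

```latex
\begin{aligned}
x_i &= B f_i + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}(0, \Psi), \\
y_i &= \theta^{\top} f_i + e_i, & e_i &\sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

For a binary outcome such as responder versus non-responder, the second line would typically be replaced by a probit or logistic link on \theta^{\top} f_i.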

Modified Verification Strategy using ‘omic Methodology

Performance of a 261 Peptide Model - Cross Validation to Second Cohort (41 patients)

Sparse Latent Factor Regression, Bayesian Factor Regression Modeling

Using only proteomic parameters:
- 21 of 26 SVR patients were correctly predicted
- 12 of 15 non-responders were correctly predicted

Predictor modeling will be improved by:
- Increasing number of identified peptides
- Improving data alignment across projects
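The counts reported for the second cohort can be turned into standard classification rates. The short calculation below uses only the numbers on the slide; treating SVR (sustained virologic response) as the positive class is an assumption made here for labeling sensitivity and specificity.

```python
def rate(correct, total):
    """Fraction of cases predicted correctly."""
    return correct / total

# Counts from the slide: second cohort, 41 patients.
svr_correct, svr_total = 21, 26   # SVR patients correctly predicted
nr_correct, nr_total = 12, 15     # non-responders correctly predicted

# Assumption: SVR is the positive class.
sensitivity = rate(svr_correct, svr_total)
specificity = rate(nr_correct, nr_total)
accuracy = rate(svr_correct + nr_correct, svr_total + nr_total)

print(f"Sensitivity (SVR):           {sensitivity:.1%}")
print(f"Specificity (non-responder): {specificity:.1%}")
print(f"Overall accuracy:            {accuracy:.1%}")
```

The two groups sum to the 41 patients of the cohort, so the overall accuracy is 33 of 41.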

Discovery and Initial Verification of SVR-Prediction Using Unbiased Data

Discovery Data, Build Model

Use Model to Predict SVR (Blinded)

Reproducibility of Metaprotein Biosignatures

• Build predictive model with first three cohorts
• Predict NR / SVR in “Big Pharma” measured data
– different LC/MS/MS (LTQ-Orbi) system in different lab
– Metaprotein model maintained consistent results

[Figure labels: Discovery Cohort, N = 55, Matched; Discovery Cohort Measured by Big Pharma Lab; Verification Cohort 1, N = 41, Matched; Verification Cohort 2, N = 50, All-Comers]

How to Create a Method for Routine Use in Multiple Labs

[Figure: biomarker pipeline from Biomarker Discovery (10,000s of analytes, 10s of samples; Open Platform LC/MS) through Biomarker Verification (100-1,000 analytes, 100-1,000 samples) to Biomarker Validation (10s of analytes, 1,000s of samples), with LC/MS/MS (MRM) and Antibody-based Assays for verification and validation. D. Butler Nature. 2008;453:840–2.]

Transition from Biomarker Discovery to Biomarker Verification

Nature Methods “Method of the Year”, Nature Methods, Vol. 10, No. 1, January 2013, p. 19

“What I like about targeted proteomics is that you answer the question that you are interested in,” says Michael MacCoss of the University of Washington. … Rather than trying to detect all the proteins in a mixture, as in a discovery-based approach, a targeted approach “lets us build quantitative assays to specifically answer hypothesis driven questions.”

DPCF Deployment of Targeted Proteomics - biomarker verification in clinical cohort of 124 patients

• Same method for three matrices
• 147 endogenous peptides
• 147 stable label peptides
• 6 non-endogenous control peptides
• 3 transitions/peptide
• 900 MRM transitions monitored
– these numbers are not a limitation of the instrument
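The transition total follows directly from the peptide counts on the slide, as this short check shows. Variable names are illustrative.

```python
# Peptide counts as listed on the slide.
endogenous_peptides = 147
stable_label_peptides = 147
control_peptides = 6
transitions_per_peptide = 3

total_peptides = endogenous_peptides + stable_label_peptides + control_peptides
total_transitions = total_peptides * transitions_per_peptide

print(f"Peptides monitored:    {total_peptides}")
print(f"MRM transitions total: {total_transitions}")
```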

HCV Translation to SRM/MRM Platform

SRM Protein: Thbg, CO2, CO5, CO5, CO5, ITIH2, ITIH2, ITIH1, HRG, HRG

[Figure: cohort labels Chariot; Peds C; HCV - 41; Initial 55. P-value = 1.6 × 10^-5]

Use all three cohorts to choose targets and train the model

Chariot Study
• 243 samples
• 191 standard of care
• None of the others became SVR
• 117 SVR, 32 NR
• Train model on three original cohorts, 10 SRM peptides only

Work to Date

• D. Cyr, Joseph E. Lucas, J. W. Thompson, K. Patel, P. J. Clark, A. Thompson, H. L. Tillmann, J. G. McHutchison, M. A. Moseley, J. J. McCarthy (2011) “Characterization of serum proteins associated with IL28B genotype among patients with chronic Hepatitis C”, PLoS ONE 6(7): e21854. doi:10.1371/journal.pone.0021854

• K. Patel, Joseph E. Lucas, J. W. Thompson, L. G. Dubois, H. Tillman, A. Thompson, D. Uzarski, R. M. Califf, M. A. Moseley, G. S. Ginsburg, J. G. McHutchison and J. J. McCarthy (2011) “High Predictive Accuracy of an Unbiased Proteomic Profile for Sustained Virologic Response in Chronic Hepatitis C Patients”, Hepatology, 53(6), 1809-1818, doi:10.1002/hep.24284

Technical Work – Factor Model

• Joseph E. Lucas, J. Will Thompson, Laura G. Dubois, Jeanette McCarthy, Keyur Patel, Hans Tillman, Alex Thompson, John McHutchison and M. Arthur Moseley (2011), “Metaprotein Expression Modeling for Label-free Quantitative Proteomics”

• Ricardo Henao, J. Will Thompson, M. Arthur Moseley, Geoffrey S. Ginsburg, Lawrence Carin, and Joseph E. Lucas (2013) “Latent protein trees”, Annals of Applied Statistics, 7(2)

Acknowledgments Duke University Proteomics Core Facility

http://www.genome.duke.edu/cores/proteomics/

Funding NIH S10 grant

Duke School of Medicine CTSA grant UL1RR024128

Jay Johnson1, Giuseppe Astarita1, Giuseppe Paglia2, Jim Murphy2, Steven Cohen2, Jim Langridge2, Geoff Gerhardt2

1Waters Corporation, Milford, MA; 1Center for Systems Biology, University of Iceland, 3Waters Corporation, Manchester, UK
