Daniel C. Liebler
Vanderbilt University School of MedicineVanderbilt, Tennessee
Performance and Optimization of LC-MS/MS Platforms for Unbiased Discovery of
Biomarker Candidates
http://proteomics.cancer.gov
Discovery• Tissue• Proximal
fluids
ClinicalValidation• Blood• Population
Verification• Blood• Population
Bio-Specimens• Plasma• Tissue• Proximal fluids
Found in blood?higher in cancer?
Biomarkers worthevaluating
Biomarkers worthevaluating
biomarkercandidates
A Functioning Pipeline for Cancer Biomarker Development Requires Both Discovery and Directed Assay Components
“hypotheses”• untargeted
proteomics• genomics
Overview of shotgun proteomics
proteins
peptides
AVAGCPGRSTYMAGGTVVKMSAVVGHLYK
NALGHTSSSPGGVPRIDGEEPGLVAR
QCCDGGVSWTKANDMANYMORE
digest
MS-MS data encode sequences
database search
y7
y6
y5
y4y3
y2
b2
b3 b4b5 b6
b7 b8
dm_200392_TpepC_std #587 RT: 20.55 AV: 1 NL: 1.95E7T: + c d Full ms2 [email protected] [ 95.00-790.00]
100 150 200 250 300 350 400 450 500 550 600
m/z
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
30
32
34
36
38
40
42
Re
lativ
e A
bu
nd
an
ce
605.3170.9
534.3 606.3
303.3374.2
143.0
402.2242.0 607.5
535.2171.9
477.0299.2 459.7197.0 615.3375.2 530.2169.8 517.3357.2246.2 445.1 587.4403.2304.1271.1143.9 232.5196.3 558.0400.0129.1
m/z
liquid chromatography-tandem mass
spectrometry (LC-MS/MS)
peptide fractionation
Shotgun proteomics for biomarker discovery
Challenges
• > 106 range of protein concentrations in tissues, biofluids
• no technology yet capable of complete sampling of proteomes
• multiple instrument systems used• variability in detection of proteins
No systematic studies of variation in shotgun proteomics
Impact of variation on performance for unbiased discovery unknown
cancernormal
shotgun proteomeanalysis
compare inventories,identify differences
•few (<20) samples or sample pools
•low throughput
•Identify ~5,000+ proteins
•inventory differences ~50-500+ proteins
HUPO Study
Key questions not addressed:
Is the technology inherently variable?
What are the sources of variation?
How reproducible are analyses of complex biological proteomes?
“A HUPO test sample study reveals common problems in mass spectrometry-based proteomics” Bell et al. Nature Methods (2009) 6: 423-430
20 protein mixture distributed to 27 labs; no standardized methods or data analysis
• only 7 labs correctly ID all 20 proteins
Coaching reanalyses common bioinformatics
• all 27 labs correctly ID “most” proteins
Conclusions:
• Variable performance within and between labs
• better databases and search engines needed, as is training in their use
CPTAC Unbiased Discovery Workgroup: Goals
Evaluate and standardize performance of proteomic discovery platforms and standardize
their use
• Identify and characterize sources of variation
• Develop quality control metrics
• Employ defined protein mixtures and biologically relevant proteomes
• Evaluate sensitivity for protein detection at defined levels of concentration
Discovery WG participants
National Cancer Institute- OTIR- OBBR- TTB- NCICB- DCEG- DCB- DCTD- DCP- CCR- OCTR (SPOREs)
Bethesda and Frederick, Md.
National Institute of Standards and TechnologyGaithersburg, Md.
Texas A&M UniversityCollege Station, Tex.
Argonne National LaboratoryArgonne, Ill.
Memorial Sloan-Kettering Cancer Center,New York, NY
Broad Institute of MIT and Harvard, Cambridge, Mass.
University of California at San Francisco, San Francisco, Calif.
Purdue University, West Lafayette, Ind.
Harvard-affiliated Hospitals,Boston, MA
Fred Hutchison Cancer Research Center, Seattle, Wash.
M.D. Anderson Cancer Center, Houston, Tex.
University of British Columbia & University of Victoria, Vancouver, British Columbia, Canada.
Plasma Proteome Institute, Washington, D.C.
New York University, New York, NY.
Buck Institute, Novato, Calif.
Lawrence Berkeley National Laboratory, Berkeley, Calif. Vanderbilt University
School of Medicine, Nashville, Tenn.
National Institute of Statistical SciencesResearch Triangle Park, NC
Indiana University, Predictive Physiology and Medicine, Inc. Bloomington, Ind.
Purdue U., Indiana U. Schools of Medicine, Indianapolis, Ind.
CPTAC Discovery WG studies:from simple to complex proteomes
samples
instruments variousinstruments
study 1 2 3 5 6
LTQ,LTQ-Orbitrap
SOP none v. 1.0 v. 2.0 v. 2.1 v. 2.2
NCI-20 yeast yeast w/ BSA spike
yeast w/ Sigma UPS spikesSigma UPS48 human protein mix
8
none
SOP refinement
Sample complexity
Equivalent to HUPO study
Repeatability and reproducibility are robust across laboratories
Run-to-run repeatability in analyses of yeast across 7 instruments/sites
65-80% of protein IDs repeat between runs on individual instruments
Interlaboratory reproducibility
65-70% of protein IDs are reproduced across multiple instruments
proteins
peptides
proteins
peptides
% p
eptid
es/p
rote
ins
sha
red
betw
een
runs
Study 6 (yeast)
Study 6, Study 8 (yeast)
% p
eptid
es/p
rote
ins
sha
red
betw
een
runs
Saccharomyces cerevisiae proteome reference material
• Complex protein matrix (6,000 ORFs)
• Reproducible preparation possible
• Quantitative TAP tag studies (Ghaemmaghami, S. et al. (2003) Nature 425, 737-741) provide calibration for expression levels
• Statistical power for modeling depth of coverage by proteomics platforms
• Use with human protein spikes enables modeling for biomarker discovery applications
• Made available to proteomics community (NIST)
Ghaemmaghami, S. et al. (2003) Nature 425, 737-741
Modeling detection of human protein “biomarker” spikes in yeast matrix
•48 human protein (UPS) spikes in yeast proteome background
•“simple” discovery system distinguishes differences equivalent to 0.5 – 1.3% of proteome
System map of LC-MS performance metrics
Paul RudnickSteve Stein(NIST)
Chromatography(11 metrics)
•peptide resolution•peak widths•elution order
Mass Spectrometer
Ion Source(6 metrics)
•ion intensities•electrospray
stability
MS instrument(20 metrics)
•MS1 and MS2 signal characteristics•dynamic sampling
MS scans
MS/MS scans
Autosampler and Pump
Peptide Identification(5 metrics)
•numbers of identifications•search scores
Computer-automated interpretation of spectra
Peptide Identifications
LC column
• Over 40 performance metrics monitored
Diagnosing and correcting system malfunction
CPTACstudy 5
Peptide Identification
Run
0 2 4 6 8 10 12 14 16 18 20 22
Met
ric V
alue
0
1000
2000
3000
4000
5000
P-1 (med. f-value score for IDs) *1,000 P-2A (total IDs) P-2B (unique ion IDs) P-2C (unique peptide IDs)
a
c
LTQ@73 LTQ@95 LTQ@95-rep.
Examining Early Eluting Peptides Across Laboratories (C-6A)
Ave
rage
Fra
ctio
nal E
xces
s/D
efic
it
-0.4
-0.3
-0.2
-0.1
0.0
0.1
0.2
0.3
0.4
Early Within Lab Early Between Labs
Chromatography
Run
0 2 4 6 8 10 12 14 16 18 20 22
Met
ric V
alue
0
10
20
30
40
50
60
70
C-1A ('bleed' -4 min.) %*100 C-1B ('bleed' +4 min.) % *100 C-2A middle 50% pep RT period C-2B rate (peptides/min.) over C-2A C-3A med. peak width C-3B (disperson for peak widths)
b
LTQ@73 LTQ@95 LTQ@95-rep.
Peptide Identification
Run
0 2 4 6 8 10 12 14 16 18 20 22
Met
ric V
alue
0
1000
2000
3000
4000
5000
P-1 (med. f-value score for IDs) *1,000 P-2A (total IDs) P-2B (unique ion IDs) P-2C (unique peptide IDs)
a
c
LTQ@73 LTQ@95 LTQ@95-rep.
Examining Early Eluting Peptides Across Laboratories (C-6A)
Ave
rage
Fra
ctio
nal E
xces
s/D
efic
it
-0.4
-0.3
-0.2
-0.1
0.0
0.1
0.2
0.3
0.4
Early Within Lab Early Between Labs
Chromatography
Run
0 2 4 6 8 10 12 14 16 18 20 22
Met
ric V
alue
0
10
20
30
40
50
60
70
C-1A ('bleed' -4 min.) %*100 C-1B ('bleed' +4 min.) % *100 C-2A middle 50% pep RT period C-2B rate (peptides/min.) over C-2A C-3A med. peak width C-3B (disperson for peak widths)
b
LTQ@73 LTQ@95 LTQ@95-rep.
CPTAC Unbiased Discovery Workgroup: Key Achievements
1. First systematic, SOP-driven study of LC-MS/MS analytical systems across multiple laboratories
2. Quantitative assessment of repeatability and reproducibility in peptide vs. protein detection
3. Yeast reference proteome standard and accompanying datasets
4. Yeast reference proteome with spikes enables quantitative modeling of power to discover biomarker candidates
5. Performance metrics and software (“toolkit”) to monitor and troubleshoot system performance
Next steps for Discovery WG
Evaluate performance of platforms to discriminate between cancer-relevant phenotypes• Phase II Studies
• Human breast cancer cell model; responses to TKI
• Compare commonly employed quantitative methods for survey of differences
• Phase III studies• Human tumor tissue specimens corresponding to defined
clinical phenotypes
• Evaluate phenotype discrimination
• Implement methods, metrics and approaches developed in Phase I, Phase II studies
Backups
Why care about reproducibility in discovery proteomics?
1. Biomarker candidates come from comparing proteomes from different phenotypes
2. Need to know whether observed differences are due to biology or to variability in the analytical system.
1. Biomarker candidates come from comparing proteomes from different phenotypes
2. Need to know whether observed differences are due to biology or to variability in the analytical system.
biomarkercandidates
cancernormal
shotgun proteomeanalysis
compare inventories,identify differences
Yeast proteome enables calibration and comparison of detection efficiency
CN50 = copy number with 50% detection probability
Metrics identify the greatest sources of variability
CPTAC Study5 Intralaboratory Variability3LTQs, 3 Orbitraps, 6 replicates each
Median Intralab %dev
0 5 10 15 20 25 30 35
C-3A (med. peak width) C-2A (IQ pep. RT period)
C-2B (peptides/min.) C-3B (Interquartile for peak widths)
C-1A ('bleed' -4 min.) /10 C-1B ('bleed' +4 min.) /10
DS-3B (med. MS1max/MS1sampled for bottom 50% by abund.) DS-3A (med. MS1max/MS1sampled all IDs)
DS-2B (MS2 Scans over C-2A) DS-2A (MS1 Scans over C-2A)
DS-1A (oversampling - once/twice) DS-1B (oversampling - twice/thrice)
IS-2 (med. precursor m/z) IS-3A (ratio IDs +1/+2) IS-3B (ratio IDs +3/+2) IS-3C (ratio IDs +4/+2)
MS1-2A (S-N) MS1-1 (ion injection (ms) for IDs)
MS1-2B (med. TIC-1e3 over C-2A) MS1-3B (med. MS1 signal for IDs)
MS1-3A (dynamic range 95th-5th for IDs) MS2-1 (Ion injection (ms) for IDs)
MS2-4A (fract. ID'd Q1) MS2-3 (med. num peaks for IDs)
MS2-2 (S-N for IDs) MS2-4B (fract. ID'd Q2) MS2-4C (fract. ID'd Q3) MS2-4D (fract. ID'd Q4)
P-1 (med. f-value score for IDs)P-2C (unique peptide IDs)
P-2BP-2A
Chromatography Dynamic Sampling Ion Source MS1 MS2 Peptide Identification
CPTAC Study5 Interlaboratory VariabilityLTQs
Interlaboratory %dev
0 20 40 60 80 100 120 140 160
C-1A ('bleed' -4 min.) -10 C-1B ('bleed' +4 min.) -10 C-2A (IQ pep. RT period)
C-2B (peptides-min.) C-3A (med. peak width)
C-3B (IQ for peak widths) DS-1A (oversampling - once-twice) DS-1B (oversampling - twice/thrice)
DS-2A (MS1 Scans over C-2A) DS-2B (MS2 Scans over C-2A)
DS-3A (med. MS1max/MS1sampled all IDs) DS-3B (med. MS1max/MS1sampled for bottom 50% by abund.)
IS-2 (med. precursor m/z) IS-3A (ratio IDs +1-+2) IS-3B (ratio IDs +3-+2) IS-3C (ratio IDs +4-+2)
MS1-1 (ion injection (ms) for IDs) MS1-2A (S-N)
MS1-2B (med. TIC-1e3 over C-2A) MS1-3A (dynamic range 95th-5th for IDs)
MS1-3B (med. MS1 signal for IDs) MS2-1 (Ion injection (ms) for IDs)
MS2-4A (fract. ID'd Q1) MS2-4B (fract. ID'd Q2)
MS2-3 (med. num peaks for IDs) MS2-4C (fract. ID'd Q3) MS2-4D (fract. ID'd Q4)
MS2-2 (S-N for IDs)P-1 (med. f-value score for IDs)
P-2C (unique peptide IDs)P-2BP-2A
CPTAC Study5 Interlaboratory VariabilityOrbitraps
Interlaboratory %dev
0 20 40 60 80 100 120 140 160
a
b c