Download ppt - Daniel C. Liebler Vanderbilt University School of Medicine Vanderbilt, Tennessee Performance and Optimization of LC-MS/MS Platforms for Unbiased Discovery

Daniel C. Liebler

Vanderbilt University School of MedicineVanderbilt, Tennessee

Performance and Optimization of LC-MS/MS Platforms for Unbiased Discovery of

Biomarker Candidates

http://proteomics.cancer.gov

Discovery• Tissue• Proximal

fluids

ClinicalValidation• Blood• Population

Verification• Blood• Population

Bio-Specimens• Plasma• Tissue• Proximal fluids

Found in blood?higher in cancer?

Biomarkers worthevaluating

Biomarkers worthevaluating

biomarkercandidates

A Functioning Pipeline for Cancer Biomarker Development Requires Both Discovery and Directed Assay Components

“hypotheses”• untargeted

proteomics• genomics

Overview of shotgun proteomics

proteins

peptides

AVAGCPGRSTYMAGGTVVKMSAVVGHLYK

NALGHTSSSPGGVPRIDGEEPGLVAR

QCCDGGVSWTKANDMANYMORE

digest

MS-MS data encode sequences

database search

y7

y6

y5

y4y3

y2

b2

b3 b4b5 b6

b7 b8

dm_200392_TpepC_std #587 RT: 20.55 AV: 1 NL: 1.95E7T: + c d Full ms2 [email protected] [ 95.00-790.00]

100 150 200 250 300 350 400 450 500 550 600

m/z

0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

30

32

34

36

38

40

42

Re

lativ

e A

bu

nd

an

ce

605.3170.9

534.3 606.3

303.3374.2

143.0

402.2242.0 607.5

535.2171.9

477.0299.2 459.7197.0 615.3375.2 530.2169.8 517.3357.2246.2 445.1 587.4403.2304.1271.1143.9 232.5196.3 558.0400.0129.1

m/z

liquid chromatography-tandem mass

spectrometry (LC-MS/MS)

peptide fractionation

Shotgun proteomics for biomarker discovery

Challenges

• > 106 range of protein concentrations in tissues, biofluids

• no technology yet capable of complete sampling of proteomes

• multiple instrument systems used• variability in detection of proteins

No systematic studies of variation in shotgun proteomics

Impact of variation on performance for unbiased discovery unknown

cancernormal

shotgun proteomeanalysis

compare inventories,identify differences

•few (<20) samples or sample pools

•low throughput

•Identify ~5,000+ proteins

•inventory differences ~50-500+ proteins

HUPO Study

Key questions not addressed:

Is the technology inherently variable?

What are the sources of variation?

How reproducible are analyses of complex biological proteomes?

“A HUPO test sample study reveals common problems in mass spectrometry-based proteomics” Bell et al. Nature Methods (2009) 6: 423-430

20 protein mixture distributed to 27 labs; no standardized methods or data analysis

• only 7 labs correctly ID all 20 proteins

Coaching reanalyses common bioinformatics

• all 27 labs correctly ID “most” proteins

Conclusions:

• Variable performance within and between labs

• better databases and search engines needed, as is training in their use

CPTAC Unbiased Discovery Workgroup: Goals

Evaluate and standardize performance of proteomic discovery platforms and standardize

their use

• Identify and characterize sources of variation

• Develop quality control metrics

• Employ defined protein mixtures and biologically relevant proteomes

• Evaluate sensitivity for protein detection at defined levels of concentration

Discovery WG participants

National Cancer Institute- OTIR- OBBR- TTB- NCICB- DCEG- DCB- DCTD- DCP- CCR- OCTR (SPOREs)

Bethesda and Frederick, Md.

National Institute of Standards and TechnologyGaithersburg, Md.

Texas A&M UniversityCollege Station, Tex.

Argonne National LaboratoryArgonne, Ill.

Memorial Sloan-Kettering Cancer Center,New York, NY

Broad Institute of MIT and Harvard, Cambridge, Mass.

University of California at San Francisco, San Francisco, Calif.

Purdue University, West Lafayette, Ind.

Harvard-affiliated Hospitals,Boston, MA

Fred Hutchison Cancer Research Center, Seattle, Wash.

M.D. Anderson Cancer Center, Houston, Tex.

University of British Columbia & University of Victoria, Vancouver, British Columbia, Canada.

Plasma Proteome Institute, Washington, D.C.

New York University, New York, NY.

Buck Institute, Novato, Calif.

Lawrence Berkeley National Laboratory, Berkeley, Calif. Vanderbilt University

School of Medicine, Nashville, Tenn.

National Institute of Statistical SciencesResearch Triangle Park, NC

Indiana University, Predictive Physiology and Medicine, Inc. Bloomington, Ind.

Purdue U., Indiana U. Schools of Medicine, Indianapolis, Ind.

CPTAC Discovery WG studies:from simple to complex proteomes

samples

instruments variousinstruments

study 1 2 3 5 6

LTQ,LTQ-Orbitrap

SOP none v. 1.0 v. 2.0 v. 2.1 v. 2.2

NCI-20 yeast yeast w/ BSA spike

yeast w/ Sigma UPS spikesSigma UPS48 human protein mix

8

none

SOP refinement

Sample complexity

Equivalent to HUPO study

Repeatability and reproducibility are robust across laboratories

Run-to-run repeatability in analyses of yeast across 7 instruments/sites

65-80% of protein IDs repeat between runs on individual instruments

Interlaboratory reproducibility

65-70% of protein IDs are reproduced across multiple instruments

proteins

peptides

proteins

peptides

% p

eptid

es/p

rote

ins

sha

red

betw

een

runs

Study 6 (yeast)

Study 6, Study 8 (yeast)

% p

eptid

es/p

rote

ins

sha

red

betw

een

runs

Saccharomyces cerevisiae proteome reference material

• Complex protein matrix (6,000 ORFs)

• Reproducible preparation possible

• Quantitative TAP tag studies (Ghaemmaghami, S. et al. (2003) Nature 425, 737-741) provide calibration for expression levels

• Statistical power for modeling depth of coverage by proteomics platforms

• Use with human protein spikes enables modeling for biomarker discovery applications

• Made available to proteomics community (NIST)

Ghaemmaghami, S. et al. (2003) Nature 425, 737-741

Modeling detection of human protein “biomarker” spikes in yeast matrix

•48 human protein (UPS) spikes in yeast proteome background

•“simple” discovery system distinguishes differences equivalent to 0.5 – 1.3% of proteome

System map of LC-MS performance metrics

Paul RudnickSteve Stein(NIST)

Chromatography(11 metrics)

•peptide resolution•peak widths•elution order

Mass Spectrometer

Ion Source(6 metrics)

•ion intensities•electrospray

stability

MS instrument(20 metrics)

•MS1 and MS2 signal characteristics•dynamic sampling

MS scans

MS/MS scans

Autosampler and Pump

Peptide Identification(5 metrics)

•numbers of identifications•search scores

Computer-automated interpretation of spectra

Peptide Identifications

LC column

• Over 40 performance metrics monitored

Diagnosing and correcting system malfunction

CPTACstudy 5

Peptide Identification

Run

0 2 4 6 8 10 12 14 16 18 20 22

Met

ric V

alue

0

1000

2000

3000

4000

5000

P-1 (med. f-value score for IDs) *1,000 P-2A (total IDs) P-2B (unique ion IDs) P-2C (unique peptide IDs)

a

c

LTQ@73 LTQ@95 LTQ@95-rep.

Examining Early Eluting Peptides Across Laboratories (C-6A)

Ave

rage

Fra

ctio

nal E

xces

s/D

efic

it

-0.4

-0.3

-0.2

-0.1

0.0

0.1

0.2

0.3

0.4

Early Within Lab Early Between Labs

Chromatography

Run

0 2 4 6 8 10 12 14 16 18 20 22

Met

ric V

alue

0

10

20

30

40

50

60

70

C-1A ('bleed' -4 min.) %*100 C-1B ('bleed' +4 min.) % *100 C-2A middle 50% pep RT period C-2B rate (peptides/min.) over C-2A C-3A med. peak width C-3B (disperson for peak widths)

b


Peptide Identification

Run

0 2 4 6 8 10 12 14 16 18 20 22

Met

ric V

alue

0

1000

2000

3000

4000

5000

P-1 (med. f-value score for IDs) *1,000 P-2A (total IDs) P-2B (unique ion IDs) P-2C (unique peptide IDs)

a

c


Examining Early Eluting Peptides Across Laboratories (C-6A)

Ave

rage

Fra

ctio

nal E

xces

s/D

efic

it

-0.4

-0.3

-0.2

-0.1

0.0

0.1

0.2

0.3

0.4

Early Within Lab Early Between Labs

Chromatography

Run

0 2 4 6 8 10 12 14 16 18 20 22

Met

ric V

alue

0

10

20

30

40

50

60

70

C-1A ('bleed' -4 min.) %*100 C-1B ('bleed' +4 min.) % *100 C-2A middle 50% pep RT period C-2B rate (peptides/min.) over C-2A C-3A med. peak width C-3B (disperson for peak widths)

b


CPTAC Unbiased Discovery Workgroup: Key Achievements

1. First systematic, SOP-driven study of LC-MS/MS analytical systems across multiple laboratories

2. Quantitative assessment of repeatability and reproducibility in peptide vs. protein detection

3. Yeast reference proteome standard and accompanying datasets

4. Yeast reference proteome with spikes enables quantitative modeling of power to discover biomarker candidates

5. Performance metrics and software (“toolkit”) to monitor and troubleshoot system performance

Next steps for Discovery WG

Evaluate performance of platforms to discriminate between cancer-relevant phenotypes• Phase II Studies

• Human breast cancer cell model; responses to TKI

• Compare commonly employed quantitative methods for survey of differences

• Phase III studies• Human tumor tissue specimens corresponding to defined

clinical phenotypes

• Evaluate phenotype discrimination

• Implement methods, metrics and approaches developed in Phase I, Phase II studies

Backups

Why care about reproducibility in discovery proteomics?

1. Biomarker candidates come from comparing proteomes from different phenotypes

2. Need to know whether observed differences are due to biology or to variability in the analytical system.

1. Biomarker candidates come from comparing proteomes from different phenotypes

2. Need to know whether observed differences are due to biology or to variability in the analytical system.

biomarkercandidates

cancernormal

shotgun proteomeanalysis

compare inventories,identify differences

Yeast proteome enables calibration and comparison of detection efficiency

CN50 = copy number with 50% detection probability

Metrics identify the greatest sources of variability

CPTAC Study5 Intralaboratory Variability3LTQs, 3 Orbitraps, 6 replicates each

Median Intralab %dev

0 5 10 15 20 25 30 35

C-3A (med. peak width) C-2A (IQ pep. RT period)

C-2B (peptides/min.) C-3B (Interquartile for peak widths)

C-1A ('bleed' -4 min.) /10 C-1B ('bleed' +4 min.) /10

DS-3B (med. MS1max/MS1sampled for bottom 50% by abund.) DS-3A (med. MS1max/MS1sampled all IDs)

DS-2B (MS2 Scans over C-2A) DS-2A (MS1 Scans over C-2A)

DS-1A (oversampling - once/twice) DS-1B (oversampling - twice/thrice)

IS-2 (med. precursor m/z) IS-3A (ratio IDs +1/+2) IS-3B (ratio IDs +3/+2) IS-3C (ratio IDs +4/+2)

MS1-2A (S-N) MS1-1 (ion injection (ms) for IDs)

MS1-2B (med. TIC-1e3 over C-2A) MS1-3B (med. MS1 signal for IDs)

MS1-3A (dynamic range 95th-5th for IDs) MS2-1 (Ion injection (ms) for IDs)

MS2-4A (fract. ID'd Q1) MS2-3 (med. num peaks for IDs)

MS2-2 (S-N for IDs) MS2-4B (fract. ID'd Q2) MS2-4C (fract. ID'd Q3) MS2-4D (fract. ID'd Q4)

P-1 (med. f-value score for IDs)P-2C (unique peptide IDs)

P-2BP-2A

Chromatography Dynamic Sampling Ion Source MS1 MS2 Peptide Identification

CPTAC Study5 Interlaboratory VariabilityLTQs

Interlaboratory %dev

0 20 40 60 80 100 120 140 160

C-1A ('bleed' -4 min.) -10 C-1B ('bleed' +4 min.) -10 C-2A (IQ pep. RT period)

C-2B (peptides-min.) C-3A (med. peak width)

C-3B (IQ for peak widths) DS-1A (oversampling - once-twice) DS-1B (oversampling - twice/thrice)

DS-2A (MS1 Scans over C-2A) DS-2B (MS2 Scans over C-2A)

DS-3A (med. MS1max/MS1sampled all IDs) DS-3B (med. MS1max/MS1sampled for bottom 50% by abund.)

IS-2 (med. precursor m/z) IS-3A (ratio IDs +1-+2) IS-3B (ratio IDs +3-+2) IS-3C (ratio IDs +4-+2)

MS1-1 (ion injection (ms) for IDs) MS1-2A (S-N)

MS1-2B (med. TIC-1e3 over C-2A) MS1-3A (dynamic range 95th-5th for IDs)

MS1-3B (med. MS1 signal for IDs) MS2-1 (Ion injection (ms) for IDs)

MS2-4A (fract. ID'd Q1) MS2-4B (fract. ID'd Q2)

MS2-3 (med. num peaks for IDs) MS2-4C (fract. ID'd Q3) MS2-4D (fract. ID'd Q4)

MS2-2 (S-N for IDs)P-1 (med. f-value score for IDs)

P-2C (unique peptide IDs)P-2BP-2A

CPTAC Study5 Interlaboratory VariabilityOrbitraps

Interlaboratory %dev

0 20 40 60 80 100 120 140 160

a

b c