
[Figure (bottom image): x-axis: Number of identified spectra from top twelve bovine proteins, 0 to 2000.]

Conclusions.

Preliminary conclusions indicate that this study has provided a relatively large resource for assessing the amount of variability across datasets within and between laboratories. This repository of data was collected over a 9-month period from more than 50 laboratories and contains mass spectrometry data; information on instrument operation and performance (for both the mass spectrometer and the liquid chromatography system); workflows; and operator experience. The survey data indicate that most users participating in this study have significant operating experience; however, some laboratories have much less longitudinal variation than others. Continued data analyses may reveal the factors that contribute most to greater or lesser variation within a laboratory. This information could prove useful for the design and recommendation of best practices for proteomics analyses.

The 2012-2013 ABRF Proteomics Research Group study: Assessing longitudinal variability in routine peptide LC-MS/MS analysis.

Maureen Bunger1; Tracy Andacht2; Keiryn Bennett3; Cory Bystrom4; Matthew Chambers5; Larry Dangott6; Felix Elortza7; John Leszyk8; Henrik Molina9; Robert Moritz10; Brett Phinney11; David Tabb5; J. Will Thompson12; Xia Wang13; Jason Williams14

1Proteovations, LLC, RTP, NC; 2Centers for Disease Control and Prevention, Atlanta, GA; 3CeMM Research Center for Molecular Medicine, Vienna, Austria; 4Cleveland HeartLab, Inc., Akron, OH; 5Vanderbilt University, Nashville, TN; 6Texas A&M University, College Station, TX; 7Centro de Investigacion Cooperativa en Biociencias, Bilbao, Spain; 8University of Massachusetts, Shrewsbury, MA; 9The Rockefeller University, New York, NY; 10Institute for Systems Biology, Seattle, WA; 11University of California-Davis, Davis, CA; 12Duke University, Durham, NC; 13University of Cincinnati, Cincinnati, OH; 14National Institute of Environmental Health Sciences, Research Triangle Park, NC

Introduction. Many factors contribute to variability in LC-MS/MS identification of complex peptide mixtures, yet few studies have characterized the stability of performance among many laboratories. The PRG study for 2012-2013 collected LC-MS/MS data sets from sixty-four participants over nine months for a common digested protein mixture, with the goal of recognizing key sources of variability in HPLC and MS performance through QC metrics. No standard operating protocol was imposed on participants; instead, participants employed methods that were typical for their laboratories. A survey was conducted with each sample submission to catalog individual laboratory practices, instrument configurations, acquisition settings, and routine and non-routine maintenance procedures.

Methods. Participants were provided with nine vials of Michrom 6 Bovine Tryptic Digest Equal Molar Mix. At monthly intervals, participants uploaded raw data files from data-dependent LC-MS/MS of these peptides. Waters data were translated to mzML using the ProteinLynx Global Server, and AB SCIEX data were converted with the AB SCIEX MS Data Converter; all other files were translated using ProteoWizard msConvert (see Figure 2). The mzML files for Waters instruments were incompatible with downstream tools. MyriMatch provided semi-tryptic database search identifications against the RefSeq bovine proteome. IDPicker 3 filtered identifications and conducted parsimonious protein assembly. QuaMeter was run in identification-dependent and identification-independent modes for metric computation (http://fenchurch.mc.vanderbilt.edu). The “IDFree” mode generates 40 metrics for extracted ion chromatograms, retention time, mass spectrometry, and tandem mass spectrometry characterization. The “NIST” mode generates metrics modeled after those published by Rudnick et al. for the CPTAC consortium in 2009 (http://dx.doi.org/10.1074/mcp.M900223-MCP200). Robust principal component analysis and data visualizations, generated in the R Statistical Environment, guided evaluation of the data sets.
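The file-conversion routing described in the Methods can be summarized in a few lines of Python. This is only an illustrative sketch: the converter names come from the text, but the function name, dictionary names, and returned fields are hypothetical and do not correspond to any tool used in the study.

```python
# Converter names are taken from the Methods text; every identifier below
# (CONVERTER_BY_VENDOR, plan_conversion, ...) is hypothetical.
CONVERTER_BY_VENDOR = {
    "Waters": "ProteinLynx Global Server",
    "AB SCIEX": "AB SCIEX MS Data Converter",
}
DEFAULT_CONVERTER = "ProteoWizard msConvert"

# Vendors whose mzML output could not be read by the downstream tools.
INCOMPATIBLE_DOWNSTREAM = {"Waters"}


def plan_conversion(vendor: str) -> dict:
    """Return the converter for a vendor and whether its mzML proceeds downstream."""
    converter = CONVERTER_BY_VENDOR.get(vendor, DEFAULT_CONVERTER)
    return {
        "vendor": vendor,
        "converter": converter,
        "continues_to_search": vendor not in INCOMPATIBLE_DOWNSTREAM,
    }


for vendor in ["Thermo", "AB SCIEX", "Waters", "Bruker", "Agilent"]:
    print(plan_conversion(vendor))
```

Keeping the routing in a lookup table makes the one exception (Waters files stopping before database search and metric computation) explicit rather than buried in branching logic.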

Results. Sixty-four participants contributed at least one data file, and forty-two uploaded at least eight experiments, yielding a large repository of raw data files collected longitudinally within and across laboratories. Instruments included multiple architectures from AB SCIEX, Agilent, Bruker, Thermo, and Waters. No effort was made to standardize LC-MS conditions between laboratories, and consequently LC-MS/MS gradients ranged from 22 to 160 minutes, and MS/MS acquisition ranged from less than 100 to nearly 30,000 tandem mass spectra per experiment. As a result, the number of identifications produced from data sets also ranged widely (see bottom image). Low-concentration bovine proteins were frequently identified in addition to the anticipated six proteins, including superoxide dismutase and alpha-S2-casein. PCA based on identification-independent quality metrics visualized all of these participants in a single plane and demonstrated that for most participants, submitted experiments clustered together. In some cases, however, the LC-MS/MS experiments were broadly dispersed, revealing significant variation in the data. These dispersions were evaluated in light of the survey results provided by participants, reporting major tuning and calibration events along with method changes.

How Often Do You Perform Quality Control Tests?

Several times per day: 15
Once a day: 5
Once every few days: 6
Once a week: 4
Once a month: 1
Other: 5

Figure 3. Pie chart summarizing participant responses regarding frequency of QC analyses. 72% of the participants used a single protein digest for quality control. 85% used data-dependent acquisition to collect QC data.

Figure 6. Dispersion values describe how tightly clustered the experiments are for each participant. The metrics from QuaMeter were normalized based on robust variance estimation, and deviation for a given file was defined as the distance of its metrics vector from the mean vector for all files from that participant. A low deviation suggests that most data produced by a participant conformed well with others for that participant. The IDFree and identification-dependent metrics from QuaMeter, however, produced different rankings of participants. The legend is the same as in Figure 5.
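The dispersion calculation in the Figure 6 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the poster does not name the robust variance estimator or the distance measure, so median/MAD scaling and Euclidean distance are assumed here, the per-participant summary is taken to be the median of the file deviations, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def robust_scale(metrics: np.ndarray) -> np.ndarray:
    """Center each metric on its median and divide by a MAD-based spread.

    Assumption: MAD stands in for the unspecified robust variance estimate.
    """
    median = np.median(metrics, axis=0)
    mad = np.median(np.abs(metrics - median), axis=0)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    spread = 1.4826 * mad
    spread[spread == 0] = 1.0  # leave constant metrics unscaled
    return (metrics - median) / spread


def median_deviation(scaled: np.ndarray, participants: np.ndarray) -> dict:
    """Median distance of each participant's files from that participant's mean vector."""
    result = {}
    for pid in np.unique(participants):
        rows = scaled[participants == pid]
        # Assumption: Euclidean distance between a file and the participant mean.
        distances = np.linalg.norm(rows - rows.mean(axis=0), axis=1)
        result[str(pid)] = float(np.median(distances))
    return result


# Synthetic example: participant "A" is tightly clustered, "B" is dispersed.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.1, size=(8, 5))
loose = rng.normal(0.0, 2.0, size=(8, 5))
metrics = np.vstack([tight, loose])
labels = np.array(["A"] * 8 + ["B"] * 8)
dispersion = median_deviation(robust_scale(metrics), labels)
print(dispersion)
```

Scaling every metric before measuring distance keeps metrics with large numeric ranges from dominating the deviation, which is presumably why the caption mentions normalization first.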

Figure 2. Data flow employed peer-reviewed tools plus two commercial packages. Waters exports to mzML proved to be incompatible with both QuaMeter and MyriMatch.

The ‘Median’ Laboratory participating in the study:

• Uploaded data 8 times during the 9-month test period.

• Injected 200 femtomoles of the 6-protein digest per analysis.

• Used a 2-5 year old mass spectrometer - typically an Orbitrap type instrument.

• Used a commercial column with 75 µm inner diameter, 15 cm length and 3 µm particle size.

• Separated peptides using a 50 min gradient and a flow rate of 300 nL/min.

• Used a trap/pre-column.

• Ran quality controls several times per day.

• Employed an operator with 6-10 years of experience.

Data Reduction Using Principal Components Analysis

Figure 5. Raw data from the participants who uploaded at least 8 LC-MS/MS experiments were included in robust principal components analysis (PCA). This technique reduced the 40 QuaMeter "IDFree" metrics (some of which correlate strongly with each other) to a set of uncorrelated components. The first two components (plotted) account for 50% of the variance among the files. For some participants, the data varied little in the first two components but considerably more in the other components. The QuaMeter IDFree metrics do not take derived identifications into account, allowing their rapid computation directly from raw data.
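To illustrate how correlated metrics collapse into uncorrelated components, the sketch below runs classical PCA via singular value decomposition on synthetic data. It is not the analysis behind Figure 5: the study used robust PCA in R, whereas this example substitutes ordinary (non-robust) PCA for simplicity, and the function name, data shape, and random inputs are all hypothetical.

```python
import numpy as np


def pca_scores(metrics: np.ndarray, n_components: int = 2):
    """Project rows onto the leading principal components (classical PCA via SVD)."""
    centered = metrics - metrics.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values give the variance split.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centered @ vt[:n_components].T
    return scores, explained[:n_components]


# Synthetic stand-in for a files-by-metrics table: 60 files, 40 metrics that
# are driven by 3 hidden factors, so many columns correlate strongly.
rng = np.random.default_rng(1)
factors = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 40))
table = factors @ loadings + 0.5 * rng.normal(size=(60, 40))

scores, explained = pca_scores(table, n_components=2)
correlation = np.corrcoef(scores.T)[0, 1]
print("fraction of variance in first two components:", explained.sum())
print("correlation between component 1 and 2 scores:", correlation)
```

The printed correlation is zero up to floating-point error, which is the property the caption relies on: each plotted axis carries information the other does not.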

[Figure 5 plot: robust PCA scores on axes Comp.1 and Comp.2, each spanning roughly -10 to 5. Each point is one LC-MS/MS experiment, drawn with a single-character participant symbol (@, A-Z, a-i); the legend pairs these symbols with numeric participant IDs.]

[Figure 6 plots: two panels titled "ID free metrics" and "ID dependent matrix". x-axis: participants (numeric participant IDs, labeled with the Figure 5 symbols); y-axis: median deviation, spanning roughly 0.7 to 0.9 in the first panel and 0.6 to 0.9 in the second.]

Instrumentation Utilized in the Study

Figure 4. Summary of LC (left chart) and MS (right chart) instrumentation utilized by survey participants. Dark and light colors indicate the number of participants that began and completed the QC study.

[Figure (bottom image), further panels: Number of distinct peptides identified from top twelve bovine proteins, 0 to 200; Number of top twelve bovine proteins identified, 0 to 10.]

Acknowledgements.

The PRG2012 would like to acknowledge Michrom Bioresources for contributing the 6-protein mix and Bioproximity, LLC for contributing data storage and transfer capabilities. This work was supported by ABRF.

[Figure 1 plot: Raw Data Collection for 56 Labs. x-axis: data upload time stamp (Mar to Jan); y-axis: participants.]

Figure 1. Data collection for each participant was evenly spread across the study period.
