RAPID COMMUNICATIONS IN MASS SPECTROMETRY
Rapid Commun. Mass Spectrom. 2005; 19: 2983–2985
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/rcm.2137
To the Editor-in-Chief
Sir,
Using cross-correlation normalized for peptide length to optimize peptide identification in shotgun proteomics
Over the past few years, the technology of multidimensional peptide separation coupled with tandem mass spectrometric (MS/MS) identification has seen remarkable development and has been extensively used in high-throughput protein identification for proteomics research. The SEQUEST database search engine [1] has been routinely used in this strategy for more than 10 years; however, it is well known that there is considerable overlap between positive and negative peptide identifications. Removing false positive results and reducing false negative identifications is a key problem in proteomics research.
Manual validation was first used to validate peptide identifications in shotgun proteomics [2], but it is time-consuming, is not feasible for high-throughput analysis of large datasets, and depends on the experience of 'experts'; this reduces the reproducibility and comparability of data among laboratories. Several statistical tools have been used to evaluate SEQUEST results [3–7]. These statistical models were normally trained or tested using datasets from mixtures of known proteins, so their reliability in the analysis of real samples still needs to be investigated.
Random database searching is a simple approach that can provide effective criteria to minimize the false positive rates (FPRs) and can also be used to evaluate the effect of different search parameters on peptide identifications. In this approach, FPRs are calculated by searching against a conventional protein database and a random (sequence-reversed) protein database, or a non-homologous database different from the conventional database; the FPRs can then be kept within a reasonable range by adjusting the search parameters.
Moore et al. [7] first used a sequence-reversed protein database to estimate random assignment between an MS/MS spectrum and a peptide in the database. Peng et al. [8] used a reversed yeast protein database to analyze FPRs in a yeast proteome study; they decreased FPRs to less than 1% and reduced the need for manual interpretation while identifying more proteins. Qian et al. [9] used the reversed database strategy to evaluate the FPRs for peptide identifications from three human proteome samples, and suggested that FPRs depend significantly on sample characteristics.
Although the random database search strategy can cut down the FPRs of peptide identifications, it still needs in-depth research. Setting the same cutoff for shorter and longer peptides is inappropriate because the Xcorr value depends on the size of the assigned peptide [3]. Yu et al. [10] calculated the FPRs of peptide identifications by searching an Archaean protein database; they set an Xcorr cutoff of 2.2 for doubly charged peptides with molecular mass <1200 Da, and an Xcorr cutoff of 2.5 for doubly charged peptides with molecular mass ≥1200 Da. This approach did improve peptide identification, but we considered it insufficient because Xcorr and molecular mass are continuous variables.
In this work Xcorr was normalized in a fashion designed to reduce its dependence on peptide size, and we investigated the effect of this normalization on the rank of peptides. An experimental dataset was obtained from the work of Keller et al. [11] All tandem mass spectra were generated from 22 liquid chromatography/tandem mass spectrometry (LC/MS/MS) runs on two mixtures of 18 purified proteins at a variety of concentrations. The two mixtures were digested with trypsin and analyzed by electrospray ionization ion trap mass spectrometry (ESI-ITMS) (ThermoFinnigan, San Jose, CA, USA).
The protein database ipi.Human.3.05 was downloaded from the European Bioinformatics Institute (EBI) [12]; it contained 49 161 protein entries, and in this work the sequence of each entry was reversed using an in-house program (reverse.pl). The sequences of the 18 control proteins used in this study differed slightly from those in the experiment of Keller et al. [11]: we replaced Q04977 by P06278 for B. licheniformis α-amylase, and did not include the rabbit myosin heavy and light chains. A new protein database was constructed by appending the sequences of the 18 known proteins to the sequence-reversed human IPI database.
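The database construction described above can be sketched as follows. This is a minimal illustration, not the authors' reverse.pl: the function names, the `REV_` header tag, and the toy FASTA entries are all assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_search_database(human_fasta: str, control_fasta: str,
                          tag: str = "REV_") -> str:
    """Reverse every human entry, then append the control proteins unchanged.

    The tag prefix is hypothetical; it only makes reversed entries easy
    to recognise when search results are parsed later.
    """
    records = [(tag + h, s[::-1]) for h, s in parse_fasta(human_fasta)]
    records.extend(parse_fasta(control_fasta))
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Toy input: made-up accessions and sequences, for illustration only.
human = ">IPI001\nMKTAY\nIAK\n"
control = ">CTRL1\nMKQQK\n"
combined = build_search_database(human, control)
print(combined)
```

Tagging the reversed headers is a design choice of this sketch; any scheme that lets a later step tell reversed human entries from control proteins would serve the same purpose.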
Tandem mass spectra were searched against this protein database using SEQUEST v2.7 (ThermoFinnigan). The peptide mass tolerance was set to 1.5 Da, cysteine carbamidomethylation and methionine oxidation were considered, and the enzymatic constraint was trypsin with a maximum of two internal missed cleavage sites. In total, this analysis produced a dataset of 24 489 tandem mass spectra with 24 486 peptide identifications.
A peptide that passed the strict Xcorr filter and belonged to one of the 18 known proteins was counted as a positive peptide; otherwise it was counted as a negative peptide. Common contaminants were not considered, as the human proteins of the database were sequence-reversed.
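A small sketch of how this bookkeeping might be coded is given below. The letter does not state the FPR formula explicitly, so the definition used here (passing hits to reversed entries divided by all passing hits) is an assumption, as are the `Hit` type, the function names, and the toy accessions.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Hit:
    protein: str  # accession of the protein matched by the top-ranked peptide
    xcorr: float  # score used as the filter


def count_passing(hits: Iterable[Hit], control_proteins: Set[str],
                  xcorr_cutoff: float) -> Tuple[int, int]:
    """Count passing hits to control proteins (tp) and to anything else (fp)."""
    tp = fp = 0
    for hit in hits:
        if hit.xcorr < xcorr_cutoff:
            continue  # fails the filter, so it is not reported at all
        if hit.protein in control_proteins:
            tp += 1
        else:
            fp += 1
    return tp, fp


def false_positive_rate(tp: int, fp: int) -> float:
    """ASSUMPTION: FPR = fp / (tp + fp) among identifications that pass."""
    total = tp + fp
    return fp / total if total else 0.0


# Toy data with hypothetical accessions and scores.
controls = {"CTRL_01", "CTRL_02", "CTRL_03"}
hits = [
    Hit("CTRL_01", 3.1),
    Hit("CTRL_02", 2.6),
    Hit("REV_IPI0001", 2.5),
    Hit("CTRL_03", 1.4),
    Hit("REV_IPI0002", 1.2),
]
tp, fp = count_passing(hits, controls, xcorr_cutoff=2.2)
print(tp, fp, false_positive_rate(tp, fp))
```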
In the random database searching strategy, researchers have normally set criteria to ensure that the confidence of peptide identifications is more than 95%, e.g., Xcorr ≥1.9 for singly charged peptides, Xcorr ≥2.2 for doubly charged peptides, Xcorr ≥3.75 for triply charged peptides, and DeltaCn ≥0.1 for all peptides. However, the Xcorr value for a peptide depends on peptide size; if a single Xcorr cutoff is chosen for all peptides with the same charge state but of different lengths, it will be a strict criterion for shorter peptides but a loose one for longer peptides. Figure 1 shows results for doubly charged peptides with Xcorr >1.5 containing 8, 13, 18, or 22 amino acids; the total number of peptides considered was 361. Each bar in the histogram represents the number of peptides within an Xcorr range of 0.3 for a specified peptide length. Clearly, the Xcorr values of short peptides are small and distributed in a narrow range, whereas those of long peptides are large and distributed in a wide range. Thus, normalizing Xcorr to make it independent of the peptide length is highly desirable.
Copyright © 2005 John Wiley & Sons, Ltd.
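For reference, the conventional charge-dependent cutoffs quoted above can be written as a simple filter. This is only a sketch: the function and constant names are hypothetical, and the example scores are invented. It makes the limitation visible, since peptide length never enters the decision.

```python
# Fixed cutoffs quoted in the text, keyed by precursor charge state.
XCORR_CUTOFF = {1: 1.9, 2: 2.2, 3: 3.75}
DELTACN_CUTOFF = 0.1


def passes_fixed_cutoffs(charge: int, xcorr: float, delta_cn: float) -> bool:
    """Apply one Xcorr cutoff per charge state plus a common DeltaCn cutoff."""
    cutoff = XCORR_CUTOFF.get(charge)
    if cutoff is None:
        return False  # charge states without a quoted cutoff are rejected
    return xcorr >= cutoff and delta_cn >= DELTACN_CUTOFF


# Invented scores: the same rule is applied whatever the peptide length.
print(passes_fixed_cutoffs(2, 2.3, 0.15))  # doubly charged, above both cutoffs
print(passes_fixed_cutoffs(2, 2.1, 0.15))  # doubly charged, Xcorr too low
print(passes_fixed_cutoffs(3, 3.8, 0.05))  # triply charged, DeltaCn too low
```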
Keller et al. [3] used Eqn. (1) to reduce the length dependence of Xcorr; here we investigated another approach (Eqn. (2)) for the same purpose. In Eqn. (2), N_i is the number of possible fragment ions for each peptide. An in-house program, normalize.pl, was written to apply Eqn. (2) to normalize Xcorr and recalculate the DeltaCn values of peptides with Xcorr >1.0; the other peptides were discarded. This resulted in normalization of Xcorr for 21 437 peptides and recalculation of their DeltaCn values. Only the first-ranked peptide hit was accepted as the correct peptide for protein identification, but some peptides were no longer ranked first after their Xcorr values were normalized. These peptides were ranked again according to Xcorr’, and their DeltaCn values were recalculated using the Xcorr’ of the new first- and second-ranked peptides.
$$\mathrm{Xcorr}' = \frac{\ln(\mathrm{Xcorr})}{\ln(N_L)} \qquad (1)$$

$$\mathrm{Xcorr}' = \frac{\ln(\mathrm{Xcorr})}{\ln(N_i)} \qquad (2)$$
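A minimal sketch of the normalization and re-ranking step is shown below; it is not the authors' normalize.pl. Two points are assumptions: the count of possible fragment ions is taken as 2(L − 1) for a peptide of L residues (singly charged b- and y-ions only; the letter does not say how N_i was counted), and DeltaCn is recomputed with the usual SEQUEST form, (first − second)/first, applied to Xcorr’. All names and the example candidates are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def fragment_ion_count(peptide: str) -> int:
    """ASSUMPTION: N_i = 2 * (L - 1), singly charged b- and y-ions only."""
    return 2 * (len(peptide) - 1)


def normalized_xcorr(xcorr: float, n_ions: int) -> float:
    """Eqn. (2): Xcorr' = ln(Xcorr) / ln(N_i)."""
    if xcorr <= 0 or n_ions <= 1:
        raise ValueError("need Xcorr > 0 and N_i > 1")
    return math.log(xcorr) / math.log(n_ions)


def rerank(candidates: List[Tuple[str, float]]
           ) -> Tuple[List[Tuple[str, float]], Optional[float]]:
    """Re-rank one spectrum's candidates by Xcorr' and recompute DeltaCn.

    candidates holds (peptide, Xcorr) pairs. Candidates with Xcorr <= 1.0
    are discarded, as in the text. DeltaCn is None when fewer than two
    candidates remain.
    """
    kept = [(pep, x) for pep, x in candidates if x > 1.0]
    scored = sorted(
        ((pep, normalized_xcorr(x, fragment_ion_count(pep))) for pep, x in kept),
        key=lambda item: item[1],
        reverse=True,
    )
    if len(scored) < 2:
        return scored, None
    first, second = scored[0][1], scored[1][1]
    return scored, (first - second) / first


# Invented example: the longer peptide has the higher raw Xcorr,
# but the shorter one ranks first after normalization.
ranked, delta_cn = rerank([("GASPVTLINDEQKMFHR", 2.1), ("PEPTIDEK", 2.0)])
print(ranked, delta_cn)
```

In this invented case the 17-residue candidate has the higher raw Xcorr, yet the 8-residue candidate ranks first by Xcorr’, which is the kind of rank change the text describes.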
After the Xcorr values of the peptides had been normalized, their length dependence was reduced, as shown in Fig. 2. Doubly charged peptides with Xcorr’ >0.12 and lengths of 8, 13, 18, or 22 amino acids were selected as the example in this histogram (the total number of peptides was 316). The Xcorr’ values for short and long peptides are now better distributed, and a distinct improvement of Xcorr’ for long peptides is found.
All peptide identifications were extracted from the SEQUEST search result files by an in-house program, extract.pl. Under the same condition of FPR ≤5%, the number of true positive peptides (NTPP) identified was improved for doubly and triply charged peptides by using the Xcorr’ threshold, but the NTPP for singly charged peptides decreased compared with that obtained using the Xcorr threshold, as shown in Table 1. The NTPP values for doubly and triply charged peptides increased by 16.7% and 5.2%, respectively; as the lengths of the doubly and triply charged peptides span a wide range, the peptide identifications were clearly optimized. The NTPP for singly charged peptides decreased when using Xcorr’ because some false positive peptides with low DeltaCn values disturbed the peptide identifications. The total NTPP over all charge states increased by 8.5% after Xcorr was normalized.
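One way to pick a score threshold under an FPR constraint, as in the comparison above, is to scan candidate cutoffs from high to low and keep the one that retains the most true positives while the FPR stays within the limit. The sketch below assumes FPR = false positives / all passing identifications (the letter does not give the formula); the function name and the synthetic scores are hypothetical, and this is not the authors' extract.pl.

```python
from typing import List, Optional, Tuple


def best_cutoff(scored: List[Tuple[float, bool]],
                max_fpr: float = 0.05) -> Optional[Tuple[float, int, float]]:
    """Return (cutoff, NTPP, FPR) maximising NTPP subject to FPR <= max_fpr.

    scored holds (score, is_true_positive) for every top-ranked hit.
    A hit passes when score >= cutoff. Returns None if no cutoff qualifies.
    """
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    best = None
    tp = fp = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Take in every hit tied at this score before evaluating the cutoff.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr = fp / (tp + fp)
        if fpr <= max_fpr and (best is None or tp > best[1]):
            best = (score, tp, fpr)
    return best


# Synthetic scores: 40 true hits at 0.30 .. 0.69 and three false hits.
true_hits = [(k / 100, True) for k in range(30, 70)]
false_hits = [(0.315, False), (0.295, False), (0.285, False)]
result = best_cutoff(true_hits + false_hits, max_fpr=0.05)
print(result)
```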
Figure 1. Histogram showing effect of peptide length on Xcorr.
Figure 2. Histogram showing reduction of peptide length dependence of Xcorr by
application of normalized Xcorr.
When the DeltaCn threshold was also considered, a distinct improvement in NTPP was obtained for all peptides, as shown in Table 2. (Note that we considered the DeltaCn threshold in Table 2 but not in Table 1, so the non-normalized Xcorr thresholds in Table 2 are lower than those in Table 1.) NTPP increased by 13.5%, 17.5% and 13.6% for singly, doubly and triply charged peptides, respectively, under the condition FPR ≤5%. Applying the Xcorr’ and DeltaCn thresholds as criteria gave 2068 true positive peptides over all charge states, an increase of 15.8% over the 1786 true positive peptides identified with the Xcorr and DeltaCn thresholds. A total of 1764 peptides were found in both datasets produced by the two different criteria. The increase of 282 in NTPP resulted from the removal of 22 peptides from the list of true positive peptides obtained using the Xcorr criterion, and the reassignment of 304 false negative peptides to the true positive list by using Xcorr’ and DeltaCn as the criteria.
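The counts quoted in this paragraph are mutually consistent, as the following check shows (it uses only the numbers given in the text):

```latex
\begin{align*}
1786 - 22 &= 1764 && \text{(peptides common to both lists)}\\
1764 + 304 &= 2068 && \text{(true positives with Xcorr' and DeltaCn)}\\
304 - 22 &= 282 && \text{(net gain in NTPP)}\\
282 / 1786 &\approx 0.158 && \text{(the 15.8\% increase)}
\end{align*}
```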
Of the 21 437 peptides with Xcorr >1.0, 549 were no longer ranked first after normalization of Xcorr; in these cases, the peptide ranked first by Xcorr’ was selected. None of the 2068 true positive peptides was among the 549 peptides whose rank changed, so normalization of Xcorr appears to have little effect on the rank of true positive peptides.
In this report, we applied a simple formula, slightly different from that used previously [3], to normalize Xcorr and reduce its dependence on peptide size. Previously [3], the normalized Xcorr was used in a statistical model; herein, we used the normalized Xcorr to overcome some shortcomings of the random database searching methodology. Using the normalized Xcorr and DeltaCn with the random database strategy markedly improved the peptide identifications by decreasing the false negative rate. The false positive rate would also decrease if stricter criteria were used so as to keep the number of peptide identifications fixed.
Peptide identifications can be optimized by using the normalized Xcorr. Shotgun proteomics assembles peptide identifications into protein identifications, so protein identification can be optimized when peptides are identified with more confidence.
Acknowledgements
We would like to thank Dr. Andrew Keller for providing the test dataset. We thank Songfeng Wu and Jiyang Zhang for valuable comments and discussions. We acknowledge the financial support for the work by China Technology R&D Project (No. 2002BA711A11, 2004BA711A18); National Key Program for Basic Research (No. 2001CB510201, 2004CB520802); and Beijing Municipal Program for Science & Technology (H03023028190).
Bing Yang1,2†, Wantao Ying1†, Yan Gong1, Yangjun Zhang1, Yun Cai1, Hongye Dong2 and Xiaohong Qian1*
1Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, China
2Shengyang Pharmaceutical University, Shengyang 110016, China
*Correspondence to: X. Qian, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, China. E-mail: [email protected]
†These authors contributed equally to this work.
Contract/grant sponsor: China Technology R&D Project; Contract/grant number: 2002BA711A11, 2004BA711A18.
Contract/grant sponsor: National Key Program for Basic Research; Contract/grant number: 2001CB510201, 2004CB520802.
Contract/grant sponsor: Beijing Municipal Program for Science & Technology; Contract/grant number: H03023028190.
REFERENCES
1. Eng JK, McCormack AL, Yates JR III. J. Am. Soc. Mass Spectrom. 1994; 5: 976.
2. Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR III. Nat. Biotechnol. 1999; 17: 676.
3. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Anal. Chem. 2002; 74: 5383.
4. Anderson DC, Li WQ, Payan DG, Noble WF. J. Proteome Res. 2003; 2: 137.
5. Fenyo D, Beavis RC. Anal. Chem. 2003; 75: 768.
6. Sadygov RG, Liu HB, Yates JR III. Anal. Chem. 2004; 76: 1664.
7. Moore RE, Young MK, Lee TD. J. Am. Soc. Mass Spectrom. 2002; 13: 378.
8. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. J. Proteome Res. 2003; 2: 43.
9. Qian WJ, Liu T, Monroe ME, Strittmatter EF, Jacobs JM, Kangas LJ, Petritis K, Camp DG II, Smith RD. J. Proteome Res. 2005; 4: 53.
10. Yu LR, Conrads TP, Uo T, Kinoshita Y, Morrison RS, Lucas DA, Chan KC, Blonder J, Issaq HJ, Veenstra TD. Mol. Cell. Proteomics 2004; 3: 896.
11. Keller A, Purvine S, Nesvizhskii AI, Stolyar S, Goodlett DR, Kolker E. Omics 2002; 6: 207.
12. Available: www.ebi.ac.uk/proteome/index.html.
Received 5 August 2005
Revised 12 August 2005
Accepted 12 August 2005
Table 1. Estimate of the effect of Xcorr and Xcorr’ threshold on the peptide identifications with a control of FPR ≤5% (without considering DeltaCn)

                      Normalized                 Non-normalized
Charge state     Xcorr’   NTPP   FPR%        Xcorr   NTPP   FPR%
1+               0.27     19     5           1.8     66     4.3
2+               0.27     1075   3.7         2.4     921    4.8
3+               0.25     711    3.8         2.9     676    3.8
Table 2. Estimate of the effect of Xcorr and Xcorr’ threshold on the peptide identifications with a control of FPR ≤5% (DeltaCn >0.1)

                      Normalized                 Non-normalized
Charge state     Xcorr’   NTPP   FPR%        Xcorr   NTPP   FPR%
1+               0.21     84     1.2         1.7     74     4.0
2+               0.25     1166   4.8         2.3     992    4.2
3+               0.23     818    3.9         2.8     720    3.5