Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS
Karl R. Clauser, Broad Institute of MIT and Harvard
BioInfoSummer 2012
University of Adelaide
December 2012
Topics Covered
• Basics of phosphosite identification and localization
• Evolution of phosphoproteomic literature MS/MS reporting
• Modification site localization algorithm development
• 2010 ABRF-iPRG study of phosphopeptide ID and site localization
• Emerging false localization rate (FLR) metrics
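Localizing a Phosphorylation Site
L/F|P/A/D|T/s/P/S T A\T K
L/F|P/A/D|t S/P/S T A\T K
(Fragmentation maps of LFPADTSPSTATK for two candidate localizations, phosphorylated Ser7 vs. Thr6; the separators mark observed fragment-ion cleavages.)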
PTM Site Localization: Test All Locations, Examine Score Gaps
Case                     Locations tested        Conclusion
No possible ambiguity    AVsEEQQPALK             AVS(1.0)EEQQPALK
(# PO4 sites = # S, T, or Y)
Single site              APsLTDLVK        *      APS(0.99)LT(0.0)DLVK
                         APSLtDLVK        -
Single site              sSSAGPEGPQLDVPR  *      S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR
                         SsSAGPEGPQLDVPR  *
                         SSsAGPEGPQLDVPR  -
Multiple sites           VTNDIsPEsSPGVGR  *      VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
                         VTNDIsPESsPGVGR  *
                         VTNDISPEssPGVGR  -
                         VtNDIsPESSPGVGR  -
                         VtNDISPEsSPGVGR  -
                         VtNDISPESsPGVGR  -
* = top-scoring location(s); - = lower-scoring location(s)
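Testing all locations amounts to enumerating every placement of the observed number of phosphates on the peptide's S, T, and Y residues. The Python sketch below is illustrative only (the function name `candidate_localizations` is hypothetical); it writes phosphorylated residues in lowercase, as in the iPRG reporting template, and for VTNDISPESSPGVGR with two phosphates it produces the same six candidates listed above.

```python
from itertools import combinations


def candidate_localizations(sequence: str, n_phospho: int) -> list[str]:
    """Return every placement of n_phospho phosphates on S/T/Y residues.

    Phosphorylated residues are written in lowercase (e.g. APsLTDLVK).
    """
    # Positions that could carry a phosphate.
    acceptors = [i for i, aa in enumerate(sequence) if aa in "STY"]
    if n_phospho > len(acceptors):
        raise ValueError("more phosphates than S/T/Y residues")
    isoforms = []
    for sites in combinations(acceptors, n_phospho):
        residues = list(sequence)
        for i in sites:
            residues[i] = residues[i].lower()
        isoforms.append("".join(residues))
    return isoforms


if __name__ == "__main__":
    print(candidate_localizations("APSLTDLVK", 1))  # ['APsLTDLVK', 'APSLtDLVK']
    print(len(candidate_localizations("VTNDISPESSPGVGR", 2)))  # 6
```

Exactly one candidate (as for AVSEEQQPALK) means there is no possible ambiguity; otherwise every candidate is scored and the gaps between the scores decide whether a site is confident.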
PTM Site Localization – Confident Localization
(K)A/P|s|L/T D|L\V K(S)
APS(0.99)LT(0.0)DLVK
PTM Site Localization – Ambiguous Localization
(R)S s/S/A/G/P E/G/P Q L|D|V|P R(E)
S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR
PTM Site Localization – Ambiguous Localization: 2 sites, 1 confident, 1 ambiguous
(R)V T N D|I|s/P E|s S/P G V\G R(R)
VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
Reliability of LC/MS/MS Phosphoproteomic Literature ~2005
Citation | Approach | Instrument | # sites shown | # ambiguous sites shown | Scores shown | Site ambiguity labeled | Supplementary labeled spectra
Ballif, BA, … Gygi, SP, 2004, MCP, 3, 1093-1101 | 1D gel, digest, SCX, LC/MS/MS | LCQ Deca XP | 546 | 86 | yes | yes | no
Rush, J, … Comb, MJ, 2005, Nat Biotech, 23, 94-101 | digest lysate, pTyr Ab, LC/MS/MS | LCQ Deca XP | 628 | 0 | yes | no | no
Collins, MO, … Grant, SGN, 2005, J Biol Chem, 280, 5972-5982 | protein IMAC, peptide IMAC, LC/MS/MS | Q-Tof Ultima | 331 | 42 | no | yes | no
Gruhler, A, … Jensen, ON, 2005, MCP, 4, 310-327 | digest lysate, SCX, IMAC, LC/MS/MS | LTQ-FT | 729 | 0 | yes | no | no
“Resulting sequences were inspected manually …. When the exact site of phosphorylation could not be assigned for a given phosphopeptide, it was tabulated as ambiguous.”
“All identified phosphopeptides were manually validated, and localization of phosphorylated residues within the individual peptide sequences were manually assigned…”
“All spectra supporting the final list of assigned peptides used to build the tables shown here were reviewed by at least three people to establish their credibility.”
“Assignment of phosphorylation sites was verified manually with the aid of PEAK Studio (Bioinformatics Solutions) software.”
MCP Guideline for publishing PTM data ~2010
III. POST-TRANSLATIONAL MODIFICATIONS
Studies focusing on posttranslational modifications (PTMs) require specialized methodology and documentation to assign the type(s) and site(s) of the modification(s). The guidelines in this section apply to PTMs that occur under physiological conditions and to which biological significance may be assigned, such as phosphorylation, glycosylation, etc. as well as purposefully induced chemical modifications of central importance to the results of the study, such as chemical cross‐linking. These guidelines do not apply to common modifications arising from sample handling or preparation such as oxidation of Met or alkylation of Cys.
In addition to the tabular presentation(s) of the data described in guideline II, the following information is required:
• The site(s) of modification: Within each peptide sequence, all modifications must be clearly located (unless ambiguous; see below) and the manner in which this was accomplished (through computation or manual inspection) must be described.
• A justification for any localization score threshold employed.
• Ambiguous assignments: Peptides containing ambiguous PTM site localizations must be listed in a separate table from those with unambiguous site localizations. In cases where there are multiple modification sites and at least one is ambiguous, then these peptides should be listed with the ambiguous assignments. Ambiguous assignments must be clearly labeled as such.
Examples of ambiguities include:
– Modified peptides in which one or more modification sites are ambiguous.
– Instances where the peptide sequence is repeated in the same protein so the specific modification site cannot be assigned.
– Instances in which the same peptide is repeated in multiple proteins, e.g. paralogs and splice variants (See also Section IV).
– Isobaric modifications (e.g., acetylation vs. trimethylation, phosphorylation vs. sulfonation etc), where the possibilities may not be distinguished. Examples of methods able to distinguish between these include mass spectrometric approaches such as accurate mass determination, observation of signature fragment ions (e.g. m/z 79 vs. m/z 80 in negative ion mode for assignment of phosphorylation over sulfonation), or biological or chemical strategies.
• Annotated, mass labeled spectra: Spectra for ALL modified peptides must be either submitted to a public repository or accompany the manuscript as described in guideline II.
http://www.mcponline.org/
Supplemental Table Links to Each Labeled Spectrum
Spectrum Mill Scoring of MS/MS Interpretations
Peak selection: de-isotoping, S/N thresholding, parent-neutral removal, charge assignment
Match to database candidate sequences
Score = Assignment Bonus (ion-type weighted) + Marker Ion Bonus (ion-type weighted) - Non-assignment Penalty (intensity weighted)
SPI (%): Scored Peak Intensity, the percentage of peak intensity assigned by the interpretation
Example from the figure: score 12.68, SPI 92%
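The additive structure of the score (bonuses for assigned peaks minus an intensity-weighted penalty for unassigned ones), together with SPI, can be sketched in Python. This is a toy illustration under stated assumptions, not Spectrum Mill's implementation: the weight tables `ION_WEIGHTS` and `MARKER_WEIGHTS`, the `Peak` type, and `score_interpretation` are all hypothetical, and the penalty of "intensity relative to the base peak" is an assumption chosen to echo the 1.1 threshold reasoning on the next slide.

```python
from typing import NamedTuple, Optional, Sequence

# Hypothetical weights for illustration only; NOT Spectrum Mill's parameters.
# A b/y assignment is worth 1.0 to mirror "1 = one distinguishing b or y ion".
ION_WEIGHTS = {"b": 1.0, "y": 1.0, "internal": 0.5}
MARKER_WEIGHTS = {"marker": 0.5}


class Peak(NamedTuple):
    intensity: float
    ion_type: Optional[str]  # e.g. "b", "y", "marker"; None if unassigned


def score_interpretation(peaks: Sequence[Peak]) -> tuple[float, float]:
    """Return (score, SPI %) for one candidate interpretation of a spectrum."""
    base_peak = max(p.intensity for p in peaks)
    assigned = [p for p in peaks if p.ion_type is not None]
    unassigned = [p for p in peaks if p.ion_type is None]

    assignment_bonus = sum(ION_WEIGHTS.get(p.ion_type, 0.0) for p in assigned)
    marker_bonus = sum(MARKER_WEIGHTS.get(p.ion_type, 0.0) for p in assigned)
    # Intensity-weighted penalty: an unassigned peak costs its height relative
    # to the base peak (a peak at 10% of the base peak costs 0.1).
    penalty = sum(p.intensity / base_peak for p in unassigned)

    score = assignment_bonus + marker_bonus - penalty
    spi = 100.0 * sum(p.intensity for p in assigned) / sum(p.intensity for p in peaks)
    return score, spi


if __name__ == "__main__":
    peaks = [Peak(100.0, "y"), Peak(50.0, "b"), Peak(40.0, "marker"), Peak(10.0, None)]
    print(score_interpretation(peaks))
```

For the toy spectrum in the usage example, the two b/y assignments contribute 2.0, the marker ion 0.5, and the unassigned 10%-height peak costs 0.1, giving a score of about 2.4 with 95% of the intensity scored.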
Spectrum Mill Variable Modification Localization Score
VML score = difference in score between the same identified sequence with different variable modification localizations
VML score > 1.1 indicates confident localization
Why a threshold value of 1.1?
• 1 implies that there is a distinguishing ion of b or y ion type
• 0.1 means that, when unassigned, the peak is 10% of the intensity of the base peak
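The VML decision rule (rank the candidate localizations of one identified sequence, take the score gap to the runner-up, and call the localization confident only if the gap exceeds 1.1) can be sketched as follows. The function name `vml_decision` and the example scores are hypothetical, and this sketch makes a single per-peptide call, whereas the per-site annotations shown earlier (e.g. S(0.99), S(0.50)) distinguish confident and ambiguous sites within one peptide.

```python
def vml_decision(scores: dict[str, float], threshold: float = 1.1) -> tuple[str, float, bool]:
    """Pick the best-scoring localization and decide whether it is confident.

    scores maps each candidate localization of the SAME peptide sequence to
    its score; the gap to the runner-up plays the role of the VML score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_isoform, best_score = ranked[0]
    if len(ranked) == 1:
        # Only one candidate: no possible ambiguity.
        return best_isoform, float("inf"), True
    gap = best_score - ranked[1][1]
    return best_isoform, gap, gap > threshold


if __name__ == "__main__":
    # Made-up scores: a gap of about 1.09 falls just short of the threshold.
    print(vml_decision({"SQtPPGVAtPPIPK": 10.0, "sQTPPGVAtPPIPK": 8.91}))
```

A tie (gap of 0) is the fully ambiguous case, where the slides would report each tied site at 0.50.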
VML Scoring - Room for Improvement
S(0.50)QT(0.50)PPGVAT(0.0)PPIPK
VML score: 1.09 (just below the 1.1 threshold)
Fragment ions labeled in the spectrum: y12, b2
VML Scoring - Room for Improvement
VML score: 0.49
S(0.0)T(0.0)S(0.25)T(0.25)PT(0.25)S(0.25)PGPR
S(0.0)T(0.0)[S(0.5)T(0.5)]P[T(0.5)S(0.5)]PGPR
Phosphosite Localization Scoring - Ascore
http://ascore.med.harvard.edu/
Supports Sequest results only, Linux only
Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP (2006) Nat Biotechnol 24:1285–1292.
Phosphosite Localization Scoring - Andromeda
$$P = \binom{n}{k} p^{k} (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\,(0.04)^{k}\,(0.96)^{n-k}$$
$$\text{PTM score} = -10 \log_{10}(P)$$
p = 0.04: use the 4 most intense fragment ions per 100 m/z units
n: total number of possible b/y ions in the observed mass range for all possible combinations of PO4 sites in a peptide
k: number of those n possible ions matched by peaks
Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell (2006), 127 (3), 635–48.
Olsen, J.V., and Mann, M. Proc. Natl. Acad. Sci. USA. (2004) 101, 13417–13422.
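The binomial score can be computed directly with the standard binomial coefficient C(n, k) = n!/(k!(n-k)!). The sketch below implements the single-term form written on the slide, plus a cumulative variant (probability of k or more chance matches) of the kind used by Ascore-style scores, offered here as an assumption rather than a description of Andromeda's internals. Function names and the per-isoform (n, k) counts in the usage example are hypothetical.

```python
import math


def binomial_term(n: int, k: int, p: float = 0.04) -> float:
    """P = C(n, k) * p**k * (1 - p)**(n - k), the single-term form on the slide."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_tail(n: int, k: int, p: float = 0.04) -> float:
    """Cumulative variant: probability of k OR MORE chance matches out of n."""
    return sum(binomial_term(n, j, p) for j in range(k, n + 1))


def ptm_score(n: int, k: int, p: float = 0.04, cumulative: bool = False) -> float:
    """PTM score = -10 * log10(P); higher means less likely to be chance."""
    prob = binomial_tail(n, k, p) if cumulative else binomial_term(n, k, p)
    return -10.0 * math.log10(prob)


if __name__ == "__main__":
    # Hypothetical (n, k) counts for two localizations of the same peptide.
    isoforms = {"APsLTDLVK": (16, 9), "APSLtDLVK": (16, 7)}
    scores = {iso: ptm_score(n, k) for iso, (n, k) in isoforms.items()}
    print(scores)
    print("score difference:", scores["APsLTDLVK"] - scores["APSLtDLVK"])
```

As with the VML score, localization confidence comes from the difference between isoform scores, not from either score alone.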
True Probability or Just Effective Scores?
Peak selection assumptions
• All regions of the spectrum are equally likely
– multiply charged fragments fall below the precursor m/z
– some m/z values between 100 and 300 are not possible (dipeptide AA combinations)
– tolerance in Da, not ppm
• Tall and short peak intensities are equally diagnostic
Fragment ion type assumptions
• All ion types are equally probable
• Neutral losses are ignored (y-H3PO4, y-H2O)
Phosphosite Localization Scoring - PhosphoRS
Taus, T., Kocher, T., Pichler, P., Paschke, C., Schmidt, A., Henrich, C., and Mechtler, K. (2011) J Proteome Res. 10(12): 5354-62.
p = N × d / w (probability of a random fragment-ion match)
N: total # of extracted peaks
d: fragment ion mass tolerance
w: full mass range of spectrum
Score all theoretical fragment ions, not just site-determining ions.
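A phosphoRS-style calculation can be sketched in two steps: a random-match probability p = N·d/w from the spectrum's peak count N, tolerance d, and mass range w, then a probability for each isoform obtained by weighting it by 1/P and normalizing so the weights sum to 1. This is a simplified sketch under assumptions: the cumulative binomial, the 1/P normalization, and all function names are illustrative, and the published algorithm's additional steps (such as choosing how many peaks to extract per m/z window) are omitted.

```python
import math


def random_match_probability(n_peaks: int, tolerance: float, mass_range: float) -> float:
    """p = N * d / w: chance that one theoretical fragment hits a peak by accident."""
    return min(1.0, n_peaks * tolerance / mass_range)


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of k or more chance matches out of n theoretical fragments."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def isoform_probabilities(matches: dict[str, tuple[int, int]], p: float) -> dict[str, float]:
    """Turn per-isoform (n, k) match counts into probabilities that sum to 1.

    Each isoform is weighted by 1/P, where P is its chance-match probability,
    so isoforms the spectrum explains equally well split the probability.
    """
    inverse = {iso: 1.0 / binomial_tail(n, k, p) for iso, (n, k) in matches.items()}
    total = sum(inverse.values())
    return {iso: value / total for iso, value in inverse.items()}


if __name__ == "__main__":
    p = random_match_probability(n_peaks=150, tolerance=0.5, mass_range=2000.0)
    # Hypothetical counts: the first two isoforms match equally well.
    matches = {
        "sSSAGPEGPQLDVPR": (28, 14),
        "SsSAGPEGPQLDVPR": (28, 14),
        "SSsAGPEGPQLDVPR": (28, 10),
    }
    print(isoform_probabilities(matches, p))
```

Two equally supported isoforms each receive about 0.5, which is the situation written as S(0.50)S(0.50)S(0.0) earlier in the talk.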
Key Aspects of Scoring Localizations
• Select peaks in spectrum to be used for identification/localization
• Test all sequence/location possibilities
• Assign fragment ion types to peaks
• Allow for peaks to have different ion type assignments for conflicting
localization possibilities
• Use score differences to make decision on localization certainty/ambiguity
• Decide upon conservative/aggressive thresholds.
• Provide a clear representation of the certainty/ambiguity in localization of each
site
• Allow for multiple sites with mix of certainty and ambiguity in localization
• Distinguish between:
• Ambiguity – no distinguishing evidence, i.e. either possibility
• Ambiguity – conflicting evidence, multiple co-eluting isoforms present
How can we calculate a false localization rate as a standard measure
of certainty for phosphosite assignment across a dataset?
ABRF
Proteome Informatics Research Group
iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization
ABRF 2010, Sacramento, CA
March 22, 2010
Study Goals
1. Evaluate the consistency of reporting phosphopeptide identifications and phosphosite localization across laboratories
2. Characterize the underlying reasons why result sets differ
3. Produce a benchmark phosphopeptide dataset, spectral library and analysis resource
Study Design
• Use a common dataset
• Use a common sequence database
• Allow participants to use the bioinformatic tools and methods of their choosing
• Use a common reporting template
• Fix the identification confidence (1% FDR)
• Require an indication of phosphosite ambiguity per spectrum
• Ignore protein inference – for now
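The design fixes identification confidence at 1% FDR but leaves the method to each participant. One common approach, assumed here rather than prescribed by the study, is a target-decoy search: sort peptide-spectrum matches by score, estimate FDR at each cutoff as decoys/targets, and keep the lowest target score at which that estimate is still at most 1%. The function name is hypothetical and ties between equal scores are not treated specially.

```python
from typing import Optional


def score_threshold_at_fdr(
    psms: list[tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest target score whose cumulative FDR stays <= max_fdr.

    psms: (score, is_decoy) pairs from a concatenated target/decoy search,
    where higher scores are better. FDR at a cutoff is estimated as
    (# decoys at or above cutoff) / (# targets at or above cutoff).
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda psm: psm[0], reverse=True):
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        if decoys / targets <= max_fdr:
            threshold = score
    return threshold


if __name__ == "__main__":
    psms = [(float(s), False) for s in range(101, 301)] + [(150.5, True), (99.0, True)]
    print(score_threshold_at_fdr(psms))
```

Matches scoring at or above the returned threshold would be reported as 'Y' for identification certainty in the template, and the rest as 'N'.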
Study Materials and Instructions to Participants
• 1 Orbitrap XL dataset (3 files)
– RAW, mzML, mzXML, MGF, pkl, or dta (conversions by ProteoWizard)
• 1 FASTA file (SwissProt human sequences, v57.1)
• 1 template (Excel)
• 1 on-line survey (Survey Monkey)
1. Analyze the dataset
2. Report the phosphopeptide-spectrum matches in the provided template
3. Complete an on-line survey
4. Attach a 1-2 page description of your methodology
Reporting Template
ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis
Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide spectrum match. Multiple rows MAY be used to report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the 'Peptide Sequence' field, and results should be sorted by 'Peptide Identification Score' from most to least confident. Additional instructions can be found above each field header. Results should be emailed to '[email protected]' no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey.
REQUIRED FIELDS
• File: name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03).
• Spectrum Identifier: identifiers should be unique scan numbers from the data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).
• Precursor m/z: as submitted to the search engine.
• Precursor Charge: as reported by the search engine.
• Peptide Sequence: use lowercase s, t, or y (e.g., SLsGSsPCPK) OR a trailing symbol (e.g., SLS#GS#PCPK) OR a string in parentheses (e.g., SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T, and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T, or Y are phosphorylations.
• Accession(s): protein identifier(s) from the FASTA file. Use multiple values if the peptide is found in multiple proteins, e.g., Q9NZ18; Q9UQ35. Protein inference will not be scored.
• Num. Phospho sites: total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.
• Peptide Identification Certainty: is this match above the 1% FDR identification threshold (Y|N)? 'Y' indicates this match is BETTER than the confidence threshold; 'N' indicates the match is WORSE. Please report BOTH types of identifications in your ranked list.
• Phosphosite Localization Certainty: are ALL phosphosites unambiguously localized (Y|N)? Indicate 'Y' if ALL phosphorylations have been confidently localized, 'N' if one or more have not.
• Peptide Identification Score: score reported by the search engine (e.g., E-value, p-value, probability, Mascot score, etc.).
Example rows (File | Spectrum Identifier | Precursor m/z | Precursor Charge | Peptide Sequence | Accession(s) | Num. Phospho sites | Peptide Identification Certainty | Phosphosite Localization Certainty | Peptide Identification Score):
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:908 | 558.7576 | 2 | qGsPVAAGAPAK | Q9NZI8 | 1 | Y | Y | 0.0002097
D20090930_PM_K562_SCX-IMAC_fxn04 | Scan:2017 | 710.82233 | 2 | TsPDPSPVSAAPSK | Q13469 | 1 | Y | N | 45.41
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:683 | 692.28891 | 2 | _APQTS(ph)S(ph)SPPPVR_ | Q8IYB3 | 2 | Y | N | 30.09
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:4832 | 775.3548 | 2 | SQtPPGVAtPPIPK | Q15648 | 2 | Y | N | 31.79
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | SLsGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0112023
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | sLSGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0915611
[Figure: participant survey. Pie charts: membership, n=33 (ABRF member, non-member); type of lab (academic, biotech/pharma/industry, contract research org, government, other); location (Asia, Australia/New Zealand, Europe, North America); resource lab status (conduct both core functions and non-core lab research, core only, non-core research lab); primary job function (bioinformatician/developer, director/manager, lab scientist, mass spectrometrist, other). Histogram: proteomics experience (1-2 years, 3-4 years, 5-10 years, >10 years, unanswered).]
• 59 requests / 32 submissions (54% return); 2 retractions, plus 7 iPRG members and 1 guest
Software Tools Used
[Figure: bar charts of the number of participants using each tool. Phosphosite localization: Ascore, custom, in-house, MaxQuant, msInspect, MyriMatch, NN Score PLS, Phosphinator, PhosphoScore, Prophossi, Spectrum Mill. Peptide identification: Mascot, X!Tandem, OMSSA, SEQUEST, MyriMatch, in-house, PeptideProphet, Scaffold, InsPecT, PepARML, Peptizer, pFind, TPP, iProphet, MaxQuant, msInspect, MSPepSearch, OpenMS/TOPP, ProteinProphet, Pview, SpectraST, Spectrum Mill, thegpm.]
The SCX/IMAC Enrichment Approach for Phosphoproteomics
Sample: 7.5 × 10^7 K562 human chronic myelogenous leukemia cells, 4 mg lysate
Protocol: Villen, J, and Gygi, SP, Nat Protoc, 2008, 3, 1630-1638.
Lysis: 8 M urea, 75 mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors
SCX: PolyLC Polysulfoethyl A, 9.4 mm × 200 mm; elute: 0-105 mM KCl, 30% ACN
IMAC: Sigma PhosSelect Fe IMAC beads; bind: 40% ACN, 0.1% formic acid; elute: 500 mM K2HPO4 pH 7
MS/MS: Thermo Fisher Orbitrap XL; high-res MS1 scans in the Orbitrap (60k), Top-8 fragmented in the LTQ; exclude +1 and precursors with unassigned charges; 20 s exclusion time; precursor mass error +/- 10 ppm
Preliminary Analysis of SCX Fractions and Dataset Selection
[Figure: three panels across SCX fractions 2-12: number of spectra by precursor charge (z2, z3, z4); percentage of distinct peptides by number of phosphosites (1P, 2P, 3P); percentage of distinct peptides by solution charge (-1SC to 6SC).]
Selected fractions:
Frxn 3: multi-phosphosites
Frxn 4: single phospho, single basic
Frxn 12: multi-basic residues (RHK)
From 30,000 Ft.
[Figure: per-participant bar chart of # spectra Id Yes, # spectra Loc Yes, and # unique peptides UC ID Yes (y-axis 0-8000). Participant aliases in x-axis order: 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 870484i, 85246, 13867, 20441v, 40816i, 20109, 50308i, 29850v, 56365, 66398, 91943i, 47587, 71263, 65211, 63103, 97219i, 20814, 61963v, 18621, 74637, 15769, 77114, 66514, 77115.]
[Table: for each participant, in the same order, the software used for spectral pre-processing, peptide identification, and phosphosite localization (two-letter codes decoded in the key below), plus Y flags for "precursor m/z adjusted" and "nterm acetyl".]
Software Program Abbreviations
Software Program Key
Ascore As
Bioworks Bw
Distiller Di
extract_msn Em
TheGPM Gp
in-house Ih
Inspect In
IdPicker Ip
iProphet Id
Mascot Ma
msconvert Mc
msInspect Mi
MyriMatch Mm
MSPepSearch + Spec Lib. Mp
MaxQuant Mq
msInspect Ms
OMSSA Om
OpenMS Op
pFind Pf
Phosphinator Ph
pepARML Pl
PeptideProphet Pp
Peptizer Pz
Prophossi Pr
PhosphoScore Ps
Pview Pv
ReAdW Rr
Scaffold Sc
SEQUEST Se
Spectrum Mill Sm
SpectraST + Spec Lib. Sp
Xcalibur Xc
X!Tandem Xt
X!Tandem (k-score) Xt*
The data analysis tools used by the participants were collected from the on-line survey as reported by the participants. Many participants used multiple search engines, and most used a software tool to localize the phosphosites. Moreover, many in-house (Ih) or custom software tools were used in the study, only some of which are published. The key above can be used to decode the names of the software tools in the participant table, and the table is sorted (by number of confident peptide identifications) exactly as in the histogram.
Relative Performance: Identification By Fraction
[Figure: per-participant stacked bar charts, split by fraction (Frxn 3, 4, 12), of # spectra Id Yes (top) and # unique peptides UC Id Yes (bottom).]
Performance was not equivalent across the 3 fractions for all participants. Some participants saw more unique peptides than others.
Room for Improvement in ID Certainty Thresholds
[Figure: per-participant stacked bar charts of # spectra for Frxn 3 (most multiple phos per peptide), Frxn 4 (most phosphopeptides), and Frxn 12 (highest precursor charges), broken down into DN (Diff Id, No), SN (Same Id, No), DY (Diff Id, Yes), SY (Same Id, Yes), and Y1P (Id Yes, single).]
Gray indicates the number of spectra where fewer than 2 participants agreed on the Id.
85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15
20814: ?, Frxn 12 >> Frxn 3, 4
77114, 77115: merged multiple scans, so they can’t be compared with the other 33
Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4
YY: Y identification, Y localization
YN: Y identification, N localization
NS: N identification, but top sequence same as consensus
ND: N identification, and top sequence different from consensus
Subset of Participants Used for Localization Analysis
Excluded:
0: 0% localization
1: 100% localization
F: FDR very high?
R: replicate submission
M: merged spectra
C: categorization errors
A: Y Loc only when no possible ambiguity
[Figure: # spectra Id Yes and # spectra Loc Yes for all 35 participants, annotated with exclusion codes, and for the 22 participants retained: 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 13867, 20441v, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 61963v, 18621.]
If Participants Agree on the Identity, Do They Also Agree That Site Localization Can Be Certain?
[Figure: Frxn 4 subset of 472 spectra for which 20/22 participants agree on identity. Histogram of % of spectra (y-axis) vs. % of participants indicating localization Yes (x-axis: NPA = no possibility of ambiguity, then 10% to 100%).]
What Fraction of the Time Do They Agree On Localization(s)?
8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12):
• 5918: Y loc by 2-22 participants, site ambiguity possible
• 798: Y loc by 1 participant
• 498: N loc by all participants
• 836: no ambiguity possible
Agreement among the participants marking Loc=Y, for the 5918 spectra with > 2/22 Loc Yes and site ambiguity possible:
• 100% of participants agree: 4685 (79%)
• 67-99% of participants agree: 563 (10%)
• < 67% of participants agree: 670 (11%)
For all of the participants that agree on identity, when
• site ambiguity is possible (# S, T, Y > # phos), and
• > 2 participants mark Loc=Y,
then for 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites.
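The agreement breakdown can be reproduced from per-spectrum localization calls: for each spectrum with more than two Loc=Y calls, compute the share of those calls that match the most common localization and bin it. A sketch with hypothetical function names; treating a two-thirds share as the lower edge of the "67-99%" bin is an assumption, since the slide does not state how the boundary was rounded.

```python
from collections import Counter


def agreement_category(loc_calls: list[str]) -> str:
    """Bin one spectrum by how many Loc=Y participants share the majority call.

    loc_calls holds the localization string from each participant who marked
    Loc=Y for this spectrum (all having agreed on the peptide identity).
    """
    if not loc_calls:
        raise ValueError("no Loc=Y calls for this spectrum")
    majority_count = Counter(loc_calls).most_common(1)[0][1]
    share = majority_count / len(loc_calls)
    if share == 1.0:
        return "100%"
    if share >= 2 / 3:  # assumed lower edge of the "67-99%" bin
        return "67-99%"
    return "<67%"


def summarize(spectra: dict[str, list[str]], min_calls: int = 3) -> Counter:
    """Count spectra per agreement bin, keeping only those with > 2 Loc=Y calls."""
    return Counter(
        agreement_category(calls) for calls in spectra.values() if len(calls) >= min_calls
    )


if __name__ == "__main__":
    toy = {
        "scan1": ["VTNDIsPEsSPGVGR"] * 4,
        "scan2": ["APsLTDLVK", "APsLTDLVK", "APSLtDLVK"],
        "scan3": ["sSSAGPEGPQLDVPR", "SsSAGPEGPQLDVPR"],  # only 2 calls: skipped
    }
    print(summarize(toy))
```

Applied to the study data, the "100%" bin would correspond to the 4,685 of 5,918 spectra (79%) reported above.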
Which Participants are More Likely to Disagree on Localization?
[Figure: three per-participant bar charts, one per fraction, of % of spectra in the minority localization choice; x-axis sorted in descending order of # identified.]
# spectra with localization agreement of 50.1-99.9%:
Frxn 3: 154
Frxn 4: 498
Frxn 12: 227
Liberal Localizers are More Disagreeable
The participants who are the most willing to localize are more likely to disagree with the majority view.
(x-axis sorted in descending order of # localized / # identified)
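The two quantities compared here, how often a participant lands in the minority on contested spectra and how liberally a participant localizes (# localized / # identified), can be sketched as below. The function names and the exact definition of the minority rate (share of a participant's Loc=Y calls, on spectra whose majority agreement is strictly between 50% and 100%, that differ from the majority) are assumptions modeled on the "50.1-99.9%" subset shown on the previous slide.

```python
from collections import Counter


def minority_rates(calls: dict[str, dict[str, str]]) -> dict[str, float]:
    """Fraction of each participant's contested Loc=Y calls that oppose the majority.

    calls[spectrum][participant] = localization string. Only spectra whose
    majority share is strictly between 50% and 100% are counted.
    """
    against: Counter = Counter()
    total: Counter = Counter()
    for per_participant in calls.values():
        majority, count = Counter(per_participant.values()).most_common(1)[0]
        share = count / len(per_participant)
        if not 0.5 < share < 1.0:
            continue  # unanimous or no clear majority
        for participant, loc in per_participant.items():
            total[participant] += 1
            if loc != majority:
                against[participant] += 1
    return {p: against[p] / total[p] for p in total}


def liberal_order(n_localized: dict[str, int], n_identified: dict[str, int]) -> list[str]:
    """Participants sorted by # localized / # identified, most liberal first."""
    return sorted(n_localized, key=lambda p: n_localized[p] / n_identified[p], reverse=True)


if __name__ == "__main__":
    calls = {
        "s1": {"A": "x", "B": "x", "C": "y"},
        "s2": {"A": "x", "B": "x", "C": "x"},
        "s3": {"A": "p", "B": "q", "C": "q"},
    }
    print(minority_rates(calls))
    print(liberal_order({"A": 50, "B": 90, "C": 70}, {"A": 100, "B": 100, "C": 100}))
```

Plotting the minority rate against participants in `liberal_order` order gives the kind of comparison behind the conclusion that liberal localizers disagree more often.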
A Challenging Problem
[Figure: annotated MS/MS spectrum; labels include P(m/z) -H3PO4 and 879]
• 14/21 participants said they can identify the peptide but cannot localize the site
• The remaining 7 localized it as:
– 3/7: DSAIPVESDtDDEGAPR
– 4/7: DSAIPVEsDtDDEGAPR
Primary Observations from iPRG 2010 study
1. Wide range, across participants, in the number of spectra marked confidently identified.
2. Wide range, across participants, in the number of spectra marked confidently localized.
3. When participants agree on the identification, phosphosite ambiguity is possible, and more than 2 participants mark the localization as certain, then for 79% of the spectra those participants unanimously agree on the localization(s).
4. For the remaining 21%, the participants who are liberal localizers are more likely to disagree with the majority view.
Acknowledgements
iPRG Members
•Paul A. Rudnick (chair) – NIST
•Manor Askenazi - Dana-Farber Cancer Institute
•Karl R. Clauser - Broad Institute of MIT and Harvard
•William S. Lane - Harvard University
•Lennart Martens - Ghent University, Belgium
•Karen Meyer-Arendt - University of Colorado
•W. Hayes McDonald - Vanderbilt University
•Brian C. Searle - Proteome Software, Inc.
•Jeffrey A Kowalak (EB Liaison) – NIMH
Additional Contributors
• Philipp Mertins, The Broad Institute
–All wet lab work and an analysis
• Steve Gygi, Harvard Medical School
–Test datasets
• Matthew Chambers, Vanderbilt University Medical Center
–Data format conversions (ProteoWizard)
• Steve Stein and Yuri Mirokhin, NIST
–A K562 phosphopeptide spectral library
• Renee Robinson, Harvard University
–“The Anonymizer”
Emerging False Localization Rate (FLR) Metrics
Target/decoy for localization
• Decoy: amino acids that cannot biologically bear the modification
Issues
• Allow decoys only during localization, not during identification; otherwise they will bias the identification FDR
• Ambiguity: more allowed sites will yield more ambiguous assignments, so targets and decoys may need to be scored separately and then compared
• Frequency: decoy AA occurrence should be similar to that of target AAs; otherwise the FLR will be inaccurate
• Proximity: a decoy AA near a target AA site has a better chance of matching; Pro and Glu are often found in the consensus motifs of many kinases
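A global FLR estimate from decoy residues can be sketched by counting sites placed on decoy amino acids (here Pro and Glu, as in the Batch-Tag experiment described next) and scaling that count by how common target residues are relative to decoy residues in the same peptides. The frequency scaling is one simple way to address the "Frequency" issue above and is an assumption of this sketch, as is the function name; the proximity bias is not modeled.

```python
TARGET_RESIDUES = "STY"
DECOY_RESIDUES = "PE"  # residues that cannot biologically carry a phosphate


def estimate_flr(site_calls: list[tuple[str, int]]) -> float:
    """Global FLR estimate from phosphosites localized onto decoy residues.

    site_calls: (peptide sequence, 0-based index of a localized site).
    Assumes wrong localizations land on target and decoy residues in
    proportion to how often each occurs in the peptides, so decoy hits
    are scaled by the target/decoy residue ratio.
    """
    target_hits = sum(seq[i] in TARGET_RESIDUES for seq, i in site_calls)
    decoy_hits = sum(seq[i] in DECOY_RESIDUES for seq, i in site_calls)
    target_sites = sum(sum(aa in TARGET_RESIDUES for aa in seq) for seq, _ in site_calls)
    decoy_sites = sum(sum(aa in DECOY_RESIDUES for aa in seq) for seq, _ in site_calls)
    if target_hits == 0 or decoy_sites == 0:
        raise ValueError("need target calls and decoy residues to estimate an FLR")
    estimated_false_targets = decoy_hits * target_sites / decoy_sites
    return estimated_false_targets / target_hits


if __name__ == "__main__":
    calls = [("SPEK", 0), ("SPEK", 0), ("SPEK", 1), ("TPEK", 0)]
    print(estimate_flr(calls))
```

Consistent with the first issue listed above, decoy residues would be enabled only in the localization step, after identifications have been fixed.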
AA Frequency in the Proteome
http://proteomics.broadinstitute.org/millhtml/faindexframe.htm
Select the Calculate statistics utility.
ProteinProspector SLIP Scoring and Local FLR
• Test dataset: synaptic phosphopeptides acquired on an LTQ-Orbitrap Velos (IT-CID); 70,000 phosphopeptide spectra identified
• Altered Batch-Tag to allow for phosphorylation of Pro and Glu
• Filtered results to phosphopeptide IDs containing only one S, T, or Y
• Modification site therefore known
• Local FLR: SLIP score of 6 = 95% correct
• Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data
Baker, P.R., Trinidad, J.C., and Chalkley, R.J. (2011) Mol Cell Proteomics. M111.008078.
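When the true site is known, as in a dataset filtered to peptides with only one S, T, or Y, a local FLR is simply the fraction of incorrect localizations among results in a given score bin. A minimal sketch with a hypothetical function name and an arbitrary bin width; it does not reproduce SLIP scoring itself, only the bookkeeping used to read off a statement such as "score 6 = 95% correct".

```python
import math
from collections import defaultdict


def local_flr_by_bin(results: list[tuple[float, bool]], bin_width: float = 2.0) -> dict[float, float]:
    """Fraction of incorrect localizations within each score bin.

    results: (localization score, is_correct) pairs from a dataset in which
    the true site is known. Keys are the lower edges of the score bins.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [incorrect, total]
    for score, is_correct in results:
        bin_start = math.floor(score / bin_width) * bin_width
        counts[bin_start][1] += 1
        if not is_correct:
            counts[bin_start][0] += 1
    return {b: wrong / total for b, (wrong, total) in sorted(counts.items())}


if __name__ == "__main__":
    toy = [(6.2, True)] * 19 + [(6.5, False), (1.0, False), (1.5, True)]
    print(local_flr_by_bin(toy))
```

Unlike a global FLR over all accepted sites, a local FLR describes the error rate at one particular score, which is what a per-site threshold needs.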
Closing Thoughts
• More research in the area of FLR metric calculation is critical to the field for
developing standard confidence thresholds for modification site localization.
• An ambiguous modification localization decision for a particular peptide
spectrum match is far preferable to getting it wrong.
• As more raw LC-MS/MS data from PTM studies is deposited in the public
domain, it becomes increasingly possible for knowledgebases to undertake
efforts to reprocess the data with the most recent algorithms and scoring metrics
and enforce uniform quality standards on the information they disseminate.
• PHOSIDA (www.phosida.com) disseminates modification sites identified and localized in publications emerging only from research in the laboratory of Matthias Mann, so all of its MS/MS data have been analyzed through a common software platform and subjected to consistent scoring thresholds.
Review Article
Modification Site Localization Scoring: Strategies and Performance
Chalkley, RJ and Clauser, KR
Mol Cell Proteomics 2012 11: 3-14. doi:10.1074/mcp.R111.015305.
http://www.mcponline.org/
Canonical pathways in lung cancer are being aggressively targeted for drug development
Janku et al. J Thoracic Oncol 2011; 6: 1601-1612
EML4-ALK fusion
Crystal, Clinical Advances in Hematology & Oncology, 2011, 9, 207-214.
Targeted therapy development time
Gerber and Minna Cancer Cell 2010; 18: 548-551
The future of lung cancer management
• Diagnose earlier
• Prognosticate better
• Treat more precisely
• Monitor more effectively
Herbst et al. NEJM 2008