Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS Karl R. Clauser Broad Institute of MIT and Harvard BioInfoSummer 2012 University of Adelaide December, 2012 1


Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS

Karl R. Clauser, Broad Institute of MIT and Harvard

BioInfoSummer 2012

University of Adelaide

December, 2012


Topics Covered


• Basics of phosphosite identification and localization

• Evolution of phosphoproteomic literature MS/MS reporting

• Modification site localization algorithm development

• 2010 ABRF-iPRG study of phosphopeptide ID and site localization

• Emerging false localization rate (FLR) metrics


Localizing a Phosphorylation Site

L/F|P/A/D|T/s/P/S T A\T K

L/F|P/A/D|t S/P/S T A\T K


PTM Site Localization: Test All Locations, Examine Score Gaps

(* marks the best-scoring locations tested; - marks lower-scoring locations.)

No possible ambiguity (# PO4 sites = # S, T or Y)
Locations tested: AVsEEQQPALK
Conclusion: AVS(1.0)EEQQPALK

Single site
Locations tested: APsLTDLVK *, APSLtDLVK -
Conclusion: APS(0.99)LT(0.0)DLVK

Locations tested: sSSAGPEGPQLDVPR *, SsSAGPEGPQLDVPR *, SSsAGPEGPQLDVPR -
Conclusion: S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR

Multiple sites
Locations tested: VTNDIsPEsSPGVGR *, VTNDIsPESsPGVGR *, VTNDISPEssPGVGR -, VtNDIsPESSPGVGR -, VtNDISPEsSPGVGR -, VtNDISPESsPGVGR -
Conclusion: VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
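The "test all locations" step can be sketched in Python. This is a minimal illustration, not Spectrum Mill's implementation: `candidate_localizations` is a hypothetical helper that enumerates every way to place a given number of phosphates on the S, T and Y residues of a peptide, written in the lowercase notation used above. Scoring each candidate and examining the score gaps would follow.

```python
from itertools import combinations
from typing import List


def candidate_localizations(peptide: str, n_phospho: int) -> List[str]:
    """Enumerate every placement of n_phospho phosphates on S/T/Y residues.

    Phosphorylated residues are written in lowercase (e.g. APsLTDLVK).
    """
    sty_positions = [i for i, aa in enumerate(peptide) if aa in "STY"]
    candidates = []
    for chosen in combinations(sty_positions, n_phospho):
        residues = list(peptide)
        for i in chosen:
            residues[i] = residues[i].lower()
        candidates.append("".join(residues))
    return candidates


if __name__ == "__main__":
    for pep, k in [("AVSEEQQPALK", 1), ("APSLTDLVK", 1),
                   ("SSSAGPEGPQLDVPR", 1), ("VTNDISPESSPGVGR", 2)]:
        print(pep, candidate_localizations(pep, k))
```

For VTNDISPESSPGVGR with two phosphates this yields six candidates, because the peptide has four S/T residues and C(4, 2) = 6. When the number of phosphates equals the number of S/T/Y residues, only one candidate exists, which is the "no possible ambiguity" case.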


PTM Site Localization – Confident Localization

(K)A/P|s|L/T D|L\V K(S)

APS(0.99)LT(0.0)DLVK


PTM Site Localization – Ambiguous Localization

(R)S s/S/A/G/P E/G/P Q L|D|V|P R(E)

S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR


PTM Site Localization – Ambiguous Localization: 2 sites, 1 confident and 1 ambiguous

(R)V T N D|I|s/P E|s S/P G V\G R(R)

VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR


Reliability of LC/MS/MS Phosphoproteomic Literature ~2005


Citation | Approach | Instrument | # sites | # ambiguous sites | Scores shown | Site ambiguity labeled | Supplementary spectra

Ballif, BA, … Gygi, SP, 2004, MCP, 3, 1093-1101 | 1D gel digest, SCX, LC/MS/MS | LCQ Deca XP | 546 | 86 | yes | yes | no

Rush, J, … Comb, MJ, 2005, Nat Biotech, 23, 94-101 | digest lysate, pTyr Ab, LC/MS/MS | LCQ Deca XP | 628 | 0 | yes | no | no

Collins, MO, … Grant, SGN, 2005, J Biol Chem, 280, 5972-5982 | protein IMAC, peptide IMAC, LC/MS/MS | Q-Tof Ultima | 331 | 42 | no | yes | no

Gruhler, A, … Jensen, ON, 2005, MCP, 4, 310-327 | digest lysate, SCX, IMAC, LC/MS/MS | LTQ-FT | 729 | 0 | yes | no | no

“Resulting sequences were inspected manually …. When the exact site of phosphorylation could not be assigned for a given phosphopeptide, it was tabulated as ambiguous.”

“All identified phosphopeptides were manually validated, and localization of phosphorylated residues within the individual peptide sequences were manually assigned…”

“All spectra supporting the final list of assigned peptides used to build the tables shown here were reviewed by at least three people to establish their credibility.”

“Assignment of phosphorylation sites was verified manually with the aid of PEAK Studio (Bioinformatics Solutions) software.”


MCP Guideline for publishing PTM data ~2010

III. POST-TRANSLATIONAL MODIFICATIONS

Studies focusing on posttranslational modifications (PTMs) require specialized methodology and documentation to assign the type(s) and site(s) of the modification(s). The guidelines in this section apply to PTMs that occur under physiological conditions and to which biological significance may be assigned, such as phosphorylation, glycosylation, etc. as well as purposefully induced chemical modifications of central importance to the results of the study, such as chemical cross‐linking. These guidelines do not apply to common modifications arising from sample handling or preparation such as oxidation of Met or alkylation of Cys.

In addition to the tabular presentation(s) of the data described in guideline II, the following information is required:

• The site(s) of modification: Within each peptide sequence, all modifications must be clearly located (unless ambiguous; see below) and the manner in which this was accomplished (through computation or manual inspection) must be described.

• A justification for any localization score threshold employed.

• Ambiguous assignments: Peptides containing ambiguous PTM site localizations must be listed in a separate table from those with unambiguous site localizations. In cases where there are multiple modification sites and at least one is ambiguous, then these peptides should be listed with the ambiguous assignments. Ambiguous assignments must be clearly labeled as such.

Examples of ambiguities include:

• Modified peptides in which one or more modification sites are ambiguous.

• Instances where the peptide sequence is repeated in the same protein so the specific modification site cannot be assigned.

• Instances in which the same peptide is repeated in multiple proteins, e.g. paralogs and splice variants (See also Section IV).

• Isobaric modifications (e.g., acetylation vs. trimethylation, phosphorylation vs. sulfonation, etc.), where the possibilities may not be distinguished. Examples of methods able to distinguish between these include mass spectrometric approaches such as accurate mass determination, observation of signature fragment ions (e.g. m/z 79 vs. m/z 80 in negative ion mode for assignment of phosphorylation over sulfonation), or biological or chemical strategies.

• Annotated, mass labeled spectra: Spectra for ALL modified peptides must be either submitted to a public repository or accompany the manuscript as described in guideline II.

http://www.mcponline.org/


Supplemental Table Links to Each Labeled Spectrum


Spectrum Mill Scoring of MS/MS Interpretations


Peak selection: de-isotoping, S/N thresholding, parent-neutral removal, charge assignment

Match to database candidate sequences

Score = Assignment Bonus (ion type weighted) + Marker Ion Bonus (ion type weighted) − Non-assignment Penalty (intensity weighted)

Example match: score 12.68, SPI 92%

SPI (%) = Scored Peak Intensity
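SPI can be illustrated as the share of extracted peak intensity that is assigned to fragment ions of the candidate sequence. The sketch below assumes that definition and a hypothetical peak list in which each peak already carries an assigned/unassigned flag; it does not reproduce Spectrum Mill's peak selection or ion-type weighting, and the `Peak` type and m/z values are illustrative only.

```python
from typing import Iterable, NamedTuple


class Peak(NamedTuple):
    mz: float
    intensity: float
    assigned: bool  # True if matched to a fragment ion of the candidate sequence


def scored_peak_intensity(peaks: Iterable[Peak]) -> float:
    """Return the percentage of total peak intensity that is assigned (SPI, %)."""
    peak_list = list(peaks)
    total = sum(p.intensity for p in peak_list)
    if total == 0:
        return 0.0
    assigned = sum(p.intensity for p in peak_list if p.assigned)
    return 100.0 * assigned / total


if __name__ == "__main__":
    spectrum = [Peak(175.12, 50.0, True), Peak(262.15, 30.0, False), Peak(361.22, 20.0, True)]
    print(f"SPI = {scored_peak_intensity(spectrum):.0f}%")  # SPI = 70%
```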


Spectrum Mill Variable Modification Localization Score

VML score = difference in score between the same identified sequence with different variable modification localizations

VML score > 1.1 indicates confident localization

Why a threshold value of 1.1?

1 implies that there is a distinguishing ion of b or y ion type

0.1 means that when unassigned, the peak is 10% of the intensity of the base peak
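The VML decision rule reduces to a score gap between the best and second-best localization of the same sequence. A minimal sketch, assuming the candidate localizations have already been scored; `vml` is a hypothetical helper and the scores in the test are invented for illustration (the second case mirrors the 1.09 example on the next slide).

```python
from typing import Dict, Tuple

VML_THRESHOLD = 1.1  # threshold quoted on the slide


def vml(scores: Dict[str, float], threshold: float = VML_THRESHOLD) -> Tuple[str, float, bool]:
    """Return (best localization, score gap to runner-up, confidently localized?)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_seq, best_score = ranked[0]
    if len(ranked) == 1:
        # Only one possible location: nothing to distinguish, so no ambiguity.
        return best_seq, float("inf"), True
    gap = best_score - ranked[1][1]
    return best_seq, gap, gap > threshold
```

A gap just under 1.1 (for example, one distinguishing b/y ion minus a small unassigned peak) is reported as ambiguous, which is the conservative choice.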


VML Scoring - Room for Improvement


S(0.50)QT(0.50)PPGVAT(0.0)PPIPK

VML score: 1.09

Fragment ions labeled in the spectrum: y12, b2


VML Scoring - Room for Improvement


VML score: 0.49

S(0.0)T(0.0)S(0.25)T(0.25)PT(0.25)S(0.25)PGPR

S(0.0)T(0.0)[S(0.5)T(0.5)]P[T(0.5)S(0.5)]PGPR


Phosphosite Localization Scoring - Ascore


http://ascore.med.harvard.edu/

Supports SEQUEST results only, Linux only

Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP (2006) Nat Biotechnol 24:1285–1292.


Phosphosite Localization Scoring - Andromeda


$$P = \binom{n}{k}\, p^{k}\,(1-p)^{n-k} = \binom{n}{k}\, 0.04^{k}\,(0.96)^{n-k}$$

$$\text{PTM score} = -10 \log_{10}(P)$$

p: 0.04, i.e., the 4 most intense fragment ions are used per 100 m/z units

n: total number of possible b/y ions in the observed mass range for all possible combinations of PO4 sites in a peptide

k: number of those n ions that match peaks

Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell (2006), 127 (3), 635–48.

Olsen, J.V., and Mann, M. Proc. Natl. Acad. Sci. USA. (2004) 101, 13417–13422.
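The binomial scoring above is easy to evaluate directly. A minimal sketch using the standard binomial coefficient C(n, k) and p = 0.04 as the default; it scores one localization at a time and omits how MaxQuant/Andromeda counts n and k across site combinations. The function names and the counts in the usage example are hypothetical. `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_probability(n: int, k: int, p: float = 0.04) -> float:
    """Chance of matching exactly k of n theoretical fragment ions at random."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def ptm_score(n: int, k: int, p: float = 0.04) -> float:
    """PTM score = -10 * log10(P); larger values are less likely to arise by chance."""
    return -10.0 * math.log10(binomial_probability(n, k, p))


if __name__ == "__main__":
    # Hypothetical (n, k) counts for two competing phosphosite placements.
    for label, (n, k) in {"site A": (20, 9), "site B": (20, 6)}.items():
        print(label, round(ptm_score(n, k), 1))
```

Comparing the scores of competing placements, rather than reading any single score as a probability, is the step that feeds a localization decision.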


True Probability or Just Effective Scores?

Peak selection assumptions

• All regions of spectrum equally likely

• multiply charged fragments fall below the precursor

• some m/z values in the 100-300 range are not possible, given dipeptide AA combinations

• tolerance in Da, not ppm

• Tall and short peak intensities equally diagnostic

Fragment ion type assumptions

• All ion types equally probable

• Neutral losses ignored, e.g., y-H3PO4, y-H2O


Phosphosite Localization Scoring - PhosphoRS


Taus, T., Kocher, T., Pichler, P., Paschke, C., Schmidt, A., Henrich, C., and Mechtler, K. (2011) J Proteome Res. 10(12): 5354-62.

N: total # of extracted peaks

d: fragment ion mass tolerance

w: full mass range of spectrum

Score all theoretical fragment ions, not just site-determining ions.
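A rough sketch of a phosphoRS-style calculation, under stated assumptions: the per-peak random-match probability is taken as p = N·d/w from the parameters listed above; each candidate isoform gets a cumulative binomial probability P of matching at least k of its n theoretical fragment ions by chance; and isoform probabilities are obtained by normalizing 1/P across isoforms. This follows our reading of Taus et al. (2011) and omits details such as peak-depth optimization; all function names and numbers are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def match_probability(n_peaks: int, tol_da: float, mass_range_da: float) -> float:
    """Per-fragment chance of a random match, p = N * d / w (capped at 1)."""
    return min(1.0, n_peaks * tol_da / mass_range_da)


def cumulative_binomial(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def isoform_probabilities(matches: Dict[str, Tuple[int, int]], p: float) -> Dict[str, float]:
    """matches maps isoform -> (n theoretical fragments, k matched); returns normalized 1/P."""
    inverse = {iso: 1.0 / cumulative_binomial(n, k, p) for iso, (n, k) in matches.items()}
    total = sum(inverse.values())
    return {iso: value / total for iso, value in inverse.items()}


if __name__ == "__main__":
    p = match_probability(n_peaks=40, tol_da=0.5, mass_range_da=2000.0)  # hypothetical values
    print(isoform_probabilities({"VTNDIsPEsSPGVGR": (24, 12), "VTNDIsPESsPGVGR": (24, 10)}, p))
```

Scoring all theoretical fragments, as the slide notes, means shared ions contribute equally to every isoform, so only the site-determining matches shift the normalized probabilities.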


Key Aspects of Scoring Localizations

• Select peaks in spectrum to be used for identification/localization

• Test all sequence/location possibilities

• Assign fragment ion types to peaks

• Allow for peaks to have different ion type assignments for conflicting localization possibilities

• Use score differences to make decision on localization certainty/ambiguity

• Decide upon conservative/aggressive thresholds.

• Provide a clear representation of the certainty/ambiguity in localization of each site

• Allow for multiple sites with mix of certainty and ambiguity in localization

• Distinguish between:

• Ambiguity – no distinguishing evidence, i.e. either possibility

• Ambiguity – conflicting evidence, multiple co-eluting isoforms present

How can we calculate a false localization rate as a standard measure of certainty for phosphosite assignment across a dataset?


ABRF

Proteome Informatics Research Group

iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization

ABRF 2010, Sacramento, CA

March 22, 2010


Study Goals


1. Evaluate the consistency of reporting phosphopeptide identifications and phosphosite localization across laboratories

2. Characterize the underlying reasons why result sets differ

3. Produce a benchmark phosphopeptide dataset, spectral library and analysis resource


Study Design


• Use a common dataset

• Use a common sequence database

• Allow participants to use the bioinformatic tools and methods of their choosing

• Use a common reporting template

• Fix the identification confidence (1% FDR)

• Require an indication of phosphosite ambiguity per spectrum

• Ignore protein inference – for now


Study Materials and Instructions to Participants


• 1 Orbitrap XL dataset (3 files)

– RAW, mzML, mzXML, MGF, pkl or dta (conversions by ProteoWizard)

• 1 FASTA file (SwissProt human sequences, v57.1)

• 1 template (Excel)

• 1 on-line survey (Survey Monkey)

1. Analyze the dataset

2. Report the phosphopeptide-spectrum matches in the provided template

3. Complete an on-line survey

4. Attach a 1-2 page description of your methodology


Reporting Template


ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis

Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide-spectrum match. Multiple rows MAY be used to report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the 'Peptide Sequence' field, and results should be sorted by 'Peptide Identification Score' from most to least confident. Additional instructions can be found above each field header. Results should be emailed to '[email protected]' no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey.

REQUIRED FIELDS

• File: name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03).

• Spectrum Identifier: identifiers should be unique scan numbers from the data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).

• Precursor m/z: precursor m/z as submitted to the search engine.

• Precursor Charge: precursor charge reported by the search engine.

• Peptide Sequence: use lowercase s, t or y (e.g., SLsGSsPCPK) OR a trailing symbol (e.g., SLS#GS#PCPK) OR a string in parentheses (e.g., SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T or Y are phosphorylations.

• Accession(s): protein identifier(s) from the FASTA file. Use multiple values if the peptide is found in multiple proteins, e.g., Q9NZI8; Q9UQ35. Protein inference will not be scored.

• Num. Phospho sites: total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.

• Peptide Identification Certainty: is this match above the 1% FDR identification threshold (Y|N)? 'Y' indicates this match is BETTER than the confidence threshold; 'N' indicates the match is WORSE. Please report BOTH types of identifications in your ranked list.

• Phosphosite Localization Certainty: are ALL phosphosites unambiguously localized (Y|N)? Indicate 'Y' if ALL phosphorylations have been confidently localized, 'N' if one or more have not.

• Peptide Identification Score: peptide identification score reported by the search engine (e.g., E-value, p-value, probability, Mascot score, etc.).

Example rows:

File | Spectrum Identifier | Precursor m/z | Precursor Charge | Peptide Sequence | Accession(s) | Num. Phospho sites | Peptide Identification Certainty | Phosphosite Localization Certainty | Peptide Identification Score
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:908 | 558.7576 | 2 | qGsPVAAGAPAK | Q9NZI8 | 1 | Y | Y | 0.0002097
D20090930_PM_K562_SCX-IMAC_fxn04 | Scan:2017 | 710.82233 | 2 | TsPDPSPVSAAPSK | Q13469 | 1 | Y | N | 45.41
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:683 | 692.28891 | 2 | _APQTS(ph)S(ph)SPPPVR_ | Q8IYB3 | 2 | Y | N | 30.09
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:4832 | 775.3548 | 2 | SQtPPGVAtPPIPK | Q15648 | 2 | Y | N | 31.79
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | SLsGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0112023
D20090930_PM_K562_SCX-IMAC_fxn03 | Scan:641 | 590.2127 | 2 | sLSGSsPcPK | Q9UQ35 | 2 | Y | N | 0.0915611


[Figure: survey of participants (n = 33), with pie-chart panels for ABRF membership (ABRF member, non-member), type of lab (academic, biotech/pharma/industry, contract research org, government, other), location (Asia, Australia/New Zealand, Europe, North America), resource lab status (conduct both core functions and non-core lab research, core only, non-core research lab) and primary job function (bioinformatician/developer, director/manager, lab scientist, mass spectrometrist, other), and a bar chart of proteomics experience (1-2 years, 3-4 years, 5-10 years, >10 years, unanswered).]

• 59 requests, 32 submissions (54% return), 2 retractions, plus 7 iPRG members and 1 guest


Software Tools Used


[Figure: bar charts of the number of participants using each software tool.]

Phosphosite localization: Ascore, custom, in-house, MaxQuant, msInspect, MyriMatch, NN Score PLS, Phosphinator, PhosphoScore, Prophossi, Spectrum Mill

Peptide identification: Mascot, X!Tandem, OMSSA, SEQUEST, MyriMatch, in-house, PeptideProphet, Scaffold, InsPecT, PepARML, Peptizer, pFind, TPP, iProphet, MaxQuant, msInspect, MSPepSearch, OpenMS/TOPP, ProteinProphet, Pview, SpectraST, Spectrum Mill, thegpm


The SCX/IMAC Enrichment Approach for Phosphoproteomics


Sample: 7.5 × 10^7 K562 human chronic myelogenous leukemia cells, 4 mg lysate

Protocol: Villen, J, and Gygi, SP, Nat Prot, 2008, 3, 1630-1638.

Lysis: 8 M urea, 75 mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors

SCX: PolyLC Polysulfoethyl A, 9.4 mm × 200 mm; elute: 0-105 mM KCl, 30% ACN

IMAC: Sigma PhosSelect Fe IMAC beads; bind: 40% ACN, 0.1% formic acid; elute: 500 mM K2HPO4 pH 7

MS/MS: Thermo Fisher Orbitrap XL; high-res MS1 scans in the Orbitrap (60k), top-8 fragmented in the LTQ; exclude +1 and precursors with unassigned charges; 20 s exclusion time; precursor mass error ±10 ppm


Preliminary Analysis of SCX Fractions and Dataset Selection

[Figure: three panels across SCX fractions 2-12: # spectra by precursor charge (z2, z3, z4); % distinct peptides by # phosphosites (1P, 2P, 3P); % distinct peptides by solution charge (-1SC to 6SC).]

Frxn 3: multi-phosphosites

Frxn 4: single phospho, single basic

Frxn 12: multi-basic residues (RHK)


From 30,000 Ft.

[Figure: bar chart (y-axis 0-8000) of # spectra Id Yes, # spectra Loc Yes and # unique peptides UC Id Yes for each participant alias, in the order 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 870484i, 85246, 13867, 20441v, 40816i, 20109, 50308i, 29850v, 56365, 66398, 91943i, 47587, 71263, 65211, 63103, 97219i, 20814, 61963v, 18621, 74637, 15769, 77114, 66514, 77115. Beneath it, a table lists for each participant the software used for spectral pre-processing, whether the precursor m/z was adjusted and N-terminal acetylation was considered (Y), and the software used for peptide identification and phosphosite localization, using the abbreviations in the key below.]


Software Program Abbreviations


Software Program Key
Ascore: As
Bioworks: Bw
Distiller: Di
extract_msn: Em
TheGPM: Gp
in-house: Ih
Inspect: In
IdPicker: Ip
iProphet: Id
Mascot: Ma
msconvert: Mc
msInspect: Mi
MyriMatch: Mm
MSPepSearch + Spec Lib.: Mp
MaxQuant: Mq
msInspect: Ms
OMSSA: Om
OpenMS: Op
pFind: Pf
Phosphinator: Ph
pepARML: Pl
PeptideProphet: Pp
Peptizer: Pz
Prophossi: Pr
PhosphoScore: Ps
Pview: Pv
ReAdW: Rr
Scaffold: Sc
SEQUEST: Se
Spectrum Mill: Sm
SpectraST + Spec Lib.: Sp
Xcalibur: Xc
X!Tandem: Xt
X!Tandem (k-score): Xt*

The data analysis tools used by the participants were collected from the on-line survey, as reported by the participants. Many participants used multiple search engines, and most used a software tool to localize the phosphosites. Moreover, many in-house (Ih) or custom software tools were used in the study, only some of which are published. The key can be used to decode the software tool abbreviations in the participant table, which is sorted by number of confident peptide identifications, exactly as in the histogram.


Relative Performance: Identification By Fraction

[Figure: two bar charts per participant alias (y-axis 0-4000): # spectra Id Yes and # unique peptides UC Id Yes, each broken down by Frxn 3, Frxn 4 and Frxn 12.]

Performance was not equivalent across the 3 fractions for all participants. Some participants saw more unique peptides than others.


Room for Improvement in ID Certainty Thresholds

[Figure: three bar charts of # spectra per participant alias, one per fraction (Frxn 3: most multiple phosphosites per peptide; Frxn 12: highest precursor charges; Frxn 4: most phosphopeptides), each stacked by category: #DN Diff Id No, #SN Same Id No, #DY Diff Id Yes, #SY Same Id Yes, #Y1P Id Yes single.]

Gray means: number of spectra where < 2 people agreed on the Id

85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15

20814: ?, Frxn 12 >> Frxn 3, 4

77114, 77115: merged multiple scans, so can’t be compared with others


Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4


YY: Y – identification, Y – localization

YN: Y – identification, N – localization

NS: N – identification, but top sequence same as consensus

ND: N – identification, and top sequence different from consensus


Subset of Participants Used for Localization Analysis

Excluded:
0 = 0% localization
1 = 100% localization
F = FDR very high?
R = Replicate submission
M = Merged spectra
C = Categorization errors
A = Y Loc only when no possible ambiguity

[Figure: two bar charts of # spectra Id Yes and # spectra Loc Yes per participant alias (y-axis 0-8000). Top: all participants, with each excluded participant marked by one of the codes above. Bottom: the 22 participants retained for localization analysis (14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 13867, 20441v, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 61963v, 18621).]


If Participants Agree on the Identity, Do They Also Agree Site Localization Can be Certain?

Frxn 4: subset of 472 spectra for which 20/22 participants all agree on identity

[Figure: % of spectra (y-axis 0-10%) versus % of participants indicating localization Yes (x-axis: NPA, no possibility of ambiguity, then 10% to 100%).]


What Fraction of the Time Do They Agree On Localization(s)?

Of 8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12):

• # Y loc 2-22 participants: 5918

• # Y loc 1 participant: 798

• # N loc, all participants: 498

• No ambiguity possible: 836

Of the 5918 spectra with > 2/22 Loc Yes and site ambiguity possible:

• 100% of participants agree: 4685 (79%)

• 67-99% of participants agree: 563 (10%)

• < 67% of participants agree: 670 (11%)

Considering the participants that agree on identity, when

• site ambiguity is possible (# S, T, Y > # phos), and

• > 2 participants mark Loc=Y,

then for 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites.
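The agreement breakdown can be sketched as a per-spectrum binning of the Loc=Y calls. This is a hypothetical reconstruction, not the iPRG analysis code: it assumes agreement is measured as the fraction of Loc=Y participants sharing the most common localization, treats 2/3 as the lower edge of the "67-99%" bin, and makes the minimum number of Loc=Y calls a parameter, because the slide text (> 2) and the bar label (2-22 participants) differ.

```python
from collections import Counter
from typing import Iterable, Optional


def agreement_bin(localizations: Iterable[Optional[str]], min_loc_yes: int = 2) -> Optional[str]:
    """Bin one spectrum by how many Loc=Y participants share the most common localization.

    `localizations` holds one entry per participant who agreed on the identity:
    the localized sequence if they marked Loc=Y, or None if they marked Loc=N.
    """
    loc_yes = [seq for seq in localizations if seq is not None]
    if len(loc_yes) < min_loc_yes:
        return None  # too few Loc=Y calls to assess agreement
    _, n_top = Counter(loc_yes).most_common(1)[0]
    fraction = n_top / len(loc_yes)
    if fraction == 1.0:
        return "100%"
    if fraction >= 2 / 3:
        return "67-99%"
    return "<67%"
```

Counting the bins over all eligible spectra gives a breakdown like the one above; the reported bins (4685, 563 and 670 spectra) sum to the 5918 spectra analyzed.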


Which Participants are More Likely to Disagree on Localization?

[Figure: three bar charts, one per fraction, of the % of spectra in the minority localization choice for each of the 22 participants (y-axes up to 25%, 20% and 30%).]

# Spectra with Loc Agreement 50.1-99.9%:

Frxn 3: 154

Frxn 4: 498

Frxn 12: 227

x-axis is sorted in descending order of # identified


Liberal Localizers are More Disagreeable

The participants who are the most willing to localize are more likely to disagree with the majority view.

x-axis is sorted in descending order of # localized / # identified
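Both axes of this comparison can be computed from the per-spectrum calls. A minimal sketch under stated assumptions: `calls` is a hypothetical structure mapping each spectrum to each participant's localized sequence (or None for Loc=N); the majority is a strict majority of Loc=Y calls; and the minority rate is taken over each participant's own localized spectra, which may differ from the denominator used in the study.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# calls[spectrum_id][participant] = localized sequence if Loc=Y, None if Loc=N.
Calls = Dict[str, Dict[str, Optional[str]]]


def localizer_profile(calls: Calls) -> List[Tuple[str, float, float]]:
    """Return (participant, # localized / # identified, minority rate), most liberal first."""
    identified: Counter = Counter()
    localized: Counter = Counter()
    minority: Counter = Counter()
    for by_participant in calls.values():
        loc_yes = [s for s in by_participant.values() if s is not None]
        majority = None
        if loc_yes:
            top, n_top = Counter(loc_yes).most_common(1)[0]
            if n_top > len(loc_yes) / 2:  # strict majority only
                majority = top
        for participant, seq in by_participant.items():
            identified[participant] += 1
            if seq is not None:
                localized[participant] += 1
                if majority is not None and seq != majority:
                    minority[participant] += 1
    rows = []
    for participant in identified:
        loc_rate = localized[participant] / identified[participant]
        n_loc = localized[participant]
        minority_rate = minority[participant] / n_loc if n_loc else 0.0
        rows.append((participant, loc_rate, minority_rate))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by localization rate reproduces the slide's x-axis ordering, so a rising minority rate toward the left of the plot would indicate that liberal localizers disagree more often.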


A Challenging Problem

Spectrum annotation: P(m/z) -H3PO4, 879

14/21 participants said they can identify the peptide but cannot localize the site. The remaining 7 split:

3/7 DSAIPVESDtDDEGAPR

4/7 DSAIPVEsDtDDEGAPR


Primary Observations from iPRG 2010 study


1. The number of spectra marked as confidently identified varied widely.

2. The number of spectra marked as confidently localized varied widely.

3. When all of the participants agree on the identification, phosphosite ambiguity is possible, and localization is judged possible, the participants unanimously agree on the localization(s) for 79% of the spectra.

4. For the remaining 21%, the participants who are liberal localizers are more likely to disagree with the majority view.


Acknowledgements


iPRG Members

•Paul A. Rudnick (chair) – NIST

•Manor Askenazi - Dana-Farber Cancer Institute

•Karl R. Clauser - Broad Institute of MIT and Harvard

•William S. Lane - Harvard University

•Lennart Martens - Ghent University, Belgium

•Karen Meyer-Arendt - University of Colorado

•W. Hayes McDonald - Vanderbilt University

•Brian C. Searle - Proteome Software, Inc.

•Jeffrey A Kowalak (EB Liaison) – NIMH

Additional Contributors

• Philipp Mertins, The Broad Institute

–All wet lab work and an analysis

• Steve Gygi, Harvard Medical School

–Test datasets

• Matthew Chambers, Vanderbilt University Medical Center

–Data format conversions (ProteoWizard)

• Steve Stein and Yuri Mirokhin, NIST

–A K562 phosphopeptide spectral library

• Renee Robinson, Harvard University

–“The Anonymizer”


Emerging False Localization Rate (FLR) Metrics

Target/Decoy for localization

Decoy: AAs that cannot biologically bear the modification

Issues

• Allow decoys only during localization, not during identification; otherwise they will bias the identification FDR

• Ambiguity: more allowed sites will yield more ambiguous assignments, so targets and decoys may need to be scored separately and then compared

• Frequency: decoy AA occurrence should be similar to that of target AAs, otherwise the FLR will be inaccurate

• Proximity: a decoy AA nearer the site of a target AA has a better chance of matching; Pro and Glu are often found in the consensus motifs of many kinases
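One way to turn decoy localizations into an FLR estimate, addressing the frequency issue above, can be sketched as follows. This is an illustration under stated assumptions, not a published method: decoy residues (for example P and E) are allowed only during localization, and false localizations are assumed to fall on target and decoy residues in proportion to how many of each the passing peptides contain. `LocalizedPSM` and `estimate_flr` are hypothetical names, and published approaches differ in detail.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LocalizedPSM:
    score: float    # localization score (higher = more confident)
    on_decoy: bool  # True if the best site is a decoy residue (e.g. P or E)
    n_target: int   # number of target residues (S/T/Y) in the peptide
    n_decoy: int    # number of decoy residues (e.g. P/E) in the peptide


def estimate_flr(psms: Iterable[LocalizedPSM], threshold: float) -> float:
    """Decoy-based false localization rate among PSMs scoring >= threshold."""
    passing = [psm for psm in psms if psm.score >= threshold]
    decoy_hits = sum(psm.on_decoy for psm in passing)
    target_hits = len(passing) - decoy_hits
    n_target = sum(psm.n_target for psm in passing)
    n_decoy = sum(psm.n_decoy for psm in passing)
    if target_hits == 0 or n_decoy == 0:
        return math.nan  # no basis for an estimate
    # Scale decoy hits by the target/decoy residue ratio to estimate
    # how many target-site localizations are also wrong.
    false_target_hits = decoy_hits * n_target / n_decoy
    return min(1.0, false_target_hits / target_hits)
```

Sweeping `threshold` and reading off where the estimate drops below a chosen rate (for example 5%) is one way to derive a localization score cutoff from the data rather than fixing it a priori.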


AA Frequency in the Proteome


http://proteomics.broadinstitute.org/millhtml/faindexframe.htm

select the Calculate statistics utility


ProteinProspector SLIP Scoring and Local FLR

• Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos (IT-CID): 70,000 phosphopeptide spectra identified

• Altered Batch-Tag to allow for phosphorylation of Pro and Glu

• Filtered results to only phosphopeptide IDs containing one S, T or Y

• Modification site therefore known

• Local FLR: SLIP score of 6 = 95% correct

• Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data.

Baker, P.R., Trinidad, J.C., and Chalkley, R.J. (2011) Mol Cell Proteomics. M111.008078.


Closing Thoughts


• More research in the area of FLR metric calculation is critical for the field to develop standard confidence thresholds for modification site localization.

• An ambiguous modification localization decision for a particular peptide-spectrum match is far preferable to getting it wrong.

• As more raw LC-MS/MS data from PTM studies are deposited in the public domain, it becomes increasingly possible for knowledgebases to reprocess the data with the most recent algorithms and scoring metrics and to enforce uniform quality standards on the information they disseminate.

• PHOSIDA (www.phosida.com) disseminates modification sites identified and localized in publications emerging only from research in the laboratory of Matthias Mann. Consequently, all of its MS/MS data have been analyzed through a common software platform and subjected to consistent scoring thresholds.

Review Article

Modification Site Localization Scoring: Strategies and Performance

Chalkley, RJ and Clauser, KR

Mol Cell Proteomics 2012 11: 3-14. doi:10.1074/mcp.R111.015305.

http://www.mcponline.org/


Canonical pathways in lung cancer are being aggressively targeted for drug development


Janku et al. J Thoracic Oncol 2011; 6: 1601-1612


EML4-ALK fusion

Crystal, Clinical Advances in Hematology & Oncology, 2011, 9, 207-214.


Targeted therapy development time


Gerber and Minna Cancer Cell 2010; 18: 548-551


The future of lung cancer management


• Diagnose earlier

• Prognosticate better

• Treat more precisely

• Monitor more effectively

Herbst et al. NEJM 2008