
J. Proteome Res., 2011, 10 (1), pp 153–160. DOI: 10.1021/pr100677g

Page 1:

Analyzing shotgun proteomic data

Paulo Costa Carvalho
Laboratory for Proteomics and Protein Engineering
Fiocruz - PR

pcarvalho.com

Page 2:

Outline

• Shotgun proteomics
  • Motivation for studying proteomics
  • What is shotgun proteomics
• Data analysis
  • Protein identification
  • Label-free quantitation
  • PatternLab for proteomics
• Final Considerations

Page 3:

Motivations

J. Proteome Res., 2011, 10 (1), pp 153–160. DOI: 10.1021/pr100677g

Page 4:

Page 5:

Editorial

“There has been an unprecedented improvement in the quality and quantity of commercial proteomics data generation technologies, making data generation more accessible to many researchers. However, more and more discoveries will be led by researchers in command of the skills necessary to mine and extensively interpret the volumes of data. Already the ability to generate data vastly outpaces our ability to interpret it, and the lack of expertise in interpreting data is the current gating factor in the advancement of proteomics sciences. Proteomics scientists with training solely in data generation techniques will be shut out of more and more research opportunities.”

Nuno Bandeira, July 2011

Computational Proteomics

Page 6:

Too many roads not taken

Edwards AM, Nature, Feb 2011

Page 7:

Outline

Page 8:

Proteomics has revolutionized biochemical research

Page 9:

Page 10:

LC / MS shotgun proteomic data

[Figure: LC/MS data plotted with Mass / Charge and Time axes.]

Page 11:

[Figure: fragmentation of the doubly charged precursor peptide A-F-Y-L-K (NH2 to COOH) into B and Y ions, plotted along the m/z axis.]

Page 12:

[Figure, continued: fragment pair A / F-Y-L-K added to the spectrum.]

Page 13:

[Figure, continued: fragment pairs A-F / Y-L-K and A-F-Y / L-K added to the spectrum.]

Page 14:

[Figure, continued: fragment pair A-F-Y-L / K added; all (B) and (Y) fragments of the (precursor)2+ are now shown.]
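The fragment ladder drawn on these slides can be reproduced numerically. The sketch below is only an illustration, not part of the original deck: it assumes standard monoisotopic residue masses and singly charged fragments, and the function name `b_and_y_ions` is made up for this example.

```python
# Monoisotopic residue masses (Da) for the five residues of the example peptide.
RESIDUE_MASS = {
    "A": 71.03711,
    "F": 147.06841,
    "Y": 163.06333,
    "L": 113.08406,
    "K": 128.09496,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O


def b_and_y_ions(peptide):
    """Return singly charged (b, y) m/z lists, one entry per backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for cut in range(1, len(peptide)):
        b_ions.append(sum(masses[:cut]) + PROTON)           # N-terminal piece
        y_ions.append(sum(masses[cut:]) + WATER + PROTON)   # C-terminal piece
    return b_ions, y_ions


b_ions, y_ions = b_and_y_ions("AFYLK")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    # b_i and y_(n-i) come from the same cleavage and together cover the peptide.
    print(f"b{i} = {b:8.3f}   y{5 - i} = {y:8.3f}")
```

Each cleavage yields one B and one Y fragment, which is why the slides add the fragments in pairs.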

Page 15:

Outline

Page 16:

Strategies for protein identification by mass spectrometry

• Peptide sequence match
  • Advantage: most sensitive (when the protein is in the DB)
  • Disadvantage: sequence must be in the DB; needs to specify PTMs a priori
• De novo sequencing
  • Advantage: does not require a database
  • Disadvantage: most error prone
• Sequence Tag Search
  • Advantages: no need to specify PTM a priori; tolerant to small changes in the sequence
  • Disadvantages: not as sensitive as PSM when the protein is in the DB

Page 17:

• De novo sequencing
  • Advantage: does not require a database
  • Disadvantage: most error prone

[Figure: MS/MS spectrum, intensity versus m/z, with amino acid letters annotated on the peaks.]

Page 18:

• Sequence Tag Search
  • Advantages: no need to specify PTM a priori; tolerant to small sequence changes
  • Disadvantages: not as sensitive as PSM when the protein is in the DB

Na S et al., MCP, 2008

Page 19:

• Peptide sequence match
  • Advantage: most sensitive (when the protein is in the DB)
  • Disadvantage: sequence must be in the DB; needs to specify PTMs a priori

Page 20:

Protein Identification using a database

• ProLuCID
• X!Tandem
• OMSSA
• Andromeda
• SEQUEST
• Mascot

Page 21:

Interpreting MS/MS Proteomics Results

Brian C. Searle
Proteome Software Inc., Portland, Oregon USA

NPC Progress Meeting (February 2nd, 2006)

Illustrated by Toni Boudreault

Page 22:

B-type, A-type, Y-type Ions

[Figure: MS/MS spectrum, intensity versus m/z, with peaks labeled R, I, T, P, E, A and H2O.]

All these peaks are seen together simultaneously, and we don’t even know…

Page 23:

[Figure: MS/MS spectrum, intensity versus m/z.]

…what type of ion they are, making the mass differences approach even more difficult.

Finally, as with all analytical techniques,

Page 24:

[Figure: MS/MS spectrum, intensity versus m/z.]

there’s noise, producing a final spectrum that looks like…

Page 25:

[Figure: MS/MS spectrum, intensity versus m/z.]

…this, on a good day. And so it’s actually fairly difficult to…

Page 26:

XCalibur :: Show experimental data

Page 27:

Known Ion Types

• B-type ions
• A-type ions
• Y-type ions

We knew a couple of things about peptide fragmentation. Not only do we know to expect B, A, and Y ions, but…

Page 28:

Known Ion Types

• B-type ions: 100%
• A-type ions: 20%
• Y-type ions: 100%
• B- or Y-type +2H ions: 50%
• B- or Y-type -NH3 ions: 20%
• B- or Y-type -H2O ions: 20%

…the likelihood of seeing each type of ion, where generally B and Y ions are most prominent.

Page 29:

If we know the amino acid sequence of a peptide, we can guess what the spectra should look like!

So it’s actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.

Page 30:

Model Spectrum

ELVISLIVESK

*Courtesy of Dr. Richard Johnson, http://www.hairyfatguy.com/

So as an example, consider the peptide ELVIS LIVES K that was synthesized by Rich Johnson in Seattle.

Page 31:

Model Spectrum

We can create a hypothetical spectrum based on our rules

Page 32:

• B/Y type ions (100%)
• B/Y +2H type ions (50%)
• A type ions, B/Y -NH3/-H2O (20%)

Where B and Y ions are estimated at 100%, plus 2 ions are estimated at 50%, and other stragglers are at 20%.

Page 33:

Model Spectrum

So if we consider the spectrum that was derived from the ELVIS LIVES K peptide…

Page 34:

Model Spectrum

We can find where the overlap is between the hypothetical and the actual spectra…

Page 35:

Model Spectrum

And say conclusively based on the evidence that the spectrum does belong to the ELVIS LIVES K peptide.
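The model-spectrum idea from the preceding slides can be sketched in a few lines. This is a rough illustration under stated assumptions, not the scoring used by any particular search engine: peaks are binned to the nearest integer m/z, "+2H" is read here as the doubly charged form of a fragment (an assumption; the slides do not define it), and the names `model_spectrum` and `shared_peak_score` as well as the observed peak list are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in ELVISLIVESK.
RESIDUE_MASS = {"E": 129.04259, "L": 113.08406, "V": 99.06841,
                "I": 113.08406, "S": 87.03203, "K": 128.09496}
PROTON, WATER, AMMONIA, CO = 1.007276, 18.010565, 17.026549, 27.994915


def model_spectrum(peptide):
    """Map integer m/z bin -> expected relative intensity (percent)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = {}

    def add(mz, intensity):
        bin_ = round(mz)
        peaks[bin_] = max(peaks.get(bin_, 0), intensity)  # keep the strongest rule

    for cut in range(1, len(peptide)):
        b = sum(masses[:cut]) + PROTON
        y = sum(masses[cut:]) + WATER + PROTON
        add(b - CO, 20)                      # A-type ion (B ion minus CO)
        for ion in (b, y):
            add(ion, 100)                    # B/Y type ions
            add((ion + PROTON) / 2, 50)      # assumed meaning of "+2H": doubly charged
            add(ion - AMMONIA, 20)           # -NH3 loss
            add(ion - WATER, 20)             # -H2O loss
    return peaks


def shared_peak_score(model, observed_mz):
    """Sum the model intensities at bins where an observed peak is present."""
    observed_bins = {round(mz) for mz in observed_mz}
    return sum(i for b, i in model.items() if b in observed_bins)


model = model_spectrum("ELVISLIVESK")
observed = [130.05, 147.11, 999.9]   # made-up peak list: b1, y1, and one noise peak
print(shared_peak_score(model, observed))
```

A higher overlap between the hypothetical and the observed peaks is the evidence the slides refer to; real engines refine this with normalization and background correction.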

Page 36:

Sequencing Explosion

• 1977 Shotgun sequencing invented, bacteriophage φX174 sequenced
• 1989 Yeast Genome project announced
• 1990 Human Genome project announced
• 1992 First chromosome (Yeast) sequenced
• 1995 H. influenzae sequenced
• 1996 Yeast Genome sequenced
• 2000 Human Genome draft

Eng, J. K.; McCormack, A. L.; Yates, J. R. III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing for use in tandem mass spectrometry. And the idea was…

Page 37:

SEQUEST

…instead of searching all possible peptide sequences, search only those in genome databases.

Now, in the post-genomic world this seems like a pretty trivial idea, but back then there was a lot of assumption placed on the idea that we’d actually have a complete Human genome in a reasonable amount of time.

Page 38:

SEQUEST Model Spectrum

For a scoring function they decided to use Cross-Correlation, which basically sums the peaks that overlap between the hypothetical and the actual spectra, like so.

Page 39:

SEQUEST Model Spectrum

And then they shifted the spectra back and…

Page 40:

SEQUEST Model Spectrum

…forth so that the peaks shouldn’t align. They used this number, also called the Auto-Correlation, as their background.

Page 41:

SEQUEST XCorr

[Figure: correlation score plotted against offset (AMU), showing the Cross Correlation (direct comparison) and the Auto Correlation (background). Gentzel M. et al., Proteomics 3 (2003) 1597-1610]

This is another representation of the Cross Correlation and the Auto Correlation.

Page 42:

SEQUEST XCorr

[Figure: correlation score plotted against offset (AMU), showing the Cross Correlation (direct comparison) and the Auto Correlation (background). Gentzel M. et al., Proteomics 3 (2003) 1597-1610]

$$\mathrm{XCorr} = \frac{\mathrm{CrossCorr}}{\operatorname{avg}\left(\mathrm{AutoCorr},\ \text{offset} = -75\ \text{to}\ 75\right)}$$

The XCorr score is the Cross Correlation divided by the average of the auto correlation over a 150 AMU range.

The XCorr is high if the direct comparison is significantly greater than the background, which is obviously good for peptide identification.
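As a rough illustration of the ratio described on this slide, the sketch below correlates two binned spectra at offset 0 and divides by the mean correlation at offsets from -75 to 75. It follows the slide's wording rather than any specific SEQUEST release (some descriptions of the score subtract the background instead of dividing). Skipping offset 0 in the background, the one-AMU binning, and the toy spectra are all assumptions made for this example.

```python
def correlation(model, observed, offset):
    """Dot product of two binned spectra with `observed` shifted by `offset` bins."""
    total = 0.0
    for i, value in enumerate(model):
        j = i + offset
        if 0 <= j < len(observed):
            total += value * observed[j]
    return total


def xcorr(model, observed, max_offset=75):
    """Direct comparison divided by the mean correlation at non-zero offsets."""
    direct = correlation(model, observed, 0)
    # Assumption: offset 0 is left out of the background average.
    offsets = [t for t in range(-max_offset, max_offset + 1) if t != 0]
    background = sum(correlation(model, observed, t) for t in offsets) / len(offsets)
    return float("inf") if background == 0 else direct / background


# Toy spectra: 200 one-AMU bins with three unit peaks each (made-up data).
model = [0.0] * 200
for peak_bin in (10, 50, 120):
    model[peak_bin] = 1.0
observed = list(model)          # a perfect match to the model
print(xcorr(model, observed))
```

The score is large here because the three peaks line up only at offset 0 and almost never when the spectra are shifted.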

Page 43:

SEQUEST DeltaCn

$$\mathrm{DeltaCn} = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}$$

And this XCorr is actually a pretty robust method for estimating how accurate the match is, and so far, there really haven’t been any significant improvements on it.

The DeltaCn is another score that scientists often use. It measures how good the XCorr is relative to the next best match. As you can see, this is actually a pretty crude calculation.
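A small worked example of DeltaCn, taken here as the gap between the best and second-best XCorr divided by the best XCorr. The function name, the made-up scores, and the choice to return 0 when there is no runner-up are assumptions for illustration only.

```python
def delta_cn(xcorr_scores):
    """(best - second best) / best over one spectrum's candidate XCorr scores."""
    ranked = sorted(xcorr_scores, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        return 0.0   # assumption: no runner-up or non-positive best gives 0
    return (ranked[0] - ranked[1]) / ranked[0]


# Made-up XCorr values for the candidate peptides of a single spectrum.
print(delta_cn([3.2, 2.4, 1.1]))
```

A value close to 0 means the runner-up scored almost as well as the top hit, so the top hit is less distinctive.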

Page 44:

Raw Xtractor / Pause for search

* Show an MS2 file

Page 45:

ProLuCID

ProLuCID is a fast and sensitive tandem mass spectra-based protein identification program recently developed in the Yates laboratory at The Scripps Research Institute.

Page 46:

ProLuCID runner

Show ProLuCID Runner

Carvalho PC et al; unpublished

Page 47:

Workflow

[Diagram labels: MS, Database, Search Engine (e.g. ProLuCID, SEQUEST, etc.), PSM, Protein Identification]

Page 48:

The Challenge: How to pinpoint trustworthy identifications

1 spectrum = 1 identification!

Page 49:

Filtering data

Page 50:

In the beginning…

[Figure: table of spectra with columns spectrum, scores, protein, and peptide, sorted by match score.]

• SEQUEST: XCorr > 2.5, dCn > 0.1
• Mascot: Score > 45
• X!Tandem: Score < 0.01

Spectra were sorted according to some score and then a threshold value was set. Different programs have different scoring schemes, so SEQUEST, Mascot, and X!Tandem use different thresholds. Different thresholds may also be needed for different charge states, sample complexity, and database size.
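The threshold model can be sketched as a plain filter. The cutoffs below are the SEQUEST values quoted on this slide (XCorr > 2.5, dCn > 0.1); the `Psm` record, its field names, and the three example matches are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    """One peptide-spectrum match with its SEQUEST-style scores."""
    scan: int
    peptide: str
    xcorr: float
    delta_cn: float


def passes_sequest_thresholds(psm, min_xcorr=2.5, min_delta_cn=0.1):
    """Fixed-threshold filter: both scores must exceed their cutoffs."""
    return psm.xcorr > min_xcorr and psm.delta_cn > min_delta_cn


# Made-up matches for illustration.
psms = [
    Psm(1, "ELVISLIVESK", 3.4, 0.25),
    Psm(2, "AFYLK", 2.7, 0.05),      # good XCorr, but runner-up too close
    Psm(3, "RITPEA", 1.9, 0.30),     # distinctive, but XCorr too low
]

# Sort by match score, then keep only what clears the thresholds.
accepted = sorted((p for p in psms if passes_sequest_thresholds(p)),
                  key=lambda p: p.xcorr, reverse=True)
for p in accepted:
    print(p.scan, p.peptide, p.xcorr, p.delta_cn)
```

The cutoffs are hard-coded per search engine, which is exactly the limitation the next slide lists.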

Page 51:

There has to be a better way

The threshold model has these problems, which PeptideProphet, DTASelect and others try to solve:

• Poor sensitivity/specificity trade-off, unless you consider multiple scores simultaneously.
• No way to choose an error rate (p=0.05).
• Need to have different thresholds for:
  – different instruments (QTOF, TOF-TOF, IonTrap)
  – ionization sources (electrospray vs MALDI)
  – sample complexities (2D gel spot vs MudPIT)
  – different databases (SwissProt vs NR)
• Impossible to compare results from different search algorithms, multiple instruments, and so on.

Page 52:

Creating a discriminant score

[Figure: table of spectra with columns spectrum, scores, protein, and peptide, sorted by match score.]

PeptideProphet starts with a discriminant score. If an application uses several scores (SEQUEST uses XCorr, DeltaCn, and Sp scores; Mascot uses ion scores plus identity and homology thresholds), these are first converted to a single discriminant score.

Page 53:

Scaffold:: Proteome Software

Page 54:

[Figure: error versus sensitivity; the ideal point is annotated "correctly identifies everything, with no error". Keller et al, Anal Chem 2002]

This graph shows the trade-offs between the errors (false identifications) and the sensitivity (the percentage of possible peptides identified).

The ideal is zero error and everything identified (sensitivity = 100%).

PeptideProphet corresponds to the curved line. Squares 1–5 are thresholds chosen by other authors.

Page 55:

Mixture of distributions

[Figure: histogram of the number of spectra in each bin (0 to 200) against the discriminant score (D), from -3.9 to 7.3, with “incorrect” and “correct” distributions.]

This histogram shows the distributions of correct and incorrect matches.

PeptideProphet assumes that these distributions are standard statistical distributions.

Using curve-fitting, PeptideProphet draws the correct and incorrect distributions.
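Once the two distributions are fitted, a probability of being correct follows from Bayes' rule for each discriminant score. The sketch below is a simplified illustration, not PeptideProphet's actual model: it assumes two normal distributions with made-up means, a shared standard deviation, and a made-up prior, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def probability_correct(d, prior_correct=0.3,
                        correct=(3.0, 1.0), incorrect=(-1.0, 1.0)):
    """Posterior probability that a match with discriminant score d is correct.

    `correct` and `incorrect` are (mean, sd) pairs; all values are illustrative.
    """
    p_pos = prior_correct * normal_pdf(d, *correct)
    p_neg = (1 - prior_correct) * normal_pdf(d, *incorrect)
    return p_pos / (p_pos + p_neg)


for score in (-2.0, 1.0, 5.0):
    print(score, round(probability_correct(score), 4))
```

In practice the mixture parameters are estimated from the data (the curve-fitting step on this slide) rather than fixed by hand.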

Page 56:

Labeled decoy – False Discovery Rate

[Diagram: decoy strategy for FDR. Target sequences and labeled decoys are searched; the search result is shown as a histogram of spectra per bin (0 to 200) over the score axis (-3.9 to 7.3).]

Elias and Gygi, Nature Methods, 2007
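A minimal sketch of how labeled decoys can be turned into an FDR estimate and a score cutoff. It assumes one common convention, decoy hits divided by target hits above the threshold; other conventions exist (for example doubling the decoy count over all hits). The function names and the toy score list are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR among matches scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(psms, max_fdr=0.01):
    """Most permissive score cutoff whose estimated FDR stays within `max_fdr`."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Made-up discriminant scores; True marks a hit to a labeled decoy sequence.
psms = [(5.0, False), (4.5, False), (4.0, False), (3.5, False),
        (3.0, True), (2.5, False), (2.0, True)]
print(lowest_threshold_for_fdr(psms, max_fdr=0.25))
```

Lowering the cutoff accepts more spectra but lets in more decoys, which is the trade-off the histogram on this slide depicts.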

Page 57:

Search Engine Processor

SVM - example

Page 58:

Summary: “The use of iProphet in the TPP increases the number of correctly identified peptides at a constant false discovery rate (FDR) as compared to both PeptideProphet and another state-of-the-art tool Percolator.”

Page 59:

Maximizing proteins under a given FDR

Page 60:

Page 61:

Unlabeled Decoys – False Discovery Rate

[Diagram: new FDR strategy combining target sequences, labeled decoys, and unlabeled decoys (U-Decoy); the search result is shown as a histogram of spectra per bin (0 to 200) over the score axis (-3.9 to 7.3).]

Page 62:

Unlabeled Decoys – False Discovery Rate

         Total Identified Spectra   LD (spectra)   UD (spectra)
WNN      115248                     1152           4656
Bayes    108376                     1083           1064
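The counts in this table can be turned into decoy percentages with simple arithmetic. The sketch below only divides each decoy count by the total identified spectra; reading LD and UD as labeled and unlabeled decoys follows the surrounding slides, and the dictionary layout and function name are made up for this example.

```python
# Counts copied from the table on this slide.
rows = {
    "WNN":   {"total": 115248, "labeled_decoys": 1152, "unlabeled_decoys": 4656},
    "Bayes": {"total": 108376, "labeled_decoys": 1083, "unlabeled_decoys": 1064},
}


def decoy_percentages(row):
    """Labeled and unlabeled decoys as a percentage of all identified spectra."""
    return (100 * row["labeled_decoys"] / row["total"],
            100 * row["unlabeled_decoys"] / row["total"])


for name, row in rows.items():
    labeled, unlabeled = decoy_percentages(row)
    print(f"{name}: labeled {labeled:.2f}%, unlabeled {unlabeled:.2f}%")
```

Both rows sit near 1% for labeled decoys, while the unlabeled decoy percentage differs between them.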

Page 63:

           Spectra    Peptides   Proteins (FDR)   UL FDR
SEPro      104,654    17,840     1,283 (0.9%)     1%
Scaffold   88,970     15,406     1,160 (2.3%)     2%

Table I. Scaffold A refers to a 99% confidence level for proteins, 95% for peptides. Scaffold B refers to 95 and 80%, respectively for proteins and peptides.

Pages 64 to 70:

Generating the SEPro Report

Page 71:

Outline

Page 72:

Relative quantitation

Thermo

Page 73:

Label free quantitation

Picture from Strassberger et al, JOP, 2010

* Search for examples in xcalibur

Scan 12048

How to deal with different charge states?

Subject to random sampling; what are its implications?

Page 74:

Differential Analysis is performed in two steps

Differential Analysis

Marginal Cases (found in only 1 condition)

Differential (found in both)

Page 75:

Venn Diagrams of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 and B2. Panels A, B, and C consider only proteins that appeared in one or more, two or more, or in all three replicates, respectively.
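The replicate filter behind the three panels can be sketched with plain sets. This is an illustration with made-up protein identifiers, not the PatternLab implementation; the function names are hypothetical, and each replicate is assumed to be a set of protein IDs.

```python
def proteins_in_at_least(replicates, k):
    """Proteins seen in at least k of the given replicates (each a set of IDs)."""
    counts = {}
    for replicate in replicates:
        for protein in replicate:
            counts[protein] = counts.get(protein, 0) + 1
    return {protein for protein, count in counts.items() if count >= k}


def venn_regions(state_1, state_2):
    """Split two protein sets into the three regions of a two-circle Venn diagram."""
    return {
        "only_B1": state_1 - state_2,
        "both": state_1 & state_2,
        "only_B2": state_2 - state_1,
    }


# Made-up identifications for three replicates of each biological state.
b1 = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P4"}]
b2 = [{"P1", "P5"}, {"P1", "P5"}, {"P2", "P5"}]

for k in (1, 2, 3):   # one or more, two or more, all three replicates
    regions = venn_regions(proteins_in_at_least(b1, k), proteins_in_at_least(b2, k))
    print(k, {name: sorted(ids) for name, ids in regions.items()})
```

Raising the replicate requirement shrinks every region, so the proteins unique to one state depend strongly on how strict the filter is.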

Page 76:

Venn Diagrams of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 (A) and B2 (B). R1, R2, and R3 refer to the replicates from each state.

Page 77:

What proteins can be considered as statistically different for marginal cases?

Page 78:

               Num. Rep. (t)   Num. Proteins   Fraction ()   p-value
Low ()         1               613             0.637         0.180
               2               283             0.294         0.056
               3               66              0.069         0.019
Medium ()      1               297             0.310         0.141
               2               417             0.435         0.042
               3               245             0.255         0.015
High ()        1               168             0.176         0.112
               2               185             0.193         0.033
               3               604             0.631         0.011
Very High ()   1               59              0.070         0.083
               2               62              0.073         0.024
               3               725             0.857         0.008

Venn Diagram of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 and B2. Proteins that could not be statistically claimed to be differentially expressed in one of the two states according to the proposed Bayesian approach (those for which p-value > 0.05) were automatically filtered out during the generation of the Venn Diagram.

Carvalho PC et al; Bioinformatics 2011

Page 79:

Differential Analysis is performed in two steps

Differential Analysis

Marginal Cases (found in only 1 condition)

Differential (found in both)

Page 80:

Traditional strategy - Data Dependent Analysis (DDA)

New strategy - Extended Data Independent Analysis (XDIA)

Page 81:

Results

• Number of identified spectra increased by 250% (improves label-free quantitation).
• Number of unique peptides increased by 35%.

Page 82:

Page 83:

Page 84:

Multiplexed spectrum identification

Page 85:

Confidence when integrating extracted ion chromatograms

DDA XDIA

Page 86:

Co-eluting peptide ions of similar m/z

[Figure: peptide ions A and B of similar mass plotted over time, compared under Data Dependent Analysis and Extended Data Independent Analysis.]

Page 87:

Spectral deconvolution and monoisotopic peak reassignment to aid in identification and XIC quantitation

Page 88:

Outline

Page 89:

Show SEProQ here

Page 90:

PatternLab for proteomics: a one stop shop for data analysis

• Pinpoint differentially expressed proteins
• Venn Diagrams
• Gene Ontology Analysis
• Find trends in time-course experiments

Carvalho PC et al., Current Protocols in Bioinformatics, 2010

Page 91:

Page 92:

Computational workflow

• Experimental: data acquisition using the mass spectrometer (DDA, XDIA)
• Search engine preprocessing: YADA, XDIA Processor, CPM
• Protein identification / quality control: ProLuCID => Search Engine Processor
• Protein quantitation: Search Engine Processor / SEProQ
• Finding statistically differentially expressed proteins / data analysis: PatternLab for proteomics (Trends, Venn Diagrams, Differential Statistics, Gene Ontology Analysis, etc.)