Analysis of Complex Proteomic Datasets Using Scaffold
Free Scaffold Viewer can be downloaded at: www.proteomesoftware.com
• Beyond the realm of manual interpretation
• How do we determine what is a valid protein identification?
Shotgun proteomics: analysis of complex mixtures
1.2 Million Spectra!!!
Whole cell extract
10,000+ proteins
600,000 peptides
Scaffold: Why do we need it?
Statistical Analysis Using Scaffold
• All search engines use different scoring algorithms, so their results cannot be compared directly
• Many search engine results are described by more than one value
Examples:
Mascot: Ion Score and Identity Score
SEQUEST: XCorr and DeltaCn
Peptide Prophet*
• Creates a universal score (discriminant score) for each search engine result (e.g. XCorr and DeltaCn are compressed into one score for SEQUEST results; Ion Score and Identity Score for Mascot results)
• Plots a histogram of the discriminant scores and fits a bimodal mixture of standard statistical distributions to distinguish correct from incorrect hits
• Computes the probability that a match is correct at a given discriminant score
*Nesvizhskii, A. I. et al, Anal. Chem. 2003, 75, 4646-4658
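To make the idea of a single discriminant score concrete, here is a minimal sketch that collapses SEQUEST's XCorr and DeltaCn into one number with a linear function. The weights (`INTERCEPT`, `W_XCORR`, `W_DELTACN`) are hypothetical placeholders, not the coefficients Peptide Prophet actually learns; its published discriminant also uses additional, transformed inputs.

```python
"""Sketch: collapse two SEQUEST scores into one discriminant score D.

All weights are HYPOTHETICAL illustration values, not the coefficients
Peptide Prophet fits for a given instrument or search engine.
"""

# Hypothetical linear weights (assumption, for illustration only)
INTERCEPT = -3.0
W_XCORR = 1.5
W_DELTACN = 5.0


def discriminant_score(xcorr: float, delta_cn: float) -> float:
    """Return a single score D = c0 + c1 * XCorr + c2 * DeltaCn."""
    return INTERCEPT + W_XCORR * xcorr + W_DELTACN * delta_cn


if __name__ == "__main__":
    # SEQUEST scores quoted for the "good" and "bad" example spectra in this deck
    good = discriminant_score(xcorr=2.61, delta_cn=0.4)
    bad = discriminant_score(xcorr=2.26, delta_cn=0.2)
    print(f"good D = {good:.2f}, bad D = {bad:.2f}")
```

Once every match has a single D, matches from one search engine can be placed on one histogram regardless of how many raw scores the engine reports.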
[Figure: Histogram of discriminant scores; x-axis: discriminant score (D), y-axis: number of spectra in each bin]
[Figure: the same histogram of discriminant scores, modeled as two overlapping distributions labeled “incorrect” and “correct”]
Assumes a mixture of standard statistical distributions: one for “incorrect” and one for “correct” matches
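The mixture assumption can be illustrated with a small expectation-maximization (EM) loop that splits a set of discriminant scores into an "incorrect" and a "correct" component. This sketch assumes both components are Gaussian for simplicity; Peptide Prophet's published model does not necessarily use Gaussians for both, so treat this as an illustration of the idea rather than its actual model. The function name `fit_two_gaussians`, the starting values, and the synthetic data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, n_iter=200):
    """Fit an "incorrect" + "correct" two-Gaussian mixture with EM.

    Returns (pi_correct, (mean_inc, sd_inc), (mean_cor, sd_cor)).
    """
    n = len(scores)
    lo, hi = min(scores), max(scores)
    # Start the "incorrect" component low and the "correct" one high
    mu_inc, mu_cor = lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)
    sd_inc = sd_cor = max((hi - lo) / 4.0, 1e-3)
    pi_cor = 0.5
    for _ in range(n_iter):
        # E-step: probability that each score belongs to the "correct" component
        resp = []
        for x in scores:
            a = pi_cor * normal_pdf(x, mu_cor, sd_cor)
            b = (1.0 - pi_cor) * normal_pdf(x, mu_inc, sd_inc)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: re-estimate mixing weight, means and standard deviations
        n_cor = max(sum(resp), 1e-12)
        n_inc = max(n - n_cor, 1e-12)
        pi_cor = n_cor / n
        mu_cor = sum(r * x for r, x in zip(resp, scores)) / n_cor
        mu_inc = sum((1 - r) * x for r, x in zip(resp, scores)) / n_inc
        var_cor = sum(r * (x - mu_cor) ** 2 for r, x in zip(resp, scores)) / n_cor
        var_inc = sum((1 - r) * (x - mu_inc) ** 2 for r, x in zip(resp, scores)) / n_inc
        # Floor the spread so a component cannot collapse onto one point
        sd_cor = max(math.sqrt(var_cor), 1e-3)
        sd_inc = max(math.sqrt(var_inc), 1e-3)
    return pi_cor, (mu_inc, sd_inc), (mu_cor, sd_cor)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic scores: 700 "incorrect" near -1.5 and 300 "correct" near 3.5
    data = ([rng.gauss(-1.5, 1.0) for _ in range(700)]
            + [rng.gauss(3.5, 1.2) for _ in range(300)])
    pi_cor, inc, cor = fit_two_gaussians(data)
    print(f"fraction correct ~ {pi_cor:.2f}; means: incorrect {inc[0]:.2f}, correct {cor[0]:.2f}")
```

This also shows why there must be enough data: with only a handful of scores, the two components cannot be separated reliably.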
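[Figure: the histogram of discriminant scores with the fitted “incorrect” and “correct” distributions and the peptide probability threshold marked]
The probability that a match with discriminant score D is correct (+) rather than incorrect (−) follows from Bayes' rule:
$$p(+\mid D) = \frac{p(D\mid +)\,p(+)}{p(D\mid +)\,p(+) + p(D\mid -)\,p(-)}$$
where p(+) and p(−) are the overall fractions of correct and incorrect matches, and p(D|+) and p(D|−) are the two fitted distributions.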
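Given two fitted components, the peptide probability p(+|D) = p(D|+)p(+) / [p(D|+)p(+) + p(D|−)p(−)] can be evaluated directly. A minimal sketch, assuming Gaussian densities for both components; the prior `pi_correct`, the (mean, sd) pairs, and the 0.95 threshold are hypothetical values chosen for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_correct(d, pi_correct, correct=(3.5, 1.2), incorrect=(-1.5, 1.0)):
    """p(+|D) = p(D|+)p(+) / (p(D|+)p(+) + p(D|-)p(-)).

    `correct` and `incorrect` are (mean, sd) pairs; the defaults are
    hypothetical values, not fitted Peptide Prophet parameters.
    """
    num = normal_pdf(d, *correct) * pi_correct
    den = num + normal_pdf(d, *incorrect) * (1.0 - pi_correct)
    return num / den


if __name__ == "__main__":
    threshold = 0.95  # hypothetical peptide probability threshold
    for d in (-1.0, 1.0, 2.0, 4.0):
        p = posterior_correct(d, pi_correct=0.3)
        print(f"D = {d:+.1f}  p(+|D) = {p:.3f}  kept: {p >= threshold}")
```

Setting a peptide probability threshold amounts to choosing the D value above which this posterior exceeds the cut-off.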
[Figure: Venn diagram of identifications from SEQUEST, X!Tandem, and Mascot; region labels 9%, 19%, 7%, 34%, 5%, 4%, and 22%]
One search engine may not be enough
• Peptide Prophet statistics are applied separately to each search engine result (i.e. Mascot, SEQUEST, and X!Tandem)
• Scaffold Merger combines the peptide probabilities from each search engine to generate a protein probability
The probability of identifying a spectrum
+ the probability of agreement between search engines
= Protein Probability
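Scaffold's exact merging model is not spelled out here. As a rough illustration of why agreement between engines raises confidence, the sketch below combines per-engine probabilities for the same peptide-spectrum match with a naive odds-product rule, assuming the engines provide conditionally independent evidence and that each probability was computed with the same prior. This is a hypothetical stand-in, not Scaffold's algorithm, and the function name and default prior are made up.

```python
def combine_independent(probs, prior=0.5):
    """Combine per-engine probabilities that the same PSM is correct.

    Naive-Bayes odds rule (ASSUMPTION: conditionally independent engines,
    each probability computed with the same prior):
        posterior_odds = prior_odds * prod(odds_i / prior_odds)
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in probs:
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep odds finite
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print(combine_independent([0.8]))        # one engine: unchanged
    print(combine_independent([0.8, 0.8]))   # two engines agree: higher
    print(combine_independent([0.8, 0.2]))   # engines disagree: lower
```

Real search engines are far from independent (they see the same spectrum), which is one reason a dedicated merging model is needed rather than this naive rule.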
Advantages of using Scaffold
• Allows you to choose a statistical error rate by setting probability thresholds
• Allows you to compare and combine results from different experiments and different search engines
• Allows sharing of raw data and search results
• Accepted as a suitable statistical method to validate large datasets
This is the Samples view
List of all the proteins found in your samples
Homologous proteins (proteins matched to the same peptides) are shown. You can link out directly to database entries.
How does Scaffold deal with peptides that can be assigned to more than one protein?
General Rule: Explain the spectral data with the smallest set of proteins
[Figure: protein A and protein B with the peptides matched to each]
• Protein A and Protein B share all the same peptides, so they will be grouped together
• Protein A and protein B each have one unique peptide; they will be listed separately only if the peptide probability is > 50%
• Protein B has two unique peptides, so it will be listed separately
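The "smallest set of proteins" rule is a parsimony (set-cover) problem. A minimal sketch, assuming a simple greedy strategy and made-up protein and peptide names; Scaffold's actual grouping also weighs peptide probabilities (for example the > 50% rule for single unique peptides), which this sketch leaves out. The function names are hypothetical.

```python
def group_identical(protein_peptides):
    """Merge proteins that are matched by exactly the same peptide set."""
    groups = {}
    for protein, peptides in protein_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    return {tuple(sorted(names)): set(peps) for peps, names in groups.items()}


def greedy_minimal_set(protein_peptides):
    """Greedily pick protein groups until every peptide is explained."""
    groups = group_identical(protein_peptides)
    unexplained = set().union(*groups.values())
    chosen = []
    while unexplained:
        # Pick the group explaining the most still-unexplained peptides
        # (sorted() makes tie-breaking deterministic)
        best = max(sorted(groups), key=lambda g: len(groups[g] & unexplained))
        chosen.append(best)
        unexplained -= groups[best]
    return chosen


if __name__ == "__main__":
    # A and B share all peptides -> reported as one group
    print(greedy_minimal_set({"A": {"p1", "p2"}, "B": {"p1", "p2"}}))
    # Each protein has peptides the other lacks -> both are needed
    print(greedy_minimal_set({"A": {"p1", "p2", "p3"}, "B": {"p1", "p2", "p4", "p5"}}))
    # A's peptides are a subset of B's -> B alone explains the data
    print(greedy_minimal_set({"A": {"p1", "p2"}, "B": {"p1", "p2", "p3", "p4"}}))
```

Greedy set cover is not guaranteed to find the absolute smallest set, but it is a common, fast approximation for this kind of grouping.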
Scaffold will extract GO terms from NCBI annotations
Gene Ontology “GO” terms
• Controlled vocabulary containing consistent descriptions of gene products in different databases
• Describes gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner
Gene Ontology Project http://www.geneontology.org/GO.doc.shtml
List of samples
Color coded to represent probability that protein identification is correct
Probability thresholds for peptide and protein identifications and required number of unique peptides can be defined
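The three filters (peptide probability, protein probability, minimum number of unique peptides) can be expressed as a simple predicate. In this sketch the `ProteinHit` record, the function name, and the default thresholds (95% peptide, 99% protein, 2 unique peptides) are assumptions chosen for illustration; in Scaffold these values are set by the user.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    """Hypothetical record for one protein identification."""
    name: str
    protein_probability: float
    # Best probability of each unique peptide assigned to this protein
    peptide_probabilities: List[float]


def passes_filters(hit, min_peptide_prob=0.95, min_protein_prob=0.99,
                   min_unique_peptides=2):
    """True if the protein clears all three thresholds."""
    confident = [p for p in hit.peptide_probabilities if p >= min_peptide_prob]
    return (hit.protein_probability >= min_protein_prob
            and len(confident) >= min_unique_peptides)


if __name__ == "__main__":
    hits = [
        ProteinHit("P1", 0.999, [0.99, 0.97, 0.60]),  # two confident peptides
        ProteinHit("P2", 0.88, [0.96]),               # single peptide, low protein prob
    ]
    print([h.name for h in hits if passes_filters(h)])
```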
This is the Proteins view
Spectrum of each peptide labeled with y and b ions which can be used for manual validation
Manual Spectrum Evaluation
• Search engine scores
  - Is the peptide found by more than one search engine?
  - Mascot ion score > 40
  - SEQUEST XCorr > 2 (+2 ion), > 2.5 (+3 ion)
  - deltaCn > 0.2
• Good signal-to-noise
• Long stretches of y and/or b ions
• All dominant peaks are assigned as y or b ions
• Fragmentation chemistry
  - N-terminal cleavage at P → dominant y-ion
  - C-terminal cleavage at D and E → dominant b-ion
  - Peptides containing W → abundant y-ions
  - S and T tend to lose water (-18 Da)
  - R, N, and Q tend to lose ammonia (-17 Da)
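The numeric cut-offs in this checklist translate directly into a quick screening function. This sketch encodes only the score rules listed above; the qualitative checks (signal-to-noise, ion series coverage, fragmentation chemistry) still require looking at the spectrum. The function names are hypothetical, and because no XCorr cut-off is given for other charge states, the sketch returns `None` for them.

```python
# XCorr cut-offs by precursor charge, as listed in the checklist
XCORR_CUTOFF = {2: 2.0, 3: 2.5}


def sequest_passes(xcorr, delta_cn, charge):
    """SEQUEST rule of thumb; None if no cut-off is given for this charge."""
    cutoff = XCORR_CUTOFF.get(charge)
    if cutoff is None:
        return None
    return xcorr > cutoff and delta_cn > 0.2


def mascot_passes(ion_score):
    """Mascot rule of thumb: ion score above 40."""
    return ion_score > 40


if __name__ == "__main__":
    # Scores reported for the two example spectra in this deck
    print("good:", sequest_passes(2.61, 0.4, charge=2), mascot_passes(60.1))
    print("bad: ", sequest_passes(2.26, 0.2, charge=3), mascot_passes(9.93))
```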
[Figure: annotated MS/MS spectrum of IAELAGFSVPENTK, relative intensity vs. m/z (0 to 1250), with labeled ions b3 to b13, y3 to y12, and b9-H2O]
1474.73 AMU, +2 H (Parent Error: -650 ppm)
Peptide Sequence IAELAGFSVPENTK
+2 charge on parent peptide
Good Spectrum
SEQUEST: Xcorr = 2.61, deltaCn = 0.4
Dominant y-ion at N-terminal cleavage of P
Mascot: Ion Score = 60.1, Identity Score = 37.3
Good coverage of y and b ion series
Good signal-to-noise
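Labeling y and b ions comes down to summing residue masses. The sketch below computes singly charged b and y ion m/z values, the precursor m/z, and the neutral peptide mass from standard monoisotopic residue masses, for unmodified residues only. The mass table layout and function names are our own, not Scaffold's. For IAELAGFSVPENTK the neutral mass comes out at about 1474.77 Da, close to the 1474.73 AMU shown in the spectrum header.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # mass of a proton


def neutral_mass(seq):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def precursor_mz(seq, charge):
    """m/z of the [M + zH]^z+ precursor ion."""
    return (neutral_mass(seq) + charge * PROTON) / charge


def b_y_ions(seq):
    """Singly charged b and y ion m/z values, keyed 'b1'... and 'y1'..."""
    ions = {}
    for i in range(1, len(seq)):
        ions[f"b{i}"] = sum(RESIDUE[aa] for aa in seq[:i]) + PROTON
        ions[f"y{i}"] = sum(RESIDUE[aa] for aa in seq[-i:]) + WATER + PROTON
    return ions


if __name__ == "__main__":
    pep = "IAELAGFSVPENTK"
    print(f"neutral mass {neutral_mass(pep):.3f}, [M+2H]2+ m/z {precursor_mz(pep, 2):.3f}")
    ions = b_y_ions(pep)
    for n in range(3, 6):
        print(f"b{n} {ions[f'b{n}']:.3f}   y{n} {ions[f'y{n}']:.3f}")
```

In this peptide, y5 corresponds to PENTK, i.e. the fragment produced by cleavage on the N-terminal side of proline.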
[Figure: annotated MS/MS spectrum of YPLADYALTPDMAIVDANLVMDMPK, relative intensity vs. m/z (0 to 2500); labels include b and y ions, doubly charged (+2H) fragments, a and x ions, internal fragments (e.g. PLADYAL-NH3), and water losses]
2767.75 AMU, +3 H (Parent Error: -240 ppm)
Bad Spectrum
Peptide Sequence YPLADYALTPDMAIVDANLVMDMPK
+3 charge on parent peptide
SEQUEST: Xcorr = 2.26, deltaCn = 0.2
Mascot: Ion Score = 9.93, Identity Score = 37.3
Poor signal-to-noise
Poor coverage of y and b ion series
Multiple unassigned peaks
This is the Statistics view
Score Histogram
Blue indicates “incorrect” proteins; red indicates “correct” proteins.
A protein is “correct” if it passes the peptide probability, protein probability, and minimum # peptides filters.
Scaffold Statistics View
Important! There must be enough data to fit two distributions for the statistics to be valid.
With only 1 unique peptide (95% peptide probability), the maximum protein probability is < 90%.
With at least 2 unique peptides (95% peptide probability), the maximum protein probability is ~100%.
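A common first approximation for protein probability, the starting point of ProteinProphet-style models, treats each unique peptide as independent evidence, so the protein is wrong only if every peptide is wrong: P(protein) = 1 − ∏(1 − p_i). The sketch below implements only that naive rule, and the function name is hypothetical. It does not reproduce Scaffold's numbers: for a single 95% peptide it gives 0.95, whereas the slide states single-peptide proteins stay below 90%, so Scaffold evidently applies adjustments beyond this simple rule.

```python
from math import prod


def naive_protein_probability(peptide_probs):
    """P(protein correct) = 1 - prod(1 - p_i), assuming independent peptides."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


if __name__ == "__main__":
    print(naive_protein_probability([0.95]))        # one unique peptide
    print(naive_protein_probability([0.95, 0.95]))  # two unique peptides
```

Even this naive rule shows the trend on the slide: a second confident peptide pushes the protein probability much closer to 100%.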
[Figure: Scaffold Statistics View for SEQUEST-only results, with missed IDs marked]
[Figure: Scaffold Statistics View for Mascot-only results, with missed IDs marked]
Using both Mascot and SEQUEST results in more “correct” protein identifications
[Figure: comparison of Mascot only, SEQUEST only, and both]
This is the Publish View
http://www.mcponline.org/misc/ParisReport_Final.shtml
Molecular & Cellular Proteomics
Publication Guidelines for Proteomic Data
• Name and version of the software used to extract the peak list
• Name and version of the database searching software (Mascot, SEQUEST, Spectrum Mill, or X!Tandem)
• Values of all search parameters used (enzyme, modifications, mass tolerance, etc.)
• Name and size of the database searched (Swiss-Prot or NCBI, and the number of sequence entries)
• Name and version of any additional software used for statistical analysis, and an explanation of the analysis (Scaffold, # peptide requirements, probability settings)
Data Analysis
Each Protein Identified
• Accession number
• Sequence coverage and total number of unique peptides
Each Peptide Identified
• Peptide sequence noting any modifications or missed cleavages
• Parent peptide ion mass and charge
• All search engine scores
Recommended
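Sequence coverage, one of the per-protein items above, is the fraction of a protein's residues covered by at least one identified peptide. A minimal sketch, assuming exact substring matching of unmodified peptide sequences (I/L ambiguity and modifications are ignored); the function name and the toy protein sequence are hypothetical.

```python
def sequence_coverage(protein_seq, peptides):
    """Fraction of residues in protein_seq covered by any peptide.

    Every occurrence of each peptide is marked; overlapping peptides are
    not double-counted because coverage is tracked per residue.
    """
    covered = [False] * len(protein_seq)
    for pep in set(peptides):
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return sum(covered) / len(protein_seq) if protein_seq else 0.0


if __name__ == "__main__":
    # Hypothetical toy protein containing the example peptide from this deck
    protein = "MKTAYIAELAGFSVPENTKQLLR"
    peptides = ["IAELAGFSVPENTK", "QLLR"]
    print(f"{sequence_coverage(protein, peptides):.1%} coverage, "
          f"{len(set(peptides))} unique peptides")
```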