http://www.bits.vib.be/training
BITS MS Data Processing – Protein Inference
UGent, Gent, Belgium – 16 December 2011
Kenny [email protected]
Lennart [email protected]
Proteomics Services Group
European Bioinformatics Institute
Hinxton, Cambridge
United Kingdom
www.ebi.ac.uk
Kenny Helsens
Computational Omics and Systems Biology Group
Department of Medical Protein Research, VIB
Department of Biochemistry, Ghent University
Ghent, Belgium
introduction to proteomics
Adapted from the NCBI Science Primer: http://www.ncbi.nih.gov/About/primer/genetics_cell.html
- Primary structure (sequence)
- Secondary structure (structural elements)
- Tertiary structure (3D shape)
- Modifications (dynamic, function)
- Processing (targeting, activation)
…YSFVATAER…
phosphorylation
trypsin
platelet activity
The central paradigm
Principle
[Figure: cells → cell lysis and protein extraction → protein mixture (Protein A, B, C, D) → 2D-PAGE, separating by pI and Mr. Label: chemistry toolbox.]
2D-PAGE separation of proteins (Est. 1975)
[Figure: 2D-PAGE workflow. Protein extraction yields a complex protein mixture; 2D-PAGE separation (pI versus MW); tryptic digest; MS analysis; fragmentation; MS/MS analysis. Spectra are plotted as relative intensity (%) versus m/z.]
Image source: http://www.akh-wien.ac.at/biomed-research/htx/platweb1.htm
2D-PAGE separation of proteins (Est. 1975)
[Figure: gel-free workflow. Protein extraction yields a complex protein mixture; enzymatic digest gives an extremely complex peptide mixture; separation and selection give less complex peptide fractions; MS analysis and data-dependent MS/MS analyses. Spectra are plotted as relative intensity (%) versus m/z.]
Image source: http://www.akh-wien.ac.at/biomed-research/htx/platweb1.htm
Overall gel-free proteomics workflow
• ICAT (Gygi et al., 1999)
• MudPIT (Washburn et al., 2001)
• Accurate Mass Tags for proteome analysis (Conrads et al., 2000)
• Signature Peptides approach for proteomics (Ji et al., 2000)
• AA-based covalent chromatography peptide selection (Wang & Regnier, 2001)
• Affinity-based enrichment of phosphopeptides (Oda et al., 2001)
• ICAT for phosphopeptides (Zhou et al., 2001)
• Reversible biotinylation of Cys-peptides (Spahr et al., 2000)
• COFRADIC (Gevaert et al., 2002)
Going gel-free in the new millennium
• Pro: massive increase in mixture redundancy (e.g. membrane proteins)
  Con: corresponding increase in mixture complexity (from a few thousand proteins to hundreds of thousands of peptides!)
• Pro: easier separation of peptides instead of proteins
  Con: loss of protein-level information (pI, MW, isoforms)
• Pro: mixture complexity can be reduced by peptide selection (Cys-peptides, Met-peptides, N-terminal peptides, phosphopeptides, …)
  Con: again leading to reduced redundancy of the mixture
• Pro: choice of selection technique, depending on circumstances/analyte
  Con: massive amounts of data generated (10,000 spectra per hour)
• Pro: additional processing information (N-terminal peptides)
  Con: unadapted database search engines (N-terminal processing)
An overview of the pros and cons
AN INCOMPLETE OVERVIEW
OF GEL-FREE TECHNIQUES
[Figure: SCX and RP phases coupled to ESI-based MS]
SCX: strong cation exchanger
RP: reverse-phase resin
• Orthogonal, 2D separation of peptides
• 2D-PAGE analogy: pI = SCX, Mr = RP
MudPIT: that which we call a rose…
E.g., Escherichia coli: 4,349 predicted proteins
If 100% expressed: 109,934 detectable tryptic peptides
If 50% expressed: 54,967 detectable tryptic peptides
Sample complexity increases by more than an order of magnitude (about 25 peptides per protein)!
But what about the complexity?
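The jump from proteins to peptides can be sketched with a small in silico digest. This is a minimal illustration, not the procedure used to obtain the E. coli numbers on the slide: the cleavage rule (after K or R, not before P) is the common textbook trypsin rule, the length window used for "detectable" is an assumed placeholder, and the toy sequence is hypothetical.

```python
import re


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def detectable(peptides, min_len=6, max_len=30):
    """Assumed detectability filter: keep peptides within a length window."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Ratio implied by the slide: detectable peptides per predicted protein.
peptides_per_protein = 109_934 / 4_349

# Hypothetical toy sequence, for illustration only.
toy_protein = "MKRPAAKCDR"
print(tryptic_digest(toy_protein))
print(f"{peptides_per_protein:.1f} peptides per protein")
```

The real count depends on the sequence database and on the detectability criteria, neither of which is given on the slide.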
What happens when there are 100,000 peptides present?
How often do we need to repeat an analysis of an identical sample in order to obtain reasonable coverage?
The explorative aspect
A thought experiment seems appropriate
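One way to run this thought experiment is to assume that every analysis identifies a fixed number of distinct peptides, drawn uniformly at random from the sample. Under that (optimistic) assumption the expected coverage after k runs is 1 - (1 - n/N)^k. The sketch below uses N = 100,000 from the slide; the 10,000 identifications per run is a hypothetical value chosen only for illustration.

```python
import math
import random


def expected_coverage(n_total, n_per_run, runs):
    """Expected fraction of peptides seen at least once after `runs` analyses."""
    return 1.0 - (1.0 - n_per_run / n_total) ** runs


def runs_needed(n_total, n_per_run, target):
    """Smallest number of runs whose expected coverage reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - n_per_run / n_total))


def simulate_coverage(n_total, n_per_run, runs, seed=0):
    """Monte Carlo check: each run samples distinct peptides uniformly."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(runs):
        seen.update(rng.sample(range(n_total), n_per_run))
    return len(seen) / n_total


N_PEPTIDES = 100_000   # from the slide
N_PER_RUN = 10_000     # hypothetical identifications per run

print(runs_needed(N_PEPTIDES, N_PER_RUN, 0.90))
print(expected_coverage(N_PEPTIDES, N_PER_RUN, 10))
print(simulate_coverage(N_PEPTIDES, N_PER_RUN, 10))
```

Real samples are less forgiving: abundant, well-ionizing peptides are picked repeatedly, so uniform sampling gives a lower bound on the number of repeats needed.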
The explorative aspect
[Figure: panels labelled 2002, 2006 and 2010]
Complete coverage
[Figure: narrowing the analyte population step by step]
Tissue → one cell-type → one organelle / compartment → subset of proteins → subset of peptides
Preselected, representative peptides
Techniques per level:
• cells: laser capture microdissection, flow cytometry
• compartments: differential detergent fractionation, differential centrifugation
• proteins: gel filtration, 1D-gel electrophoresis, ion exchange
• peptides: ICAT method, COmbined FRActional Diagonal Chromatography (COFRADIC)
More coverage by reducing population size
Isotope Coded Affinity Tag
1) Modify cysteine residues using a molecule consisting of 3 parts:
• a thiol reactive group
• a biotin label
• a linker that may contain light or heavy atoms
2) Digest proteins
3) Affinity isolation of labeled cysteine-peptides
4) Use cysteine-peptides for LC-MS/MS analysis
Peptide selection techniques: ICAT
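The effect of step 3, keeping only cysteine-containing peptides, can be sketched on a toy digest. This is an illustration of the selection principle only: the protein names and peptide sequences are hypothetical, and the affinity isolation is reduced to a simple "contains C" test.

```python
def select_cys_peptides(peptides):
    """Keep only peptides that contain at least one cysteine (C)."""
    return [p for p in peptides if "C" in p]


def selection_summary(digest):
    """Return (peptides before, peptides after, proteins left without a peptide).

    `digest` maps a protein name to the list of its peptides.
    """
    kept = {name: select_cys_peptides(peps) for name, peps in digest.items()}
    total_before = sum(len(peps) for peps in digest.values())
    total_after = sum(len(peps) for peps in kept.values())
    lost = sorted(name for name, peps in kept.items() if not peps)
    return total_before, total_after, lost


# Hypothetical toy digest, not real data.
toy_digest = {
    "Protein A": ["MSCLK", "AAGR", "TTCDEK", "LLPR"],
    "Protein B": ["GGHK", "VVAR"],
}
print(selection_summary(toy_digest))
```

The summary shows both sides of the trade-off: fewer peptides to analyse, but a protein without any Cys-peptide drops out of the analysis entirely.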
[Figure: structure of the ICAT reagent, consisting of a biotin tag, a linker with eight positions X, and a thiol-specific reactive group]
heavy reagent: X = deuterium
light reagent: X = hydrogen
The linker allows differential proteome analysis!
Resulting mass difference between heavy and light label: 8 amu.
From: Gygi SP et al., Nature Biotechnology, 1999
The ICAT molecule
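In a spectrum, the 8 amu difference shows up as a spacing in m/z that depends on the charge state and on the number of labelled cysteines. The sketch below uses the nominal 8 amu quoted on the slide; the function names, the matching tolerance and the example m/z values are hypothetical.

```python
NOMINAL_SHIFT_PER_LABEL = 8.0  # amu per labelled cysteine, as quoted on the slide


def icat_pair_spacing(n_cys, charge, shift=NOMINAL_SHIFT_PER_LABEL):
    """Expected m/z distance between the light and heavy form of a peptide."""
    return n_cys * shift / charge


def find_pairs(mz_values, n_cys, charge, tol=0.05):
    """Return (light, heavy) m/z pairs separated by the expected spacing."""
    spacing = icat_pair_spacing(n_cys, charge)
    ordered = sorted(mz_values)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            if abs(heavy - light - spacing) <= tol:
                pairs.append((light, heavy))
    return pairs


# Hypothetical peak list: one doubly charged pair with a single labelled Cys.
print(icat_pair_spacing(n_cys=1, charge=2))
print(find_pairs([500.0, 504.0, 610.2], n_cys=1, charge=2))
```

The intensity ratio within each matched pair is what carries the differential (quantitative) information.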
COmbined FRActional DIagonal Chromatography
• Selection technique based on diagonal chromatography
• Versatile – requires only a specific modification that changes chromatographic properties
• Already applied to methionine, cysteine, N-terminal, nitrosylated, glycosylated, phosphorylated and ATP-binding peptides
• N-terminal analysis is well-suited for detecting proteolytic events
From: Gevaert et al., Molecular & Cellular Proteomics, 2002; Gevaert et al., Nature Biotechnology, 2003
Peptide selection techniques: COFRADIC
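[Figure: primary run chromatogram (AU versus time, with gradient)]
Separate and collect in fractions
Chemical (or enzymatic) alteration of a subset of peptides, in separate or combined fractions
Altered peptides display changed chromatographic properties (-, +)
Alternatively: selected peptides are not altered (=0), while non-selected peptides are altered
[Figure: secondary run chromatogram (AU versus time, with gradient); altered peptides shift (- or +), unaltered peptides stay in place (=0)]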
COFRADIC in principle
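The sorting principle can be sketched as a comparison of elution windows between the two runs. This is a simplified model under stated assumptions: retention is assumed to be perfectly reproducible, the alteration is represented as a fixed retention-time shift, and all peptides and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    sequence: str
    rt_primary: float  # retention time in the primary run (minutes)
    rt_shift: float    # shift caused by the alteration (0.0 if unaltered)


def secondary_rt(peptide):
    """Retention time in the secondary run, after the alteration step."""
    return peptide.rt_primary + peptide.rt_shift


def sort_fraction(peptides, start, end):
    """Split one primary fraction [start, end) into shifted and unshifted peptides."""
    in_fraction = [p for p in peptides if start <= p.rt_primary < end]
    shifted = [p for p in in_fraction if not (start <= secondary_rt(p) < end)]
    unshifted = [p for p in in_fraction if start <= secondary_rt(p) < end]
    return shifted, unshifted


# Hypothetical peptides; the negative shifts mimic an alteration that
# makes the peptide elute earlier.
toy_peptides = [
    Peptide("AMSLK", 30.5, -4.0),
    Peptide("GGLVR", 30.8, 0.0),
    Peptide("TTEMK", 31.2, -3.5),
    Peptide("LLPAK", 45.0, 0.0),
]
shifted, unshifted = sort_fraction(toy_peptides, start=30.0, end=32.0)
print([p.sequence for p in shifted])
print([p.sequence for p in unshifted])
```

Depending on the protocol, either the shifted peptides (-, +) or the unshifted ones (=0) are the collected, selected population.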
Methionine COFRADIC (Gevaert et al., 2002)
[Figure: H2O2 oxidation between the primary and the secondary run converts methionine into methionine-sulfoxide]
N-terminal COFRADIC (Gevaert et al., 2003)
[Figure: TNBS modification between the primary and the secondary run. N-terminal peptides (Ac-AA1-AA2-AA3-AA4-...-Arg) carry an acetylated α-amino group and are not modified. Internal peptides (NH2-AA1-AA2-AA3-AA4-...-Arg) carry a free α-amino group, which is converted into a trinitrophenyl group. Lysine side chains are acetylated (NH-Ac) in both peptide classes.]
COFRADIC in practice (I)
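In the N-terminal COFRADIC scheme every peptide ends in Arg and lysines carry an acetyl group, which is consistent with trypsin no longer cleaving after the blocked lysines. Under that assumption, the one N-terminal peptide that represents each protein can be sketched as below; the "after R, not before P" rule and the toy sequence are assumptions for illustration.

```python
import re


def arg_only_digest(sequence):
    """Cleave C-terminal to R only (lysines assumed blocked), not before P."""
    return [p for p in re.split(r"(?<=R)(?!P)", sequence) if p]


def n_terminal_peptide(sequence):
    """The first peptide of the digest: the protein's N-terminal peptide."""
    return arg_only_digest(sequence)[0]


# Hypothetical toy sequence, for illustration only.
toy_protein = "MKAAKRPLLRGGK"
print(arg_only_digest(toy_protein))
print(n_terminal_peptide(toy_protein))
```

Selecting a single peptide per protein is the extreme case of complexity reduction: the peptide count drops to roughly the protein count.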
Cysteine COFRADIC (Gevaert et al., 2004)
[Figure: cysteine → (Ellman's reagent) → TNB-cysteine, primary run → (TCEP reduction) → cysteine, secondary run]
COFRADIC in practice (II)
[Figure: plot of log10(mass N-terminal peptide) versus log10(mass C-terminal peptide); ~60% detectable!]
COFRADIC in practice (III)
Thank you!
Questions?