Introduction to Proteomics
Room 4508, Level 5, Microbiol-Genetics Building
Department of Genetics, Faculty of Science, KU
Dr. Teerasak E-kobon
Outline
1. Proteomic basics
2. Mass spec-based proteomic workflow
3. Protein identification by MS, MS/MS
4. Protein quantification by MS
5. Summary
What is Proteomics?
“Analysis of the complete complement of proteins.
Proteomics includes not only the identification and
quantification of proteins, but also the determination
of their localization, modifications, interactions,
activities, and ultimately, their functions.”
Stan Fields (Science, 2001)
Or, the proteins present in one sample at a certain point in time.
Amino Acid Properties
Direct protein sequencing
Pevsner, Jonathan. (2015). Bioinformatics and Functional Genomics. 3rd Eds. Wiley, UK.
- The Edman degradation process is illustrated for a protein fragment of six amino acids. The first amino acid reacts through its amino terminus with phenylisothiocyanate (PITC).
- Under acidic conditions this amino acid residue, derivatized as a phenylthiohydantoin (PTH), is cleaved and can be identified in an amino acid analyzer.
- The peptide now has five amino acid residues, and the cycle is repeated with successive amino-terminal amino acids.
What kind of answers can we get?
Pevsner, 2015
Conceptual proteomics experiments
Pevsner, 2015
Proteomics workflow
- 1D/2D gel
- Column chromatography
- Immunoprecipitation
- Pulldowns with tagged proteins
- Affinity depletion
Cell culture
Model organism
Body fluids
Tissues
- RP-HPLC
- Strong cation exchange (SCX)
- Weak anion exchange (WAX)
- Hydrophilic interaction (HILIC)
- Metal affinity (IMAC)
- Peptide ionization (ESI, MALDI)
- Peptide fragmentation (CID, HCD, ETD, ECD, IRMPD, ISF)
- Mass analyzer (ion trap, time-of-flight, quadrupole, Orbitrap/ICR)
- Database search
- Accurate mass and time tag
- Peptide mass fingerprinting
- Metabolic labeling (SILAC)
- Chemical protein/peptide labeling (iTRAQ, TMT)
- Label free (spectral count, extracted ion chromatogram)
- Multiple reaction monitoring (MRM)
Bottom up VS Top down Proteomics
“Bottom up” proteomics
Sample
- Cells
- Body fluids
- Tissues
- etc.
Protein extraction
- Whole cell lysate
- Protein purification
Peptides
- Separation
- Clean up
Mass spectrometry
- Untargeted analysis
- Targeted analysis
Data analysis
- Database search
- TPP
- Quantification
“Top down” proteomics
Pevsner, 2015
Sample Preparation
1. Extract and separate proteins
2. Digest proteins
3. Separate and clean up peptides
-no (low) detergent & low/no salt
Protein Digestion
https://www.thermofisher.com/in/en/home/life-science/protein-biology/protein-biology-learning-center/protein-biology-resource-library/pierce-protein-methods/sample-preparation-mass-spectrometry/_jcr_content/MainParsys/image_b5be.img.jpg/1437659928778.jpg, 9 Jan 2016
Protein separation: 1-DE
https://upload.wikimedia.org/wikipedia/commons/4/46/SDS-PAGE_Electrophoresis.png;
http://static-content.springer.com/esm/art%3A10.1186%2F1477-5956-7-32/MediaObjects/12953_2009_136_MOESM5_ESM.jpeg, 9 Jan 2016
Protein separation: 2-DE
1st dimension: Isoelectric focusing (charge)
2nd dimension: SDS-PAGE (mass)
http://biochimej.univ-angers.fr/Page2/COURS/9ModulGenFoncVeg/6Proteomique/3Figures/1FigGenerales/2PrincipeGel2D.gif, 9 Jan 2016
Different gel staining methods
Stain                                   Sensitivity  Dynamic range          MS-compatible
Coomassie                               8 ng         10-30x                 Yes
Silver                                  1 ng         <10x                   Not without special precautions
Fluorescent (ProQ Emerald, Sypro Ruby)  2 ng         3 orders of magnitude  Yes
Comparative 2-DE gels image analysis
Zucchi et al., 2001
ExPASy 2-DE gel database
Pevsner, 2015
Differential expression by DIGE
Internal standard
Protein extract 1: label with Cy3
Protein extract 2: label with Cy5
Mixed labelled extracts
2-DE separation
Variable mode imager
Differential analysis software
- DIGE = 2D differential gel electrophoresis
- Proteins are labelled on lysine residues with fluorescent dyes
Pevsner, 2015
How much protein do I need?
ng of total protein: 10s-100s of protein IDs
µg of total protein: 100s-1000s of protein IDs
mg of total protein: 1000s-10000s of protein IDs
Proteome-wide PTM
From Proteins to Peptides
[Figure: enzymatic digest of a protein into peptides with a C-terminal R or K]
Detectable size range
~ 8-25 amino acids or m/z <2000
… R.AR.ESAMPLER.SPEPTIDE
Trypsin cleaves on the C-terminal side of
Lys and Arg.
Less common enzymes:
ArgC, LysC, pepsin, etc.
Trypsin leaves a positively charged residue at the C-terminus of
each peptide, which facilitates y-ion production in MS/MS.
Westermeier R., Naven, T., Hopker, H.R. 2008. Proteomics in Practice. Wiley-Blackwell, US. pp. 502.
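The cleavage rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full digestion model: it assumes cleavage after every K or R, ignores missed cleavages and the common exception for a following proline, and the function name `tryptic_digest` is hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every Lys (K) or Arg (R).

    Simplified rule (assumption): no missed cleavages, no proline exception.
    """
    peptides = []
    current = []
    for residue in sequence:
        current.append(residue)
        if residue in ("K", "R"):
            peptides.append("".join(current))
            current = []
    if current:  # trailing peptide without a C-terminal K/R
        peptides.append("".join(current))
    return peptides


# The example sequence from the slide, written without the cleavage dots
print(tryptic_digest("ARESAMPLERSPEPTIDE"))  # ['AR', 'ESAMPLER', 'SPEPTIDE']
```

Short peptides such as AR fall below the detectable size range quoted above, which is one reason sequence coverage is rarely complete.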
Other Proteases
Westermeier et al., 2008
Mass Spectrometry (MS)
A mass spectrometer is a scale that measures the mass of a charged
molecule as its mass-to-charge ratio (m/z).
Main components:
1. Ionization
2. Mass analyzer
3. Detector
MS or MS/MS
http://www.sec.psu.ac.th/web-board/content/view_img.php?id=5282, 9 Jan 2016
Ionization: Electrospray Ionization
Solvent evaporation
Coulomb explosion
Charged peptides in gas phase
(M + nH)^n+
Multiply charged ions
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQ2LynTDoVWWHb2W6eg7mBcouozeVlMUuCvwl0n4D7aXoIgHW1i9g, 9 Jan 2016
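To make the (M + nH)^n+ notation concrete, the sketch below computes the m/z at which one peptide would appear for several charge states. The proton mass of 1.0073 Da is the value used in the worked calculations later in these slides; the function name and the example mass are illustrative assumptions.

```python
PROTON = 1.0073  # proton mass in Da, as used in the slides' worked examples


def mz_for_charge(neutral_mass, charge):
    """m/z of the (M + nH)n+ ion for neutral mass M and charge n."""
    return (neutral_mass + charge * PROTON) / charge


# One peptide (M = 1232.55 Da, illustrative) seen at charge states 1+ to 3+
for n in (1, 2, 3):
    print(n, round(mz_for_charge(1232.55, n), 2))
```

The same peptide therefore gives a series of peaks in an ESI spectrum, one per charge state.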
Ionization: MALDI
Matrix-assisted laser desorption/ionization
https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Maldi.svg/1280px-Maldi.svg.png, 9 Jan 2016
MALDI Mass Spectrum
Westermeier et al., 2008
Mass Analyzer: TOF, TOF/TOF
Linear time-of-flight MS
- Mass range up to 350 kDa
- High sensitivity
- Low resolution
Reflector time-of-flight MS
- Mass range up to 5000 kDa
- Low sensitivity
- High resolution
Westermeier et al., 2008
Mass Analyzer: Quadrupole
- Mass range: < 3000 m/z
- Resolution: up to ~2000
- Accuracy: 0.1%
http://www.chemicool.com/img1/graphics/quad-sch.gif, 9 Jan 2016
Mass Analyzer: Ion trap
- Mass range: typical < 2000 m/z, extended < 4000 m/z
- Resolution: up to ~10000
- Accuracy: 0.1-0.01%
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQaExNWDiKMtyTzjhW2bl5yCtIDjnijXXnks1w83tXpdLA9tBb4, 9 Jan 2016
Mass Analyzer: FT-ICR
- Mass range: typical < 2000 m/z, extended < 4000 m/z
- Resolution: up to ~1,000,000
- Accuracy: < 5 ppm
FT-ICR (Fourier Transform Ion Cyclotron Resonance)
http://people.whitman.edu/~dunnivfm/C_MS_Ebook/CH5/Figures/Fig_5_21_e_Overview.jpg, 9 Jan 2016
Mass Analyzer: Orbitrap
- Mass range: typical < 2000 m/z, extended < 6000 m/z
- Resolution: up to ~240,000
- Accuracy: < 2 ppm
http://media.americanlaboratory.com/m/20/article/18815-fig2.jpg, 9 Jan 2016
Different Mass Spectrometers
- MALDI-TOF MS
- MALDI-TOF/TOF MS/MS
- nanoLC-ESI Q/TOF MS
- Orbitrap
- MudPIT
(Multidimensional Protein
Identification Technology)
http://www.sec.psu.ac.th/web-board/content/view_img.php?id=5282, 9 Jan 2016
MudPIT
https://tanlab.files.wordpress.com/2008/10/mudpit-workflow.jpg, 9 Jan 2016
Proteomic data formats
1. Mass spec data files
- Xcalibur (.raw) / Thermo Finnigan
- Analyst (.wiff; .t2d) / Life Technologies
- MassLynx (.raw) / Waters
- .baf / Bruker
2. Analyzed data file formats
- .dta .out / Sequest
- .t2d .group / ProteinPilot
- .xml / X! Tandem
- .xml .omx / OMSSA
- .mgf .dat / Mascot
- .mzXML, .pepXML, .mzData, .protXML, .mzML
*** The Human Proteome Organization (HUPO) supports a Proteomics
Standards Initiative (PSI) with the goals of defining standards for
proteomic data representation to facilitate the comparison, exchange,
and verification of data.
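As an illustration of what one of these text formats looks like in practice, here is a minimal sketch of a reader for Mascot generic format (.mgf). It assumes the common layout of BEGIN IONS / END IONS blocks with KEY=VALUE header lines followed by "m/z intensity" peak lines; the function name `parse_mgf` and the example spectrum are hypothetical, and real files may carry extra fields this sketch does not handle.

```python
def parse_mgf(text):
    """Parse MGF text into a list of spectra.

    Each spectrum is a dict with 'params' (header fields as strings)
    and 'peaks' (list of (m/z, intensity) tuples).
    """
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)  # header line, e.g. PEPMASS=617.28
                current["params"][key] = value
            else:
                fields = line.split()  # assumes "m/z intensity" on each peak line
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


# Hypothetical spectrum, for illustration only
example = """BEGIN IONS
TITLE=hypothetical spectrum
PEPMASS=617.28
CHARGE=2+
147.11 1200.0
234.15 850.5
END IONS
"""
spectra = parse_mgf(example)
print(spectra[0]["params"]["PEPMASS"], len(spectra[0]["peaks"]))
```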
Protein identification by MS
(1) Peptide mass fingerprint
or peptide mass map
(2) MS/MS based peptide ID
or tandem MS
Westermeier et al., 2008
Calculating mass of a peptide or protein from m/z
m/z = (M + 1H)/z = 1233.56 [z=1]
(M + 1H) = (1233.56)*(z)
= (1233.56)*(1)
= 1233.56
M = 1233.56 – H
= 1233.56 – 1.0073
= ~ 1232.55 Da
* For MALDI-MS *
Westermeier et al., 2008
Calculating mass of a peptide or protein from m/z
m/z = (M + 2H)/z = 617.28 [z=2]
(M + 2H) = (617.28)*(z)
= (617.28)*(2)
= 1234.56
M = 1234.56 – 2H
= 1234.56 – 2*(1.0073)
= ~ 1232.55 Da
* For ESI-MS *
Westermeier et al., 2008
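Both worked examples follow the same rule, M = (m/z)*z - z*H, which can be written as a small function. This is a sketch using the slides' proton mass of 1.0073 Da; the function name is an assumption.

```python
PROTON = 1.0073  # proton mass in Da, as in the calculations above


def neutral_mass(mz, charge):
    """Neutral mass M from an observed m/z and its charge state z."""
    return mz * charge - charge * PROTON


print(round(neutral_mass(1233.56, 1), 2))  # 1232.55 (singly charged, MALDI example)
print(round(neutral_mass(617.28, 2), 2))   # 1232.55 (doubly charged, ESI example)
```

The two peaks look unrelated on the m/z axis but resolve to the same peptide mass once the charge is taken into account.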
Determination of charge state and MW
n = charge number
From isotope peaks:
n = 1/Δ(m/z)
MW = (m/z)*n - n*H
From charge state peaks, solve a set of two equations for adjacent peaks:
(1) m/z = (MW + n)/n
(2) m/z = (MW + n + 1)/(n + 1)
Westermeier et al., 2008
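Both routes to the charge state can be sketched as follows. The slide's pair of equations approximates the proton mass as 1; the code below uses 1.0073 Da instead, which is an assumption of this sketch, as are the function names. Solving the two equations for adjacent peaks gives n = (m/z_low - H)/(m/z_high - m/z_low), where the higher m/z peak carries charge n and the lower one n + 1.

```python
PROTON = 1.0073  # proton mass in Da (the slide's equations round this to 1)


def charge_from_isotope_spacing(delta_mz):
    """Charge state from the spacing between adjacent isotope peaks."""
    return round(1.0 / delta_mz)


def charge_and_mass_from_adjacent_peaks(mz_high, mz_low):
    """Charge and neutral mass from two adjacent charge-state peaks.

    mz_high carries charge n, mz_low carries charge n + 1.
    """
    n = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = n * (mz_high - PROTON)
    return n, mass


print(charge_from_isotope_spacing(0.5))  # isotope peaks 0.5 m/z apart -> charge 2
```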
Peptide Mass Fingerprint (PMF)
Each peak = m/z of the peptide ion
Westermeier et al., 2008
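The idea behind a peptide mass fingerprint search can be sketched as comparing the observed peak list with the theoretical peptide m/z values of a candidate protein and counting the matches within a ppm tolerance. All numbers and names below are hypothetical, and real search engines score matches statistically rather than simply counting them.

```python
def match_peaks(observed_mz, theoretical_mz, tolerance_ppm):
    """Return (observed, theoretical, error in ppm) for every match within tolerance."""
    matches = []
    for obs in observed_mz:
        for theo in theoretical_mz:
            error_ppm = (obs - theo) / theo * 1e6
            if abs(error_ppm) <= tolerance_ppm:
                matches.append((obs, theo, error_ppm))
    return matches


observed = [1233.56, 1529.75, 2000.00]  # hypothetical measured peaks
theoretical = [1233.5573, 1529.7348]    # hypothetical in-silico digest of one protein
hits = match_peaks(observed, theoretical, tolerance_ppm=65)
print(len(hits), "of", len(theoretical), "theoretical peptides matched")
```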
Mass Error
Mass error (ppm) = MS mass measurement error
= (error/mass)*10^6
Ex. A peak of mass 1529.7348, with accuracy of 10 ppm
10*10^-6 = (mass error)/(mass of peak)
10*10^-6 = (mass error)/(1529.7348)
mass error = (1529.7348) * (1*10^-5) = 0.0153
Therefore, peak mass = 1529.7348 ± 0.0153
Mass Accuracy
True mass = 1529.7348
Measured mass = 1529.7501
Δmass = 0.0153
Mass accuracy = (0.0153/1529.7348)*10^6
= 10 ppm
Mass tolerance = the error window on experimental peptide
mass values
Mass tolerance = mass error (accuracy) * peak mass
= 65 ppm * 1529
= 65*10^-6 * 1529
= ~0.1 Da
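The ppm arithmetic from the last three slides fits in two small functions. This is a sketch with assumed function names; the numbers are the ones used in the slides.

```python
def ppm_error(measured_mass, true_mass):
    """Mass accuracy in ppm: relative deviation of measured from true mass."""
    return (measured_mass - true_mass) / true_mass * 1e6


def tolerance_window(mass, tolerance_ppm):
    """Lower and upper mass limits for a given ppm tolerance."""
    half_width = mass * tolerance_ppm * 1e-6
    return mass - half_width, mass + half_width


print(round(ppm_error(1529.7501, 1529.7348), 1))  # 10.0
low, high = tolerance_window(1529.7348, 10)
print(round(high - 1529.7348, 4))                 # 0.0153
```

Because the tolerance is relative, the same ppm setting gives a wider absolute window for heavier peptides.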
MS/MS or Tandem MS
Select a peptide ion
Fragment a peptide ion
Measure fragment ion
mass spectra (MS/MS spectra)
Known as product ions
- To sequence individual peptides
(de novo protein sequencing)
Westermeier et al., 2008
Peptide Fragmentation in MS/MS
Collision induced
dissociation
CID
Peptide Identification
b1 b2 b3 b4 b5 b6 b7 b8 b9
y9 y8 y7 y6 y5 y4 y3 y2 y1
b-ion contains N-terminus.
y-ion contains C-terminus.
Superscripted number = number of amino acid residues in the fragment
Westermeier et al., 2008
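The b- and y-ion ladders can be computed directly from a peptide sequence. The sketch below assumes singly charged fragments and unmodified residues, and uses standard monoisotopic residue masses rounded to five decimals (with a proton mass of 1.00728 Da, slightly more precise than the 1.0073 used earlier); the function name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to y-ions (they keep the C-terminal OH and an extra H)
PROTON = 1.00728   # charge carrier for singly charged fragments


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:           # N-terminal fragments
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):  # C-terminal fragments
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

Differences between neighbouring ions in one series equal a residue mass, which is what makes reading a sequence from the spectrum possible.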
Peptide Fragmentation in MS/MS
CID Spectra (Tandem MS)
[Figure: CID (tandem MS) spectrum, relative intensity vs. m/z, with labelled fragment peaks y3-y9 and b3, b5-b8]
Westermeier et al., 2008
Protein Coverage
Beck et al., 2001
They identified > 10,000 proteins, 174,066 peptides.
In complex mixtures many low abundance proteins
will be identified by only a single unique peptide.
Protein coverage depends on:
- Complexity of mixture
- Protein amount
- Instrument settings
- Peptide length/enzyme used
De novo sequencing of unknown peptides
Millares et al., 2012
Mass changes associated with PTM (post-translational modification)
Westermeier et al., 2008
IMAC (Immobilized-metal affinity chromatography)
For phosphorylation
Westermeier et al., 2008
Protein-Protein Interaction
Co-Immunoprecipitation (Co-IP)
Westermeier et al., 2008
Quantitative Proteomics
To quantify, we need an internal or external standard with
identical physicochemical properties.
Internal standards = stable isotopic labelling
External standards = measuring the same peptide in
two consecutive runs
SILAC iTRAQ/TMT AQUA
Label free quantification Targeted quant MRM AQUA
Westermeier et al., 2008
SILAC
Stable isotope labelling by amino acids in cell culture
Metabolic labelling
Westermeier et al., 2008
SILAC
Westermeier et al., 2008
Isotopic Labelling
iTRAQ (isobaric tags for relative and absolute quantification)
http://image.slidesharecdn.com/quantitativeproteomics-130430024445-phpapp01/95/quantitative-proteomics-9-638.jpg?cb=1367289927, 9 Jan 2016
iTRAQ workflow
Fragmentation produces reporter ions
at m/z 114, 115, 116 and 117.
Westermeier et al., 2008
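Relative quantification from the reporter ions can be sketched as reading the intensity near each reporter m/z in an MS/MS spectrum and dividing by a reference channel. The nominal reporter positions come from the slide; the ±0.2 m/z window, the choice of 114 as reference, the function names and the peak list are all assumptions made for illustration.

```python
def reporter_intensities(peaks, reporters=(114, 115, 116, 117), tol=0.2):
    """Sum the intensity found within +/- tol of each nominal reporter m/z."""
    result = {r: 0.0 for r in reporters}
    for mz, intensity in peaks:
        for r in reporters:
            if abs(mz - r) <= tol:
                result[r] += intensity
    return result


def ratios_to_reference(intensities, reference=114):
    """Express each channel relative to the reference channel."""
    ref = intensities[reference]
    return {r: (v / ref if ref > 0 else float("nan"))
            for r, v in intensities.items()}


# Hypothetical MS/MS peak list: (m/z, intensity)
peaks = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
         (117.11, 1000.0), (175.12, 300.0)]
ratios = ratios_to_reference(reporter_intensities(peaks))
print(ratios)  # {114: 1.0, 115: 2.0, 116: 0.5, 117: 1.0}
```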
iTRAQ workflow
Westermeier et al., 2008
Tandem Mass Tag (TMT)
Isotopic labelling
Pros: Applicable to all samples
Relatively easy
Multiplexing (up to 8)
Cons: Cost
Separate sample processing
Westermeier et al., 2008
Other Isotopic Labellings
- ICAT (d0/d8) and ICAT 13C0/13C8
(Isotope-coded affinity tags, Cys-containing proteins)
- d0/d10 propionic anhydride
(N-terminal labelling)
- 15N/14N
(whole cell labelling)
- 18O/16O (by trypsin)
Westermeier et al., 2008
Label free quantification
Relative quantification
- Extracted ion chromatogram
- Spectral counting
Pros:
- Applicable to all sample types
- Cheap
- Multiplexing (infinite)
Cons:
- Susceptible to technical variation
Westermeier et al., 2008
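The two label-free readouts can be sketched as follows: the area under a peptide's extracted ion chromatogram (here by simple trapezoidal integration) and the number of identified spectra per protein. The function names, retention times, intensities and protein names are hypothetical, and real pipelines add normalization across runs that this sketch leaves out.

```python
from collections import Counter


def xic_area(times, intensities):
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def spectral_counts(psm_proteins):
    """Count identified spectra per protein (one list entry per matched spectrum)."""
    return Counter(psm_proteins)


# Hypothetical chromatograms of one peptide in two runs (time in min)
times = [10.0, 10.1, 10.2, 10.3, 10.4]
run_a = [0.0, 100.0, 200.0, 100.0, 0.0]
run_b = [0.0, 50.0, 100.0, 50.0, 0.0]
print("XIC ratio A/B:", xic_area(times, run_a) / xic_area(times, run_b))

# Hypothetical protein assignments of identified spectra
print(spectral_counts(["protA", "protB", "protA", "protA"]))
```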
Targeted Quantification
Discovery MS/MS mode
- Random precursor m/z selection
- Precursor m/z fragmentation
- All fragment m/z analysis
- Database search
- Peptide IDs
Targeted MRM (multiple reaction monitoring) or
SRM (selected reaction monitoring) mode
- Targeted precursor m/z isolation
- Precursor m/z fragmentation
- Selected fragment m/z analysis
- Quantification
Westermeier et al., 2008
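In targeted mode, each measurement is defined by a transition, that is, a known precursor m/z paired with a known fragment m/z. The sketch below sums the signal for one transition across a set of MS/MS scans. The data layout, the ±0.5 m/z windows, the function name and all numbers are assumptions for illustration, not an instrument vendor's method format.

```python
def extract_transition(scans, precursor_mz, fragment_mz, tol=0.5):
    """Sum fragment intensity for one precursor -> fragment transition.

    scans: list of (precursor m/z, [(fragment m/z, intensity), ...])
    """
    total = 0.0
    for prec, fragments in scans:
        if abs(prec - precursor_mz) > tol:
            continue  # scan isolated a different precursor
        for mz, intensity in fragments:
            if abs(mz - fragment_mz) <= tol:
                total += intensity
    return total


# Hypothetical scans: two from the target precursor, one from another peptide
scans = [
    (617.28, [(147.11, 500.0), (875.45, 1200.0)]),
    (617.30, [(875.46, 1300.0)]),
    (700.10, [(875.45, 9999.0)]),
]
print(extract_transition(scans, precursor_mz=617.28, fragment_mz=875.45))  # 2500.0
```

Requiring both the precursor and the fragment to match is what gives the targeted mode its selectivity.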
Targeted Quantification/Identification
Known peptide precursor m/z
Known fragment m/z
Westermeier et al., 2008
Targeted Quantification
Westermeier et al., 2008
PRIDE
PRoteomics IDEntifications
(PRIDE) database at the
European Bioinformatics
Institute website.
PRIDE is a central public
repository for mass
spectrometry-based
proteomics data.
Summary
- Proteomics is the large-scale study of proteins, particularly
their structures and functions.
- Protein separation (electrophoresis and liquid
chromatography) and mass spectrometry (MS and MS/MS)
are the keys to protein identification and
quantification.
- Proteomics approaches can run in discovery mode or
targeted mode.
- Bioinformatics facilitates high-throughput proteomics data
analysis and management.
References
• Pevsner, J. 2015. Bioinformatics and Functional Genomics. 3rd ed. Wiley, UK.
• Westermeier, R., Naven, T., Hopker, H.R. 2008. Proteomics in Practice. Wiley-Blackwell, US. pp. 502.