Proteomics at the CU Cardiology: A Primer for Students and Others

Maggie PY Lam

Department of Medicine-Cardiology | CFReT
University of Colorado, Anschutz Medical Campus

WHY PROTEOMICS?

The human genome encodes ~20,000 different proteins, which are molecules that carry out the majority of biological functions. The identity and function of a protein are defined by its unique sequence, built from 20 naturally occurring amino acids connected through peptide bonds. Some common proteins are keratin, collagen, and hemoglobin. Most human diseases involve changes in the composition or functions of proteins in one form or another. Proteomics is a powerful technique that allows the identification and quantification of virtually all the protein species in a tissue, i.e., its proteome, on a large scale. Through it, researchers can gain important insights into the molecular makeup and physiology of a biological sample. At the Center, our investigators use proteomics to compare the expression, interactions, and turnover of proteins in heart disease models to understand disease mechanisms.

Although spectra can be interpreted manually to find peptide sequences, modern mass spectrometers generate hundreds of thousands of spectra per day, far too many to analyze by hand. A database search algorithm is therefore commonly used to identify peptides from spectra automatically. A sequence database contains protein sequences predicted from genomic sequences, which can be used to generate theoretical spectra that contain predicted fragment ion masses. An algorithm then evaluates the cross-correlation between each theoretical spectrum and the experimental spectrum, and the candidate theoretical spectra are ranked by how well they match.
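The matching step can be sketched in a few lines of Python. This is a simplified illustration, not the scoring used by any particular search engine: it replaces cross-correlation with a plain count of shared peaks, the candidate peptides and the observed peak list are made up, and only the five residue masses shown in the primer's figure are included. The water and proton masses are standard values added for the example.

```python
# Residue masses (Da) for five amino acids, as listed in the primer's figure.
RESIDUE_MASS = {"A": 71.0371, "W": 186.0793, "S": 87.0320, "G": 57.0214, "R": 156.1011}
WATER, PROTON = 18.0106, 1.00728  # standard values, assumed for this sketch

def theoretical_fragments(peptide):
    """Singly charged b- and y-ion m/z values predicted for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions

def shared_peak_count(theoretical, observed, tol=0.02):
    """Count predicted fragments that have an observed peak within tol (m/z)."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)

def rank_candidates(candidates, observed):
    """Return (peptide, score) pairs, best match first."""
    scored = [(p, shared_peak_count(theoretical_fragments(p), observed))
              for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical database peptides and a made-up experimental peak list (m/z).
database = ["GASWR", "ASGWR", "SAGWR"]
observed_peaks = [88.04, 159.08, 175.12, 361.20, 418.22, 489.26]
ranking = rank_candidates(database, observed_peaks)
print(ranking)
```

All three candidates have the same precursor mass, so only the fragment masses separate them. A real search engine would also estimate how likely each match is to be a false positive.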

An alternative to sequence database search is spectral library search. Empirically identified spectra from prior experiments are collected and deposited into a library, which can then be used to identify peptides directly in subsequent experiments by matching new experimental spectra to the library spectra.
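A minimal sketch of library matching follows, assuming spectra are stored as lists of (m/z, intensity) pairs and compared by cosine similarity after binning to 1 m/z wide bins. The library entries, their names, and the query spectrum are hypothetical, and cosine similarity is one common choice of score rather than the method prescribed here.

```python
import math

def binned(peaks, width=1.0):
    """Sum peak intensities into m/z bins of the given width."""
    vector = {}
    for mz, intensity in peaks:
        key = int(mz // width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse intensity vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_library_match(query, library):
    """Return (entry_name, score) for the library spectrum most similar to query."""
    q = binned(query)
    scores = {name: cosine_similarity(q, binned(peaks)) for name, peaks in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical library of previously identified spectra: (m/z, intensity) pairs.
library = {
    "PEPTIDE_A": [(175.1, 100.0), (361.2, 60.0), (418.2, 30.0)],
    "PEPTIDE_B": [(147.1, 100.0), (276.2, 80.0), (405.2, 40.0)],
}
query_spectrum = [(175.1, 90.0), (361.2, 70.0), (418.2, 20.0), (500.3, 5.0)]
match, score = best_library_match(query_spectrum, library)
print(match, round(score, 3))
```

Unlike a sequence database search, this comparison uses the observed peak intensities as well as the peak positions, which is why library spectra must come from earlier experiments.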

Mass spectrometers can also be used to quantify peptides and proteins. Two common methods are quantification based on spectral counts (the number of spectra from which the peptide was identified) and on ion intensity (the area under the curve of the peptide ion signal). Additional methods, including isotope labeling and targeted MS, can also be used for accurate measurements.
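Both label-free measures are simple to compute once peptides have been identified. The sketch below uses invented data: the protein names, spectrum numbers, retention times, and intensities are placeholders, and the trapezoidal rule is one straightforward way to approximate the area under a chromatographic peak.

```python
from collections import Counter

def spectral_counts(identified_spectra):
    """Count identified spectra per protein from (spectrum_id, protein) pairs."""
    return Counter(protein for _spectrum_id, protein in identified_spectra)

def peak_area(times, intensities):
    """Area under an ion chromatogram peak, by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2
    return area

# Hypothetical identifications from one run: (spectrum number, protein).
identified = [(1, "protein_X"), (2, "protein_X"), (3, "protein_Y")]
counts = spectral_counts(identified)

# Hypothetical ion signal for the same peptide in two samples (time in seconds).
times = [600, 606, 612, 618, 624]
healthy = [0, 50, 100, 50, 0]
disease = [0, 100, 200, 100, 0]
ratio = peak_area(times, disease) / peak_area(times, healthy)
print(counts, ratio)
```

Spectral counts are coarse but need no extra processing, while intensity-based values depend on aligning the same peptide peak across runs.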

How do you identify and quantify proteins from biological samples? The traditional way was to use antibodies that recognize specific proteins, but this approach suffers from limited throughput. To identify proteins on a large scale, proteomics scientists use mass spectrometry. A mass spectrometer is like a balance for very small things: it can accurately measure the molecular weight of protein molecules.

Because mass spectrometers are not very good at analyzing entire proteins, protein samples are usually first digested into smaller fragments (peptides) prior to analysis. This is typically done with trypsin, a sequence-specific protease that cleaves the protein sequence after lysine (K) and arginine (R) residues. The proteome is too complex for mass spectrometers to analyze all at once, so digested peptides are usually first resolved using liquid chromatography, an analytical chemistry technique that separates the peptides into individual elution peaks over time, based on their chemical properties.
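The cleavage rule described above can be applied to a sequence in software, which is how search tools predict the peptides expected from each protein. The sketch below implements only the simple rule (cut after every K or R); real tools usually add exceptions, such as not cutting before proline, and allow for missed cleavages. The input is the string of letters from the primer's sequence database illustration, used here purely as an example.

```python
def tryptic_digest(protein):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    current = ""
    for residue in protein:
        current += residue
        if residue in "KR":
            # Trypsin cuts on the C-terminal side of lysine and arginine.
            peptides.append(current)
            current = ""
    if current:
        # Whatever follows the last K or R is the C-terminal peptide.
        peptides.append(current)
    return peptides

peptides = tryptic_digest("SVYCTPNKPSRTSMSKMFVKGAPEGVIDRC")
print(peptides)
# ['SVYCTPNK', 'PSR', 'TSMSK', 'MFVK', 'GAPEGVIDR', 'C']
```

Every peptide except the last ends in K or R, which is one reason tryptic peptides are convenient to identify from their fragment masses.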

Once the peptides enter the mass spectrometer, they are ionized, and their mass is recorded as the precursor mass in a mass spectrum (MS1 spectrum). Many peptides can have the same mass, however, so sequences are deduced only after the peptides are fragmented and the individual fragment masses are recorded in a tandem mass spectrum (MS2 spectrum). A mass spectrometry experiment can take hours to days, during which spectra are acquired continuously. The ion signal intensity of a particular mass range over time is summarized in a chromatogram.
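To see why the precursor mass alone cannot identify a peptide, the mass can be computed from the residue masses. The sketch below uses the five residue masses listed in the primer's figure; the two peptides are hypothetical, and the water and proton masses are standard values that do not appear in the primer.

```python
# Residue masses (Da) as listed in the primer's figure.
RESIDUE_MASS = {"A": 71.0371, "W": 186.0793, "S": 87.0320, "G": 57.0214, "R": 156.1011}
WATER = 18.0106   # H and OH at the two ends of the chain (assumed standard value)
PROTON = 1.00728  # each positive charge adds one proton (assumed standard value)

def peptide_mass(sequence):
    """Neutral peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def precursor_mz(sequence, charge):
    """Mass-to-charge ratio (m/z) observed for a peptide ion of the given charge."""
    return (peptide_mass(sequence) + charge * PROTON) / charge

# Two hypothetical peptides with the same composition but different sequences.
for peptide in ("SAGWR", "GASWR"):
    print(peptide, round(peptide_mass(peptide), 4), round(precursor_mz(peptide, 2), 4))
```

Both peptides give the same precursor mass and m/z because they contain the same residues in a different order, so the MS1 measurement cannot tell them apart.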

From the fragment masses on the MS2 spectrum, one can deduce the sequence of the peptide that gave rise to the spectrum, and by extension the protein that gave rise to the peptide.
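The deduction works because consecutive fragments in a series differ by exactly one residue. The sketch below, which is an illustration and not a full de novo sequencing method, computes singly charged b and y ions for a hypothetical peptide and then reads residues back from the spacing of the y ions. It uses the residue masses from the primer's figure plus standard water and proton masses that are assumptions of the example.

```python
# Residue masses (Da) as listed in the primer's figure.
RESIDUE_MASS = {"A": 71.0371, "W": 186.0793, "S": 87.0320, "G": 57.0214, "R": 156.1011}
WATER, PROTON = 18.0106, 1.00728  # standard values, assumed for this sketch

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    # b ions keep the N-terminal side; y ions keep the C-terminal side plus water.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions

def read_ladder(ions, tol=0.01):
    """Name the residue matching each gap between consecutive ions ('?' if none)."""
    residues = []
    for lower, upper in zip(ions, ions[1:]):
        gap = upper - lower
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        residues.append(hits[0] if hits else "?")
    return "".join(residues)

b_ions, y_ions = fragment_ions("SAGWR")  # hypothetical peptide
print([round(mz, 4) for mz in y_ions])
print(read_ladder(y_ions))  # residues read from the C-terminal end inward
```

For this peptide the y-ion gaps spell W, G, A, which is the sequence read backwards from the residue next to the C-terminal arginine. Leucine and isoleucine have identical masses, which is why the primer's example spectrum labels such gaps I/L.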

EXPERIMENTAL WORKFLOW

FURTHER READINGS

1. Cox J and Mann M. Quantitative, High-Resolution Proteomics for Data-Driven Systems Biology. Annual Review of Biochemistry. 2011;80:273-99.
2. Marcotte EM. How do shotgun proteomics algorithms identify proteins? Nature Biotechnology. 2007;25(7):755-7.

DATA ANALYSIS WORKFLOW

[Figure: data analysis workflow. Experimental spectra (a chromatogram over time and MS2 spectra plotted against m/z, with peaks labeled y5 to y9) are passed to a search engine, which generates sequence-specific theoretical spectra from a sequence database (e.g., UniProt) and finds the best match to give the peptide ID and protein ID. Quantification and functional analysis follow. Two panels compare healthy and disease samples: spectral counts-based quantification (protein quantity as the number of identified spectra per peptide ID) and ion intensity-based quantification (protein quantity as ion intensity over time t).]

[Figure: experimental workflow. Steps: biological sample, extract proteins, proteolysis, LC-MS/MS data acquisition; example labels: sarcomere, trypsin.
A protein (~20,000 to 200,000 Da) is drawn as a chain of amino acid residues from N-terminus to C-terminus, H2N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO-NH-CH(R4)-COOH, with example residue masses: alanine (Ala/A) 71.0371 Da; tryptophan (Trp/W) 186.0793 Da; serine (Ser/S) 87.0320 Da; glycine (Gly/G) 57.0214 Da; arginine (Arg/R) 156.1011 Da.
Panel "peptide fragmentation in tandem mass spectrometry": backbone cleavage sites labeled a1/b1/c1 to a3/b3/c3 from the N-terminus and x3/y3/z3 to x1/y1/z1 from the C-terminus.
Example MS2 spectrum: relative abundance (0 to 100%) against m/z (0 to 1800), with b series (b4+ to b14+) and y series (y3+ to y13+) peaks annotated with the residues read from the peak spacings (A, I/L, E, T, D).]