6
Global profiling of protease cleavage sites by chemoselective labeling of protein N-termini Guoqiang Xu, Sung Bin Y. Shin, and Samie R. Jaffrey 1 Department of Pharmacology, Weill Medical College, Cornell University, New York, NY 10065 Edited by Solomon H. Snyder, Johns Hopkins University School of Medicine, Baltimore, MD, and approved September 25, 2009 (received for review August 6, 2009) Proteolysis has major roles in diverse biologic processes and regulates the activity, localization, and intracellular levels of proteins. Linking signaling pathways and physiologic processes to specific proteolytic processing events is a major challenge in signal transduction research. Here, we describe N-CLAP (N-terminalomics by chemical labeling of the -amine of proteins), a general approach for profiling protein N-termini and identifying protein cleavage sites during cellular sig- naling. In N-CLAP, simple and readily available reagents are used to selectively affinity label the -amine that characterizes the protein N terminus over the more highly abundant -amine on lysine residues. Protein cleavage sites are deduced by identifying the corresponding N-CLAP peptides, which are derived from the N-termini of proteins, including the N-termini of the newly formed polypeptide products of proteolytic cleavage. Through selective affinity purification and tan- dem mass spectrometry analysis of 278 N-CLAP peptides, we charac- terized proteolytic cleavage events associated with methionine amin- opeptidases and signal peptide peptidases, as well as proteins that are proteolytically cleaved after cisplatin-induced apoptosis. Many of the protein cleavage sites that are elicited during apoptotic signaling are consistent with caspase-dependent cleavage. These data demon- strate the utility of N-CLAP for proteomic profiling of protein cleavage sites that are generated during cellular signaling. caspase N-CLAP N-terminalomics tandem mass spectrometry P roteolytic processing of proteins is a major regulatory mech- anism in signal transduction (1). Proteases regulate signaling pathways by altering protein function. For example, proteases can either convert an inactive proenzyme to an active enzyme or inactivate specific proteins (1). The human genome is predicted to contain 561 proteases or protease-like enzymes, suggesting a potentially broad role for proteases (2). However, in the vast majority of cases, the specific proteins that are targeted by a protease are unknown. Similarly, the proteins and the sites that are cleaved during a signaling event are difficult to identify. To characterize protein processing on a proteomic scale, one strategy is to systematically profile protein N-termini in tissue or cellular samples. ‘‘N-terminalomics’’ is motivated by the idea that the population of N-termini changes after protease activation. This is because any single protein cleavage event results in the formation of an ‘‘internal’’ N terminus. Proteomic approaches typically use protein samples that have been digested with proteases, such as trypsin, to generate peptides suitable for tandem mass spectrometry (MS/MS) analysis. Selective enrichment of N-terminal peptides (i.e., the peptides derived from the N-termini of proteins) would be required to identify protein N-termini and proteolytic cleavage sites. Thus, there is considerable interest in the development of strategies that selectively enrich for N-terminal peptides before MS analysis. Several techniques to enrich for N-terminal peptides rely on ‘‘negative selection,’’ in which non–N-terminal peptides are re- moved from a sample, leaving N-terminal peptides (3– 6). However, because most peptides are not N-terminal peptides, the efficiency needs to be very high to obtain a preparation enriched in N- terminal peptides. In contrast, ‘‘positive selection’’ involves the purification of N-terminal peptides from a mixture of peptides. Selective labeling of the N-termini of proteins would allow N- terminal peptides to be readily purified after protease digestion (7, 8). The challenge of positive selection is to selectively recognize the -amino group, which is present exclusively on the N-termini of proteins, but not the -amino group, which is present at high levels in proteins on lysine side chains. Here we describe a proteomic strategy, N-terminalomics by chemical labeling of the -amine of proteins (N-CLAP). N-CLAP utilizes features of Edman chemistry to enable the selective labeling and capture of N-terminal peptides from proteins in complex biological samples. We demonstrate that N-CLAP can be used to characterize the diversity of N-terminal processing as well as signal peptidase specificity. Using this method, we also identify protein processing that occurs during apoptosis and identify several unique potential targets of caspase processing. Together, these data dem- onstrate that N-CLAP can be used to identify cleavage sites in proteins and identify proteolytic cleavage events that occur during cell signaling. Results A Chemoselective Strategy for Labeling and Enriching N-Terminal Pep- tides. The N terminus of proteins is characterized by an -amine, as opposed to the -amines that are on lysine side chains. The -amine (the pK a of its ammonium ion 8.95) is notable for its slightly reduced pK a relative to that of -amines (the pK a of its ammonium ion 10.53) (9). However, given this narrow pK a window, there is no pH at which the -amine can be selectively labeled with an amine-labeling reagent, because -amines, albeit less reactive, are typically much more abundant (10). The circumstances are further complicated by the fact that the pK a of the ammonium ion for the -amine and the -amine can be affected by the side chain of the adjacent amino acids, hydrogen bonding, and other intramolecular interactions (11, 12). To address these issues, we have developed a chemical labeling strategy, N-CLAP, that chemoselectively labels the N terminus. This labeling strategy is based on the chemistry of Edman degra- dation, which is used for N-terminal amino acid sequencing (13) [SI Appendix, Fig. S1 A]. In N-CLAP, the amine-reactive Edman re- agent, phenyl isothiocyanate (PITC), is used to block all of the amines in proteins (Fig. 1). TFA is then used to trigger an intramolecular cyclization of the PITC-modified -amine, which results in cleavage of the peptide bond between the first and second amino acid. This intramolecular cyclization does not occur for the PITC adduct on the -amine because an energetically favorable five-membered ring cannot form at this position (14) and leaves the PITC-modified -amine on the lysines intact (15). Therefore, the Author contributions: G.X. and S.R.J. designed research; G.X. and S.Y.S. performed re- search; G.X. analyzed data; and G.X. and S.R.J. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. 1 To whom correspondence should be addressed at: Department of Pharmacology, Weill Medical College, Cornell University, 1300 York Avenue, Box 70, New York, NY 10065. E-mail: [email protected]. This article contains supporting information online at www.pnas.org/cgi/content/full/ 0908958106/DCSupplemental. 19310 –19315 PNAS November 17, 2009 vol. 106 no. 46 www.pnas.orgcgidoi10.1073pnas.0908958106

Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

Global profiling of protease cleavage sitesby chemoselective labeling of protein N-terminiGuoqiang Xu, Sung Bin Y. Shin, and Samie R. Jaffrey1

Department of Pharmacology, Weill Medical College, Cornell University, New York, NY 10065

Edited by Solomon H. Snyder, Johns Hopkins University School of Medicine, Baltimore, MD, and approved September 25, 2009 (received for reviewAugust 6, 2009)

Proteolysis has major roles in diverse biologic processes and regulatesthe activity, localization, and intracellular levels of proteins. Linkingsignaling pathways and physiologic processes to specific proteolyticprocessing events is a major challenge in signal transduction research.Here, we describe N-CLAP (N-terminalomics by chemical labeling ofthe �-amine of proteins), a general approach for profiling proteinN-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are used toselectively affinity label the �-amine that characterizes the protein Nterminus over the more highly abundant �-amine on lysine residues.Protein cleavage sites are deduced by identifying the correspondingN-CLAP peptides, which are derived from the N-termini of proteins,including the N-termini of the newly formed polypeptide products ofproteolytic cleavage. Through selective affinity purification and tan-dem mass spectrometry analysis of 278 N-CLAP peptides, we charac-terized proteolytic cleavage events associated with methionine amin-opeptidases and signal peptide peptidases, as well as proteins thatare proteolytically cleaved after cisplatin-induced apoptosis. Many ofthe protein cleavage sites that are elicited during apoptotic signalingare consistent with caspase-dependent cleavage. These data demon-strate the utility of N-CLAP for proteomic profiling of protein cleavagesites that are generated during cellular signaling.

caspase � N-CLAP � N-terminalomics � tandem mass spectrometry

Proteolytic processing of proteins is a major regulatory mech-anism in signal transduction (1). Proteases regulate signaling

pathways by altering protein function. For example, proteasescan either convert an inactive proenzyme to an active enzyme orinactivate specific proteins (1). The human genome is predictedto contain 561 proteases or protease-like enzymes, suggesting apotentially broad role for proteases (2). However, in the vastmajority of cases, the specific proteins that are targeted by aprotease are unknown. Similarly, the proteins and the sites thatare cleaved during a signaling event are difficult to identify.

To characterize protein processing on a proteomic scale, onestrategy is to systematically profile protein N-termini in tissue orcellular samples. ‘‘N-terminalomics’’ is motivated by the idea thatthe population of N-termini changes after protease activation. Thisis because any single protein cleavage event results in the formationof an ‘‘internal’’ N terminus. Proteomic approaches typically useprotein samples that have been digested with proteases, such astrypsin, to generate peptides suitable for tandem mass spectrometry(MS/MS) analysis. Selective enrichment of N-terminal peptides(i.e., the peptides derived from the N-termini of proteins) would berequired to identify protein N-termini and proteolytic cleavagesites. Thus, there is considerable interest in the development ofstrategies that selectively enrich for N-terminal peptides before MSanalysis.

Several techniques to enrich for N-terminal peptides rely on‘‘negative selection,’’ in which non–N-terminal peptides are re-moved from a sample, leaving N-terminal peptides (3–6). However,because most peptides are not N-terminal peptides, the efficiencyneeds to be very high to obtain a preparation enriched in N-terminal peptides. In contrast, ‘‘positive selection’’ involves thepurification of N-terminal peptides from a mixture of peptides.

Selective labeling of the N-termini of proteins would allow N-terminal peptides to be readily purified after protease digestion (7,8). The challenge of positive selection is to selectively recognize the�-amino group, which is present exclusively on the N-termini ofproteins, but not the �-amino group, which is present at high levelsin proteins on lysine side chains.

Here we describe a proteomic strategy, N-terminalomics bychemical labeling of the �-amine of proteins (N-CLAP). N-CLAPutilizes features of Edman chemistry to enable the selective labelingand capture of N-terminal peptides from proteins in complexbiological samples. We demonstrate that N-CLAP can be used tocharacterize the diversity of N-terminal processing as well as signalpeptidase specificity. Using this method, we also identify proteinprocessing that occurs during apoptosis and identify several uniquepotential targets of caspase processing. Together, these data dem-onstrate that N-CLAP can be used to identify cleavage sites inproteins and identify proteolytic cleavage events that occur duringcell signaling.

ResultsA Chemoselective Strategy for Labeling and Enriching N-Terminal Pep-tides. The N terminus of proteins is characterized by an �-amine,as opposed to the �-amines that are on lysine side chains. The�-amine (the pKa of its ammonium ion �8.95) is notable for itsslightly reduced pKa relative to that of �-amines (the pKa of itsammonium ion �10.53) (9). However, given this narrow pKawindow, there is no pH at which the �-amine can be selectivelylabeled with an amine-labeling reagent, because �-amines, albeitless reactive, are typically much more abundant (10). Thecircumstances are further complicated by the fact that the pKaof the ammonium ion for the �-amine and the �-amine can beaffected by the side chain of the adjacent amino acids, hydrogenbonding, and other intramolecular interactions (11, 12).

To address these issues, we have developed a chemical labelingstrategy, N-CLAP, that chemoselectively labels the N terminus.This labeling strategy is based on the chemistry of Edman degra-dation, which is used for N-terminal amino acid sequencing (13) [SIAppendix, Fig. S1A]. In N-CLAP, the amine-reactive Edman re-agent, phenyl isothiocyanate (PITC), is used to block all of theamines in proteins (Fig. 1). TFA is then used to trigger anintramolecular cyclization of the PITC-modified �-amine, whichresults in cleavage of the peptide bond between the first and secondamino acid. This intramolecular cyclization does not occur for thePITC adduct on the �-amine because an energetically favorablefive-membered ring cannot form at this position (14) and leaves thePITC-modified �-amine on the lysines intact (15). Therefore, the

Author contributions: G.X. and S.R.J. designed research; G.X. and S.Y.S. performed re-search; G.X. analyzed data; and G.X. and S.R.J. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1To whom correspondence should be addressed at: Department of Pharmacology, WeillMedical College, Cornell University, 1300 York Avenue, Box 70, New York, NY 10065.E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0908958106/DCSupplemental.

19310–19315 � PNAS � November 17, 2009 � vol. 106 � no. 46 www.pnas.org�cgi�doi�10.1073�pnas.0908958106

Page 2: Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

PITC adducts on the N-termini are selectively ‘‘deblocked.’’ AfterTFA treatment, the protein is shortened by one amino acid, andimportantly, contains only a single �-amine, whereas all lysinesremain blocked. The �-amine is then labeled using an amine-reactive tagging reagent, such as a biotinylating agent, resulting inselective tagging of the N terminus of each protein. Proteins aredigested, and tagged peptides are purified, for example using avidinagarose. N-terminal peptides are eluted and identified by MS(Fig. 1).

We first addressed whether N-terminal peptides generated byN-CLAP would be sufficiently unique to identify parent proteins.The majority (74%) of N-CLAP peptides prepared by virtualtrypsin digestion of proteins listed in the Swiss-Prot human proteindatabase (v55.6, July 1, 2008) (16) are at least 5 aa long, a length thatmay provide enough sequence information for protein identifica-tion. Of these peptides, 68% identify a single protein in the database(SI Appendix, Fig. S1 B–D). Most of the peptides that match morethan one protein match a family of highly similar isoforms. Thus, forN-CLAP peptides prepared with trypsin, 54% of the humanproteome could be unambiguously detected. To increase the cov-erage, the results from parallel experiments, each using a separateprotease, could be merged. For example, using trypsin and Glu-Cwould increase coverage of the proteome to 96.6% (SI Appendix,Fig. S1E). Thus, a significant fraction of potential N-CLAP pep-tides would provide unambiguous identification of parent proteins.In some cases, the masses of N-CLAP peptides alone could be usedto identify many proteins. This would be most useful when analyzing

small proteomes with instrumentation capable of providing highlyaccurate masses (SI Appendix, Fig. S1F).

Selective Labeling of Peptide N-Termini. To evaluate N-CLAP, wefirst characterized the efficiency of each chemical reaction using amodel peptide, angiotensin I, which contains a single amine at itsN terminus. After reaction of angiotensin I with PITC, the apparentmass increased by the expected 135 Da (Fig. 2 A and B). After TFAtreatment, the peptide mass is reduced by 250 Da, which corre-sponds to the mass of the N-terminal residue, aspartic acid,modified with PITC (Fig. 2C). A mass increase of 388 Da wasdetected after labeling the peptide with the amine-reactive cleav-able labeling reagent EZ-Link Sulfo-NHS-SS-biotin (Fig. 2D). Thefinal step, removal of the biotin by Tris(2-carboxyethyl)phosphine(TCEP)-mediated reduction of the disulfide linker, resulted in theexpected loss of 300 Da, corresponding to the removal of thethiol-containing biotin moiety (Fig. 2E). Similar results were ob-tained with another peptide, derived from �-catenin (SI Appendix,Fig. S2 A–F).

Fig. 1. Strategy for N-CLAP. After PITC treatment, all amines in a protein areblocked (filled circle). Reaction with TFA selectively deblocks the N terminusbut not PITC-modified lysines. The newly generated amino groups at theN-termini are then reacted with an amine-specific labeling reagent, such asEZ-Link Sulfo-NHS-SS-biotin (oval). After protein digestion, the N-terminalpeptides are recovered using avidin-based resins, eluted with a reducingagent, and then identified by MS.

Fig. 2. Selective and efficient labeling of angiotensin I using N-CLAP. (A)MALDI-TOF-MS of angiotensin I results in a major peptide ion at 1297.0. (B)PITC-treated angiotensin I results in a major peptide ion reflecting the ex-pected thiocarbamoyl PITC modification. (C) TFA treatment of PITC-modifiedangiotensin I results in a single prominent peptide ion with a mass corre-sponding to angiotensin I missing the first N-terminal residue. (D) The de-blocked angiotensin I was reacted with EZ-Link Sulfo-NHS-SS-biotin. MALDI-TOF-MS revealed a single major peak corresponding to the modified peptide.(E) Avidin-purified and TCEP-eluted sample results in a 1,270.2-Da peptide ion,corresponding to angiotensin I peptide without N-terminal aspartic acid andwith a remnant from the cleavable biotin tag. The treatments, peptide se-quence, and the chemical modification for each sample are indicated.

Xu et al. PNAS � November 17, 2009 � vol. 106 � no. 46 � 19311

BIO

CHEM

ISTR

Y

Page 3: Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

We next performed a similar analysis using the c-Myc peptide,which contains both an �-amine and an �-amine from an internallysine (SI Appendix, Fig. S2 G–K). As with angiotensin I, the c-Mycpeptide was readily modified by PITC, and the first amino acid wasreadily removed with TFA. Biotinylation resulted in selectivelabeling of the �-amine, whereas the PITC-blocked �-amine was notlabeled (SI Appendix, Fig. S2 J). Notably, the PITC modification didnot impair peptide fragmentation, indicating that MS/MS spectraobtained from PITC-modified peptides could be used for peptideidentification. Together, these data suggested that each step ofN-CLAP can occur with high efficiency and with negligible sidereactions.

Selective Enrichment of N-Terminal Peptides from Proteins. We nextapplied N-CLAP to individual proteins. To monitor the complete-ness of each step, an assay for amines in proteins was developed.Amines were labeled with biotin N-hydroxysuccinimide ester (bi-otin-NHS). Signals obtained after Western blotting with a biotinantibody indicate the presence of unmodified amines. Incubation ofinsulin, casein, or RNase A with PITC resulted in complete loss ofamine reactivity (Fig. 3A). After TFA treatment, amine reactivitywas recovered, consistent with selective deblocking of the Nterminus. In the case of insulin, the recovered signal is approxi-mately half as intense as the signal for the unmodified protein (Fig.3A, Top), consistent with the presence of two amines in the heavychain of insulin, one �-amine and one �-amine at its sole lysineresidue. However, the recovered signal is much weaker for caseinand RNase A (Fig. 3A, Middle and Bottom) than that of the originalsamples, which is expected because these two proteins have manylysines (12 and 7 for casein and RNase A, respectively).

We next addressed whether the N-terminal tryptic peptide couldbe enriched using N-CLAP. After PITC and TFA treatment,RNase A was labeled with EZ-Link Sulfo-NHS-SS-biotin.MALDI-TOF-MS of the tryptic digest reduced by TCEP revealednumerous peptides, including a relatively low abundance peak witha mass corresponding to the expected mass of the N-terminalpeptide containing a PITC-modified lysine and a mercaptoethano-ate (HSCH2CH2CO–) amine adduct that remains after the cleavageof the biotin tag (Fig. 3B). However, after avidin affinity purifica-tion, this peak was the predominant peak (Fig. 3C and SI Appendix,Fig. S3). Additionally, a peak representing a peptide in which thePITC modification on lysine is hydrolyzed, which would appear asa 1,196-Da peak, was not detected, confirming the stability of thePITC adduct. Together, these data demonstrate the usefulness ofN-CLAP for positive purification of N-terminal peptides.

Characterization of the Diversity of N-Terminal Methionine Process-ing. We next examined the ability of N-CLAP to generate N-terminal peptides from cellular lysates. Cell lysate was preparedfrom human Jurkat T cells and subjected to N-CLAP. As with theindividual proteins, PITC effectively blocked the amines in thelysate (Fig. 4A), and TFA resulted in the reappearance of a fractionof the amine reactivity. Amines were biotinylated, proteins weredigested, and N-terminal peptides were affinity isolated. A total of80 N-terminal peptides were identified using liquid chromatogra-phy (LC) quadrupole (Q)-TOF MS/MS (SI Appendix, Fig. S4 andTable S1). As an example, the MS/MS spectrum of the N-terminalpeptide from 26S proteasome non-ATPase regulatory subunit 6 isshown (Fig. 4B). As with many N-CLAP peptides, it starts at thethird amino acid predicted from the coding sequence. This reflectsthe removal of the original N-terminal methionine by endogenousmethionine aminopeptidases (MetAPs) and the loss of the secondamino acid during the TFA deblocking step in the N-CLAPprocedure. The identity of the residue lost during N-CLAP can beextrapolated from protein sequence databases.

Many of the N-CLAP peptides detected in Jurkat T cell lysatewere derived from the N-termini of proteins lacking the N-terminalmethionine. However, 27.2% of peptides were derived from pro-

teins that retained the N-terminal methionine. This likely reflectsinefficient removal of methionine by MetAPs, as has been de-scribed for methionine residues followed by charged or bulky aminoacids (17). MS/MS analysis of N-CLAP peptides also indicated thatnumerous proteins exist in the cytosol as a heterogeneous popu-lation, with a fraction containing the N-terminal methionine andanother fraction missing the N-terminal methionine. Some proteins(16.2%) were the source of three N-CLAP peptides, reflectingvariable trimming of N-terminal residues. These data indicate thatthe identity of the N-terminal residue can be considerably differentfrom what is predicted by sequence databases.

Determination of Organelle-Targeting Peptide Cleavage Sites UsingN-CLAP. Many N-CLAP peptides were derived from proteins thatwere processed to remove the first 15 or more N-terminal aminoacids (Table S2). Some N-CLAP peptides were derived fromtransmembrane or secreted proteins, in which signal peptides areremoved by signal peptide peptidases (SPPs) (18). Other N-CLAPpeptides were derived from mitochondrial proteins, which areprocessed by mitochondrial processing peptidases (MPPs) to re-

Fig. 3. Assessment of the N-CLAP for enrichment of the N-terminal peptidefrom individual proteins. (A) PITC completely modifies amines in proteins, andTFA generates new N-terminal amines. The presence of amines in the unreacted,PITC-treated, and both PITC- and TFA-treated samples (insulin, �-casein, andRNase A) was measured by reaction with biotin-NHS and visualized by Westernblotting with a biotin antibody. The band intensity reflects the relative amountof free amines. Coomassie staining confirms comparable loading in each lane. Aslight increase in the mobility of PITC-modified proteins is typically detected andlikely reflects the reduced positive charge on the protein due to lysine modifica-tion. (B) MALDI-TOF-MS spectrum of the tryptic peptides generated after RNaseA was subjected to N-CLAP, digested, and reduced (without avidin purification).MALDI-TOF reveals multiple peptides, including a peptide consistent with theN-terminal RNase A N-CLAP peptide (filled circle). (C) N-CLAP enriches for N-terminal peptides. After RNase A is subjected to N-CLAP, the tryptic peptides arepurified on avidin agarose and eluted with TCEP. MALDI-TOF-MS spectrum of theTCEP eluate reveals a predominant peak. This peptide corresponds to the N-CLAPpeptide from the RNase A N terminus, containing a PITC adduct on an internallysine (asterisk) and an N-terminal mercaptoethanoate amine adduct (shown insequence) derived from the cleaved biotinylation reagent.

19312 � www.pnas.org�cgi�doi�10.1073�pnas.0908958106 Xu et al.

Page 4: Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

move the transit peptides (19). As an example, the N-CLAP peptidederived from protein disulfide-isomerase A3 indicates that the first24 aa of the nascent protein is removed (Fig. 4C), consistent withprevious MS results (3).

Signal peptides lack simple consensus sequences can possibly bepredicted using computational approaches (20). Because differentalgorithms predict different cleavage sites, the precise cleavage siteshave to be experimentally verified. Signal peptides typically show 3distinct regions: a positively charged N-terminal region, a hydro-phobic region, and a C-terminal uncharged polar region. Exami-nation of N-CLAP peptides (SI Appendix, Fig. S4 and Table S2)indicates that only 57%, 93%, and 28% of signal peptides containa positively charged N terminus, a hydrophobic region, and aC-terminal uncharged polar region, respectively, and only 14% of

signal peptides have all three regions. These results indicate con-siderable diversity in signal peptides and confirm previous studiesreporting that signal peptides do not conform to a simple consensussequence (20, 21).

All of the N-CLAP–identified putative mitochondrial transitpeptides, except one, were enriched in Arg, Ser, and Ala anddeficient in negatively charged residues, consistent with proposedcharacteristics for these sequences (22, 23). Most (68%) have aconserved Arg close to the cleavage site. However, there was alsoheterogeneity in cleavage sites, even in the same protein. Forexample, several distinct N-CLAP peptides were identified fromATPase inhibitor, reflecting distinct pools of protein cleaved afterthe 24th, 25th, and 26th amino acid (Table S2). These noncanonicalcleavage events may reflect broad cleavage specificity of MPPs ormay reflect additional N-terminal processing by distinct peptidases.More than half (52.6%) of the signal or transit sequences and thecleavage sites identified in our analysis had not been previouslyexperimentally validated, according to the annotation in the Swiss-Prot database (16).

Identification of Caspase Cleavage Sites Using N-CLAP. We nextexamined proteolytic processing events elicited by cisplatin, achemotherapeutic agent. On the basis of a time course analysis ofcisplatin-induced apoptosis in Jurkat T cells (SI Appendix, Fig. S5),cells were harvested after treatment with vehicle or cisplatin for 8 hand then processed using the N-CLAP procedure. The majority ofN-CLAP peptides from both vehicle and cisplatin-treated Jurkat Tcells were derived from the N terminus of proteins, from internalcleavages, or from the N terminus of proteins that were generatedafter the removal of signal sequences (Fig. 5A). However, aftercisplatin treatment, many additional internal N-terminal peptideswere also identified (Fig. 5 B and C), some of which were derivedfrom proteins cleaved after well-established consensus caspasecleavage sites, such as DEXD and (I/L/V)EXD (24). Other proteinswere cleaved at sites that did not resemble canonical caspasecleavage sites but occurred after aspartate, the minimal feature forcaspase cleavage (SI Appendix, Fig. S4 and Table S3) (7, 24). Amongthe putative caspase substrates, only 17% had canonical caspasecleavage sites, whereas nearly half (48%) had the consensus se-quence of DXXD. These data suggest physiologic utilization ofheterogeneous cleavage sites by caspases.

Most of the proteins cleaved after cisplatin treatment werepreviously identified as caspase targets using various approaches (7,25, 26). However, some of them, such as protein phosphatase 1G,myosin-10, HERV-H�3q26, and pyridoxal-dependent decarboxyl-ase domain-containing protein 1, have not been previously reportedas being cleaved during apoptosis. Overall, 56% of the cleavage siteshave not been previously reported. To confirm that these cleavageevents were caspase dependent, Jurkat T cells were treated eitherwith vehicle, cisplatin, or cisplatin and z-VAD-fmk, a pan-caspaseinhibitor. We selected proteins that included previously reportedand newly discovered caspase substrates for validation, in part onthe basis of commercial availability of antibodies. In each case, theselected proteins were either completely or partially degraded afterthe addition of cisplatin, and the cleavage was blocked by thecaspase inhibitor, confirming that the cleavage events were indeedcaspase dependent (Fig. 5D).

In addition to caspase cleavage sites, additional classes of cleav-age sites were identified (SI Appendix, Fig. S4 and Table S4),consistent with findings that cisplatin induces the activation ofproteases other than caspases (27). In some cases, N-CLAP iden-tified proteins that had previously been shown to be cleaved duringapoptosis, but the specific cleavage sites was not identified (28, 29).Together, these results demonstrate the ability of N-CLAP toidentify numerous caspase-dependent and caspase-independentcleavage events after cisplatin treatment.

Fig. 4. Identification of protein N-termini from a complex mixture usingN-CLAP. (A) Evaluation of the completeness of PITC and TFA treatment onJurkat T cell lysate. The amine content of a Jurkat T cell lysate is monitored byWestern blotting with a biotin antibody at different stages in the N-CLAPprotocol. Note that the exposure time for the right panel is much longer thanfor the left panel to detect the recovered signal after TFA treatment. Silverstaining showed similar loading. (B and C) Representative MS/MS spectra of 2N-CLAP peptides from Jurkat T cell lysate. The sequence map of the peptides,the N-terminal modification, the position of the first amino acid in the codingsequences, and the protein names are indicated in the spectra. PITC modifi-cation on lysine is indicated by an asterisk. In B, the N-CLAP peptide starts fromthe third amino acid from the coding sequence, which indicates that the initialmethionine is removed from the nascent protein. However, in C the N-CLAPpeptide starts from the 26th amino acid from the coding sequence, whichindicates that the first 24 aa are removed in the nascent protein because theN-CLAP removes one additional N-terminal amino acid. The symbols \, /, and �represent b-ions, y-ions, and both b-ions and y-ions, respectively.

Xu et al. PNAS � November 17, 2009 � vol. 106 � no. 46 � 19313

BIO

CHEM

ISTR

Y

Page 5: Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

DiscussionProfiling the N-termini of proteins in complex mixtures is apowerful approach for elucidating proteolysis signaling pathways.Obtaining N-terminal peptides for N-terminalomics is challengingbecause the �-amine at the protein N terminus is nearly chemicallyindistinguishable from the �-amines on lysines. We have developeda strategy for chemoselective labeling of the N terminus, N-CLAP,which has resulted in the identification of 278 N-CLAP peptides innormal and apoptotic Jurkat T cells. N-CLAP provides a straight-forward chemical strategy for selective N-terminal labeling ofproteins and for enrichment of N-terminal peptides.

Using N-CLAP to characterize protein N-termini in cellularlysates reveals a surprising degree of trimming of the protein Nterminus, considering that MetAPs are the major enzymes thathave been linked to N-terminal processing. It is not clear whetheramino acid cleavage after the removal of the initial methionine isalso mediated by MetAPs or whether other aminopeptidases areinvolved. Conceivably, the differentially processed N-terminalforms of these proteins have different stabilities, due to effects ofN-terminal residues on protein stability (30), or may endow theprotein with different functions. Similarly, the N-termini of mem-brane proteins revealed considerable variability, suggesting heter-ogeneity in the specificity of SPPs and MPPs or secondary proteinprocessing events after initial cleavage.

Characterization of protease processing induced by cisplatinreveals cleavage at numerous different types of sites, includingconsensus caspase cleavage sites. Many of these sites have beenpreviously described, validating the accuracy of N-CLAP. Addi-tionally, we biochemically validated the caspase dependence of thecleavage of several proteins. In numerous cases, N-CLAP identifiedproteins that were already known to be cleaved during apoptosis,but the specific sites were not known. Thus, N-CLAP provides astrategy to readily identify cleavage sites in proteins, thereby

permitting mutagenesis and the generation of cleavage-resistantproteins for mechanistic studies. Of note, we identified fewercleavage sites than other studies using N-terminal labeling ap-proaches (7). This likely reflects the limited number of N-CLAPsamples subjected to MS and differences in MS instrumentation,peptide identification algorithms, and settings and parameters usedfor positive identification, as well as different apoptotic stimuli.These differences are unlikely to reflect an intrinsic inability todetect certain peptides that have been processed in the N-CLAPprocedure by MS/MS. Indeed, a �-catenin peptide that was notdetected in Jurkat T cell lysates in our initial N-CLAP experiments,but was detected in earlier studies (7), is detectable when a syntheticpeptides is used (SI Appendix, Fig. S2 A–F).

Another potential application of N-CLAP is to identify thesubstrates of a specific protease. Comparison of N-CLAP pep-tides derived from cells expressing different levels of specificproteases or treated with an inhibitor of a protease of interestwould reveal protease substrates. These analyses would befacilitated by tagging N-CLAP peptides using standard quanti-tative proteomic approaches (31).

In addition to monitoring protein processing, N-terminalomicscan be used to enhance the sensitivity of quantitative proteomics.The large number of peptides obtained after tryptic digestion canprevent thorough identification of a significant fraction of theproteins in the sample owing to instrument duty cycle, ion sup-pression, and dynamic range. Although this is partially alleviatedusing techniques such as isotope-coded affinity tags, in which onlycysteine-containing peptides are analyzed (32), N-CLAP can pro-vide further simplification of a peptide mixture by providing a singlepeptide per protein. Additionally, using this approach, peptideidentification can be enhanced by searching databases solely com-prising N-terminal peptides (6) against MS/MS spectra, or highlyaccurate MS spectra when examining small proteomes (Fig. S1).

Fig. 5. Identification of caspase cleavage sitesinduced by apoptosis by N-CLAP. (A) Distribu-tion of N-CLAP peptides prepared from vehicle-and cisplatin-treated Jurkat T cells. The N-CLAPpeptides are classified into 4 categories: (i) N-terminal peptides (N-term), which comprise N-CLAP peptides starting from the 2nd, 3rd, and4th amino acid from the coding sequence; (ii)N-terminal peptides from proteins with signalor transit peptides removed (signal); (iii) N-terminal peptides derived from internal N-termini of proteins cleaved in a manner consis-tent with caspase cleavage (caspase); and (iv)N-terminal peptides derived from internal N-termini of proteins cleaved at sites that do notresemble caspase cleavage sites (internal). (Band C) Representative MS/MS spectra of theN-CLAP peptides derived from the internal N-terminiofvimentinandsyntaxin-7. JurkatTcellswere treated with cisplatin (200 �M for 8 h). (D)Biochemical validation of caspase substrates.Western blot analysis of vimentin, myosin-10,NAP1L1 (nucleosome assembly protein 1-like 1),PPM1G(proteinphosphatase1G),and�-actin,anegative control, was used to assess the caspasedependence of their cleavage after cisplatintreatment. Jurkat T cells were treated with ei-ther DMSO, 200 �M cisplatin, or 200 �M cispla-tin after 2 h pretreatment with z-VAD-fmk (20�M). The levels of each of these proteins werereduced after cisplatin treatment, and the ef-fect was blocked by inclusion of the caspaseinhibitor. The size of the vimentin cleavageproduct (43 kDa, asterisk) is consistent withcleavage after Asp-85, which is indicated by thevimentin N-CLAP peptide.

19314 � www.pnas.org�cgi�doi�10.1073�pnas.0908958106 Xu et al.

Page 6: Global profiling of protease cleavage sites by ...N-termini and identifying protein cleavage sites during cellular sig-naling. In N-CLAP, simple and readily available reagents are

A major limitation of protocols that add tags to N-terminalamines is that, in some cases, the N terminus of proteins are‘‘blocked’’ (e.g., acetylated). Although estimates of the frequency ofblocked N-termini vary (33), blocked proteins would not be de-tectable in these approaches because they cannot be labeled on theN terminus. This may be addressed by using enzymes or chemicalmethods to remove N-terminal blocking groups (34). On the otherhand, the pool of proteins that have not yet been blocked may bedetectable using N-CLAP, depending on the sensitivity of the MSinstrumentation. Indeed, we detected N-terminal peptides fromseveral proteins, such as CDKN2AIP N-terminal-like protein andpeptidyl-prolyl isomerase A, which have been reported to beacetylated on the N terminus (16). Similarly, in experiments meantto identify processing events during cellular signaling, the nascentinternal N-termini may not be blocked during the time course of anexperiment. N-terminal blocking may also not be a significantproblem when examining secreted or extracellular proteins, be-cause this process is generally thought to use acetylating agents,such as acetyl-CoA, which are typically cytosolic (35). However,certain N-terminal modifications, such as methylation, are expectedto be detectable, on the basis of their known compatibility withEdman degradation (36).

In the N-CLAP procedure, all peptides contain a C-terminalarginine because trypsin does not cleave after the PITC-modifiedlysines that are generated during the N-CLAP procedure. Thus they1-ion is always 175 m/z, which can be used to confirm the specificityof peptides identified using the N-CLAP approach. The onlyexceptions are peptides that derive from the C terminus of proteins,which end in the native C-terminal residue. Interestingly, we alsofind that many peptides obtained using the N-CLAP methodologyhave intense b-ions (see Figs. 4 and 5). This feature could be usedin database searching to improve the identification of the N-terminal peptides.

Recently, an approach combining gel fractionation, MS, and aninnovative bioinformatic analysis has been developed to identify

protease substrates during apoptosis (37). This method determinesthe identity of proteins that are cleaved and is not designed toidentify the specific cleavage sites. Strategies for the positiveselection of N-terminal peptides and subsequent identification ofcleavage sites have previously been described. In one approach,O-methylisourea is used to block �-amines before �-amine labelingwith conventional amine-reactive labeling reagents (8). AlthoughO-methylisourea preferentially modifies the �-amine of lysines overthe �-amine, the reaction may be problematic owing to the absenceof absolute specificity for the �-amine of lysines and the incomple-tion of the reaction for proteins (8, 38). In a second approach, anengineered enzyme, subtiligase, is used to covalently link a cleav-able glycolate ester-derivatized peptide tag to the N-termini ofproteins, thereby facilitating the purification and subsequentprotease-dependent release of N-terminal peptides (7). However,this approach requires specialized reagents that are not routinelyavailable. An attractive feature of N-CLAP is that it can beperformed using widely available and inexpensive laboratoryreagents.

Materials and MethodsSample Preparation and MS Analysis. Samples were prepared by in-gel trypsindigestion,andN-CLAPpeptideswerepurifiedusingneutravidinagarose.Peptidesamples were analyzed either by MALDI-TOF-MS (Applied Biosystems) or byLC/electrospray ionization (ESI) MS/MS on an Agilent 6250 Q TOF MS. SpectrumMill was used for N-CLAP peptide identification by searching against a Swiss-Prothuman database.

The materials used and detailed methods are described in SI Text.

ACKNOWLEDGMENTS. We thank Dr. Yuliang Ma and Mr. Albert Morrishow atWeill Cornell Medical College (WCMC) Mass Spectrometry Core Facility for theirassistance in LC/ESI-MS/MS and MALDI-TOF-MS analysis. Mass spectrometry wasperformed at the WCMC MS Core Facility using instrumentation supported byNational Institutes of Health Grants RR19355 and RR22615. This work was sup-portedbyNational InstituteofAllergyand InfectiousDiseasesGrantAI068639 (toS.R.J.), the Dorothy Rodbell Sarcoma Foundation (S.R.J.), and Training GrantT32CA062948 from the National Cancer Institute (to G.X. and S.Y.S.).

1. Ehrmann M, Clausen T (2004) Proteolysis as a regulatory mechanism. Annu Rev Genet38:709–724.

2. Puente XS, Lopez-Otin C (2004) A genomic analysis of rat proteases and proteaseinhibitors. Genome Res 14:609–622.

3. Gevaert K, et al. (2003) Exploring proteomes and analyzing protein processing by massspectrometric identification of sorted N-terminal peptides. Nat Biotechnol 21:566–569.

4. StaesA,etal. (2008)Improvedrecoveryofproteome-informative,proteinN-terminalpeptidesby combined fractional diagonal chromatography (COFRADIC). Proteomics 8:1362–1370.

5. McDonald L, Robertson DH, Hurst JL, Beynon RJ (2005) Positional proteomics: Selectiverecovery and analysis of N-terminal proteolytic peptides. Nat Methods 2:955–957.

6. McDonaldL,BeynonRJ (2006)Positionalproteomics:Preparationofamino-terminalpeptidesas a strategy for proteome simplification and characterization. Nat Protoc 1:1790–1798.

7. Mahrus S, et al. (2008) Global sequencing of proteolytic cleavage sites in apoptosis byspecific labeling of protein N termini. Cell 134:866–876.

8. Timmer JC, et al. (2007) Profiling constitutive proteolytic events in vivo. Biochem J407:41–48.

9. Cohen SA (2005) Quantitation of amino acids as 6-aminoquinolyl-N-hydroxysuccinimi-dyl carbamate derivatives. Quantitation of Amino Acids and Amines by Chromatog-raphy: Methods and Protocols, ed Molnar-Perl I (Elsevier, Amsterdam), Vol 70, p 246.

10. Strickberger MW (2000) Evolution (Jones and Bartlett, Sudbury, MA), 3rd Ed.11. Stites WE, Gittis AG, Lattman EE, Shortle D (1991) In a staphylococcal nuclease mutant

the side-chain of a lysine replacing valine 66 is fully buried in the hydrophobic core. JMol Biol 221:7–14.

12. Hoofnagle AN, Resing KA, Ahn NG (2003) Protein analysis by hydrogen exchange massspectrometry. Annu Rev Biophys Biomol Struct 32:1–25.

13. Edman P (1956) Mechanism of the phenyl isothiocyanate degradation of peptides.Nature 177:667–668.

14. Bailey JM (1995) Chemical methods of protein sequence analysis. J Chromatogr A705:47–65.

15. Jay DG (1984) A general procedure for the end labeling of proteins and positioning ofamino acids in the sequence. J Biol Chem 259:15572–15578.

16. Boeckmann B, et al. (2003) The SWISS-PROT protein knowledgebase and its supple-ment TrEMBL in 2003. Nucleic Acids Res 31:365–370.

17. Li JY, et al. (2004) Mutations at the S1 sites of methionine aminopeptidases fromEscherichia coli and Homo sapiens reveal the residues critical for substrate specificity.J Biol Chem 279:21128–21134.

18. Weihofen A, Binns K, Lemberg MK, Ashman K, Martoglio B (2002) Identification ofsignal peptide peptidase, a presenilin-type aspartic protease. Science 296:2215–2218.

19. Schneider H, Arretz M, Wachter E, Neupert W (1990) Matrix processing peptidase ofmitochondria. Structure-function relationships. J Biol Chem 265:9881–9887.

20. Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating proteins in the cellusing TargetP, SignalP and related tools. Nat Protoc 2:953–971.

21. von Heijne G (1990) The signal peptide. J Membr Biol 115:195–201.22. Schneider G, et al. (1998) Feature-extraction from endopeptidase cleavage sites in

mitochondrial targeting peptides. Proteins 30:49–60.23. Emanuelsson O, von Heijne G, Schneider G (2001) Analysis and prediction of mito-

chondrial targeting peptides. Methods Cell Biol 65:175–187.24. Thornberry NA, et al. (1997) A combinatorial approach defines specificities of members

of the caspase family and granzyme B. Functional relationships established for keymediators of apoptosis. J Biol Chem 272:17907–17911.

25. Van Damme P, et al. (2005) Caspase-specific and nonspecific in vivo protein processingduring Fas-induced apoptosis. Nat Methods 2:771–777.

26. Wu YH, Shih SF, Lin JY (2004) Ricin triggers apoptotic morphological changes throughcaspase-3 cleavage of BAT3. J Biol Chem 279:19264–19275.

27. Pabla N, Dong Z (2008) Cisplatin nephrotoxicity: Mechanisms and renoprotectivestrategies. Kidney Int 73:994–1007.

28. Yuan X, et al. (2007) Nuclear protein profiling of Jurkat cells during heat stress-inducedapoptosis by 2-DE and MS/MS. Electrophoresis 28:2018–2026.

29. Schmidt F, et al. (2007) Quantitative proteome analysis of cisplatin-induced apoptoticJurkat T cells by stable isotope labeling with amino acids in cell culture, SDS-PAGE, andLC-MALDI-TOF/TOF MS. Electrophoresis 28:4359–4368.

30. Varshavsky A (1992) The N-end rule. Cell 69:725–735.31. Ong SE, Foster LJ, Mann M (2003) Mass spectrometric-based approaches in quantitative

proteomics. Methods 29:124–130.32. Gygi SP, et al. (1999) Quantitative analysis of complex protein mixtures using isotope-

coded affinity tags. Nat Biotechnol 17:994–999.33. Meinnel T, Peynot P, Giglione C (2005) Processed N-termini of mature proteins in higher

eukaryotes and their major contribution to dynamic proteomics. Biochimie 87:701–712.34. Hirano H, Kamp RM (2003) Deblocking of N-terminally modified proteins. Methods

Mol Biol 211:355–363.35. Polevoda B, Sherman F (2000) N�-terminal acetylation of eukaryotic proteins. J Biol

Chem 275:36479–36482.36. Chen R, Brosius J, Wittmann-Liebold B (1977) Occurrence of methylated amino acids as

N-termini of proteins from Escherichia coli ribosomes. J Mol Biol 111:173–181.37. Dix MM, Simon GM, Cravatt BF (2008) Global mapping of the topography and mag-

nitude of proteolytic events in apoptosis. Cell 134:679–691.38. Cohen LA (1968) Group-specific reagents in protein chemistry. Annu Rev Biochem

37:695–726.

Xu et al. PNAS � November 17, 2009 � vol. 106 � no. 46 � 19315

BIO

CHEM

ISTR

Y