Advances and Challenges in Liquid Chromatography-Mass Spectrometry-based Proteomics Profiling for Clinical Applications*

Wei-Jun Qian, Jon M. Jacobs, Tao Liu, David G. Camp II, and Richard D. Smith‡
Recent advances in proteomics technologies provide tremendous opportunities for biomarker-related clinical applications; however, the distinctive characteristics of human biofluids such as the high dynamic range in protein abundances and extreme complexity of the proteomes present tremendous challenges. In this review we summarize recent advances in LC-MS-based proteomics profiling and its applications in clinical proteomics as well as discuss the major challenges associated with implementing these technologies for more effective candidate biomarker discovery. Developments in immunoaffinity depletion and various fractionation approaches in combination with substantial improvements in LC-MS platforms have enabled the plasma proteome to be profiled with considerably greater dynamic range of coverage, allowing many proteins at low ng/ml levels to be confidently identified. Despite these significant advances and efforts, major challenges associated with the dynamic range of measurements and extent of proteome coverage, confidence of peptide/protein identifications, quantitation accuracy, analysis throughput, and the robustness of present instrumentation must be addressed before a proteomics profiling platform suitable for efficient clinical applications can be routinely implemented. Molecular & Cellular Proteomics 5:1727–1744, 2006.
Advances in MS technologies, high resolution liquid phase separations, and informatics/bioinformatics for large scale data analysis have made MS-based proteomics an indispensable research tool with the potential to broadly impact biology and laboratory medicine (1). In particular, proteomics technologies have been increasingly applied to the study of disease-related clinical samples (e.g. human blood serum/plasma, proximal fluids, and disease tissues) for the purposes of identifying novel disease-specific protein biomarkers, gaining a better understanding of disease processes, and discovering novel protein targets for therapeutic intervention and drug development (2).
Proteomics-based candidate biomarker discovery efforts have recently gained significant attention due to the power of these technologies for analyzing complex protein mixtures and their potential for identifying novel markers indicative of disease. It is widely believed that many complex human diseases, including cancers, might be more effectively cured if specific disease biomarkers were available to enable detection and treatment at very early stages of disease (3). Despite noteworthy efforts, only a handful of cancer biomarkers have been approved by the United States Food and Drug Administration (FDA)¹ for clinical use, with the majority of these being protein biomarkers (4). Although existing markers play a significant role in screening, monitoring, and staging, effective biomarkers are not currently available for most cancers and are generally nonexistent for early detection (3). Therefore, there is a clear need for applying advanced technologies such as those based on proteomics in the quest for novel candidate clinical biomarkers.
Although it was widely speculated that advances in genomics and proteomics would alter the landscape of clinical biomarker discovery and validation, the declining trend of new FDA-approved biomarkers reported over the last decade (5) highlights the magnitude of the challenges associated with human clinical samples and validation of candidate biomarkers. Contributing to these challenges are the substantial complexity of the human proteome and the heterogeneity of the human population, both of which make the search for biomarkers from either biofluids or disease tissues a daunting task. As a result of the heterogeneous nature of humans and the complexity of diseases, e.g. cancers, a panel of biomarkers rather than a single marker may be required to achieve the high sensitivity and specificity required for clinical applications (3). Proteomics technologies offer significant potential for discovering such marker panels.
Many different technologies have been applied for biomarker discovery and other clinical applications, including two-dimensional (2D) gel electrophoresis (6), LC-MS, and protein- and antibody-based microarrays (7–9). LC-MS- or tandem MS (MS/MS)-based proteomics technologies offer highly sensitive analytical capabilities and a relatively large dynamic range of detection and have increasingly become the method of choice for in depth profiling of complex protein mixtures (1). In addition, the relatively high throughput of LC-MS technologies is amenable to clinical applications that involve human biofluids and disease tissues. The application of LC-MS/MS for human biofluid protein profiling was initiated by the first global shotgun proteomics study of human plasma/serum published in 2002 by Adkins et al. (10). An explosion of LC-MS-based applications in human plasma/serum and various biofluids soon followed due to the tremendous interest in identifying disease-related proteins (11, 12). Various depletion/fractionation/enrichment techniques have been developed along the way and coupled to LC-MS to increase coverage of the biofluid proteomes (13).

From the Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352

Received, May 2, 2006, and in revised form, July 25, 2006. Published, MCP Papers in Press, August 3, 2006, DOI 10.1074/mcp.M600162-MCP200

¹ The abbreviations used are: FDA, Food and Drug Administration; SCX, strong cation exchange chromatography; NET, normalized elution time; AMT, accurate mass and time; IMS, ion mobility spectrometry; 2D, two-dimensional; RPLC, reversed phase LC; MARS, multiple affinity removal system; HUPO, Human Proteome Organization; LPS, lipopolysaccharide; MRM, multiple reaction monitoring.

Review

Molecular & Cellular Proteomics 5.10 1727. This paper is available on line at http://www.mcponline.org
Human blood serum/plasma remains the most commonly used clinical sample to date for proteomics applications because it may include specific biomarkers for virtually all human diseases due to its either direct or indirect interaction with the entire cell complement of the body, i.e. tissue-specific proteins may be released into the blood stream upon cell damage or cell death. Additionally serum/plasma can be readily obtained by clinical sampling. However, the magnitude of the previously mentioned challenges associated with human clinical samples coupled with the anticipation that potential biomarkers of interest could be present at extremely low concentrations in plasma has raised doubts as to whether disease biomarkers can be accurately detected or identified from plasma using a proteomics approach. As a result, analysis of various other biofluids/tissues has gained increasing attention. Due to their proximity to the source of disease or perturbation in the body, tissues (14) and various biofluids such as cerebrospinal fluid (15), bronchoalveolar lavage fluid (16), synovial fluid (17), nipple aspirate fluid (18), saliva (19), and urine (20) are believed to provide a more focused pool of potential biomarkers of interest. In addition, tumor interstitial fluids have also been reported as a novel source for proteomics biomarker and therapeutic target discovery (21), offering a promising alternative to direct tissue analysis. In the following review, we highlight LC-MS-based proteomics profiling for clinical applications by summarizing recent advances as well as the major challenges facing this technology for more effective candidate biomarker discovery.
CHALLENGES AND REQUIREMENTS FOR DESIGNING A ROBUST LC-MS DISCOVERY PLATFORM
The distinctive nature of human biofluid proteomes, in particular the serum/plasma proteome, presents significant challenges for current analytical technologies aimed at quantitative protein profiling and biomarker discovery. First, the serum/plasma protein content is dominated by several very abundant proteins (i.e. the 22 most abundant proteins represent ~99% of the total protein mass in plasma) yet at the same time presents an extraordinary dynamic range (>10 orders of magnitude) in protein concentrations that begins with serum albumin at ~45 mg/ml and extends to cytokines (and potentially many disease-related proteins) at around 1–10 pg/ml or lower (5). Second, the serum/plasma proteome presents tremendous biological complexity as a result of tissue “leakage” proteins from the entire body, complex post-translational protein modifications such as glycosylation, and the existence of various forms (i.e. splice variants, proteolytic products, and the tremendous variability in the immunoglobulin class) for each expressed gene. Finally the substantial genetic and non-genetic biological variability of human clinical samples contributes significantly to the overall analytical challenge.
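The concentration span quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the conversion of both values to g/ml are assumptions made here, and the two concentrations are the approximate figures given in the text (serum albumin at about 45 mg/ml, cytokines at about 1 pg/ml).

```python
import math

# Approximate concentrations from the text, expressed in g/ml.
# The common unit is an illustrative choice, not something the review prescribes.
ALBUMIN_G_PER_ML = 45e-3     # about 45 mg/ml serum albumin
CYTOKINE_G_PER_ML = 1e-12    # about 1 pg/ml, the low end quoted for cytokines


def orders_of_magnitude(high: float, low: float) -> float:
    """Return log10 of the ratio between two concentrations given in the same unit."""
    if high <= 0 or low <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(high / low)


span = orders_of_magnitude(ALBUMIN_G_PER_ML, CYTOKINE_G_PER_ML)
print(f"{span:.1f} orders of magnitude")
```

The ratio is 4.5 × 10^10, which is consistent with the statement that the plasma proteome spans more than 10 orders of magnitude.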
Despite significant recent advances, major challenges still prevent routine implementation of an LC-MS protein profiling platform suitable for efficient biomarker discovery (Table I). To effectively address these challenges, a protein profiling platform suitable for biomarker discovery and clinical applications must provide at the very minimum 1) overall high dynamic range of measurements and extensive coverage of the proteome for effective detection of low abundance proteins, 2) highly confident and specific protein identifications, 3) accurate quantitation of relative protein abundances across many clinical samples, and 4) high throughput capable of analyzing large numbers of clinical samples to provide the statistical power needed to address biological variability. In addition, the platform, including both sample processing and LC-MS instrumentation, must be robust and include efficient informatics software capabilities for data mining and statistical analyses. Currently there is a broad consensus that no existing platform meets all of these requirements for effective biomarker discovery.

TABLE I
Challenges and limitations of current LC-MS-based proteomics technologies applied to biomarker discovery

| Challenge | Current techniques for addressing the challenge | Limitations |
|---|---|---|
| Dynamic range of measurements | Immunoaffinity depletion and multidimensional fractionation coupled with high resolution LC-MS or MS/MS instrumentation | Low throughput, requires relatively large sample sizes |
| Sensitivity | Small inner diameter LC column (50 µm or less) coupled with nanoflow electrospray ionization and advanced MS instrumentation (e.g. FTICR, LTQ-FT) | Issues in robustness and expense |
| Reproducibility and quantitation | Platform automation (including sample processing), label-free direct quantitation, and isotope labeling-based quantitation | Variations from multistep sample processing, ionization suppression and instrument variations, labeling efficiencies |
| Throughput | Automated fast LC and gas phase ion mobility separations | Limited dynamic range or coverage |
| False positive identifications | Improved database searching algorithms and statistical models | Lack of consensus |

LC-MS-based Clinical Proteomics
Fig. 1 shows a component-based diagram of an LC-MS protein profiling platform. Note that such a platform is not based on a single instrument but rather on a compilation of current technologies to achieve high dynamic range quantitative proteome profiling for clinical samples. A key performance factor of any such platform is the overall dynamic range of detection and extent of proteome coverage, which in turn dictates its ability to detect low abundance proteins. Many disease-specific proteins in plasma/serum are anticipated to be present at very low levels (ng/ml or even lower), e.g. within the same range as current FDA-approved markers such as prostate-specific antigen (0.01–100 ng/ml) and Troponin-T (0.02–100 ng/ml). This is particularly obvious for cancer markers of early detection where tumor size is very small (millimeter size), and cancer-specific proteins in plasma may be present at pg/ml or lower levels. This overall dynamic range presents a tremendous challenge for any MS-based technology. The achievable dynamic range or proteome coverage for a platform depends on the peak capacity (the number of chromatographic peaks that can be fit into the length of separation) of the on-line LC separations prior to MS measurements, the dynamic range of the MS instrumentation, and the efficiency of sample enrichment or fractionation steps at both protein and peptide levels prior to LC-MS analyses. Analysis throughput inevitably determines the size of any clinical study sample set and largely depends on factors such as automation of each platform component, LC-MS analysis duty cycle, and the extent of prefractionation prior to LC-MS analysis. Although the application of more extensive fractionation can lead to a higher dynamic range of detection, the overall throughput can be severely reduced. Other key performance factors are the confidence of protein identifications and the quantitative accuracy, which determine the ability of the platform to confidently identify a potential biomarker based on the abundance differences between healthy and diseased conditions. Both the reproducibility of sample processing/fractionation prior to LC-MS and the LC-MS instrumentation will contribute to the accuracy of quantitation.
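The peak capacity defined above is commonly approximated for a gradient separation as one plus the separation window divided by the average peak width. The sketch below uses that textbook approximation; the gradient lengths and peak widths are hypothetical values chosen for illustration and are not taken from the review.

```python
def peak_capacity(gradient_time_s: float, avg_peak_width_s: float) -> float:
    """Approximate peak capacity as 1 + (separation window / average peak width).

    Both arguments are in seconds. This is the usual textbook approximation,
    not a formula given in the review.
    """
    if gradient_time_s < 0 or avg_peak_width_s <= 0:
        raise ValueError("gradient time must be >= 0 and peak width must be > 0")
    return 1.0 + gradient_time_s / avg_peak_width_s


# Hypothetical long gradient: 100 min with 6-s-wide peaks.
long_run = peak_capacity(100 * 60, 6.0)

# Hypothetical fast gradient: 10 min with 3-s-wide peaks.
fast_run = peak_capacity(10 * 60, 3.0)

print(long_run, fast_run)
```

Under these assumed numbers the long gradient accommodates about five times as many peaks as the fast one, which illustrates why shortening the separation tends to trade dynamic range for throughput.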
ADVANCES IN LC-MS TECHNOLOGIES
A high resolution LC (or LC/LC) separation coupled on line with MS is the central component of many proteomics platforms. Over the past decade, there have been significant advances in LC separations as well as in MS instrumentation and ESI. To date, the “bottom-up” proteomics strategy that combines high efficiency separations with MS to characterize highly complex peptide mixtures still accounts for the majority of proteomics measurements. This strategy relies on the identification of peptides sufficiently unique for protein identification. Protein mixtures from cellular lysates or biofluids are typically digested by trypsin (or other proteases) into polypeptides, which are then separated by capillary LC and analyzed by MS on line via an ESI interface. Peptide sequences are identified by using automated database searching algorithms such as SEQUEST (22), MASCOT (23), or X!Tandem (24) to correlate experimental MS/MS spectra to theoretical mass spectra based on sequences in a given protein database for a specific organism. With the recent development of high speed 2D linear ion trap instruments, i.e. LTQ, the protein profiling coverage has been greatly enhanced compared with traditional three-dimensional ion trap systems (25). When coupled with SCX fractionation either on line or off line (26, 27), LC-MS/MS technologies now routinely allow for identification of thousands of proteins from complex mammalian tissues and cells. Although routinely used for peptide/protein identifications, data-dependent LC-MS/MS still has an inherent “undersampling” limitation whereby only a portion of the species observed in the survey MS scan is selected for fragmentation (28).
To overcome the undersampling issue, our laboratory developed an accurate mass and time (AMT) tag approach that utilizes highly accurate mass measurements from a high resolution mass spectrometer (e.g. FTICR or TOF mass spectrometer) in conjunction with accurate elution time measurements from high resolution capillary LC separations to achieve high throughput proteome profiling without routine MS/MS measurements (29, 30). The concept of this AMT tag approach is based on the principle that the accurate mass and time measurements will allow reliable peptide identifications by correlating the mass and time of detected peaks to a pre-established peptide AMT tag reference library for a particular biological system (e.g. plasma). With this approach, LC-MS/MS proteome analyses coupled with extensive fractionation only need to be performed once to create an effective reference database of peptide markers defined by accurate masses and elution times, i.e. AMT tags. The AMT tag database then serves as a comprehensive “look-up table” for subsequent higher throughput LC-MS analyses, allowing many peptides in each spectrum to be identified without MS/MS. Fig. 2 exemplifies an LC chromatogram and 2D display of ~2,800 peptides identified using the AMT tag strategy resulting from a single LC-FTICR analysis of a ProteomeLab™ IgY-12 depleted human plasma sample.

FIG. 1. A component diagram of an LC-MS protein profiling platform. FFE, free flow electrophoresis; 1D, one-dimensional; iTRAQ, isobaric tags for relative and absolute quantitation.
The fact that application of the AMT tag approach obviates the need for routine MS/MS is particularly attractive in high throughput repeated analyses of similar samples (e.g. serum/plasma) in clinical proteomics studies. We have recently demonstrated the application of the AMT tag approach coupled with 18O labeling for quantitative profiling of the human plasma proteome in response to lipopolysaccharide administration (31). The availability of commercial high performance mass spectrometers (e.g. ThermoElectron Finnigan LTQ-FT and LTQ-Orbitrap) will likely lead to an even broader range of applications based on this LC-MS-only approach for higher throughput peptide identifications.
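The “look-up table” idea behind the AMT tag approach can be sketched as a tolerance match in the mass and NET dimensions. The code below is a minimal illustration under stated assumptions, not the authors' software: the `AmtTag` class, the `match_feature` function, the 5-ppm and 0.02-NET tolerances, and the placeholder peptides with made-up masses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AmtTag:
    peptide: str   # sequence identified once by LC-MS/MS (placeholder names here)
    mass: float    # monoisotopic mass in Da (made-up values here)
    net: float     # normalized elution time on a 0..1 scale


def match_feature(mass: float, net: float, library: Sequence[AmtTag],
                  ppm_tol: float = 5.0, net_tol: float = 0.02) -> Optional[AmtTag]:
    """Return the closest library tag within both tolerances, or None.

    The tolerances are illustrative assumptions; real values depend on the
    instrument and on the LC reproducibility.
    """
    best: Optional[AmtTag] = None
    best_score = float("inf")
    for tag in library:
        ppm_err = abs(mass - tag.mass) / tag.mass * 1e6
        net_err = abs(net - tag.net)
        if ppm_err <= ppm_tol and net_err <= net_tol:
            # Combine both errors, each scaled by its tolerance.
            score = ppm_err / ppm_tol + net_err / net_tol
            if score < best_score:
                best, best_score = tag, score
    return best


# Hypothetical reference library built from earlier LC-MS/MS runs.
library = [
    AmtTag("PEPTIDEA", 1000.5000, 0.300),
    AmtTag("PEPTIDEB", 1000.5100, 0.300),
    AmtTag("PEPTIDEC", 1500.7500, 0.620),
]

hit = match_feature(1000.5010, 0.305, library)    # close to PEPTIDEA in mass and NET
miss = match_feature(1500.7500, 0.900, library)   # mass matches, elution time does not
print(hit, miss)
```

The second query shows why the elution time dimension matters: a feature with the right mass but the wrong NET is rejected, which is what allows identification without MS/MS in this scheme.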
As mentioned previously, the achievable dynamic range for the LC-MS platform depends significantly on the peak capacity of the on-line gradient reversed phase separations, the dynamic range of the MS system, and the efficiency and stability of the ESI interface. A single MS spectrum can provide a dynamic range of up to 10^3 for a high resolution instrument (e.g. FTICR), and one would expect to achieve a dynamic range of at least 10^5 by coupling this instrument to an on-line high resolution LC separation that provides a peak capacity of ~1,000. However, the observed dynamic range of measurements can be significantly reduced for complex biological samples such as human plasma due to the charge competition of co-eluting high abundance species, leading to ion suppression of the relatively low abundance species. Ion suppression is a particular issue when analyzing human biofluid samples as these samples are dominated by a handful of highly abundant proteins. Significant ion suppression will occur when peptides originating from low abundance proteins of interest co-elute with peptides originating from high abundance proteins, leading to the inability to detect the co-eluting low abundance peptides.
Table II provides a summary of the relative proteome coverage and estimated dynamic ranges achieved by coupling high resolution reversed phase capillary LC separations with either MS/MS using an LTQ instrument or MS using a 9.4-tesla FTICR instrument. The enhanced coverage and dynamic ranges obtained by the removal of high abundance proteins and SCX fractionation are illustrated. All results shown in Table II are based on triplicate experiments that involved a pooled plasma sample from healthy subjects. The numbers of peptide identifications are reported with >95% confidence based on either a reversed database evaluation for MS/MS data (32) or a shifted database evaluation for the LC-FTICR data² with all proteins identified using a minimum of two different peptides. As shown, the single LC-MS/MS analysis only identifies ~100 proteins with high confidence and provides a dynamic range of ~10^3. With the removal of either the top six (MARS) or top 12 (IgY-12) abundant proteins, the overall dynamic range is enhanced to ~10^5. LC-FTICR shows greater coverage for both peptide and protein identifications compared with LC-MS/MS, and the dynamic range is estimated to be similar to that observed for LC-MS/MS. (It should be noted that presently unassigned peptides probably include many more proteins.) When IgY-12 depletion and SCX fractionation are combined with LC-MS/MS, a dynamic range of 10^6–10^7 can be achieved, allowing identification of nearly 500 proteins in plasma with high confidence including many at the low ng/ml level, and 2D LC-FTICR analyses would be expected to increase this by approximately another order of magnitude. Note, however, that this dynamic range still falls 3 orders of magnitude short for detecting pg/ml protein concentrations. In addition, it should be noted that not all the proteins within the estimated dynamic range will be detected due to the differences in digestion efficiency and ion suppression effects for different proteins/peptides within the complex sample.

FIG. 2. A typical LC-FTICR analysis of an IgY-12 depleted human plasma sample. A, the base peak chromatogram. B, a 2D display of ~2,800 identified species in the mass and NET space. The analysis was performed using a Bruker 9.4-tesla FTICR instrument coupled with an LC system equipped with a 150-µm-inner diameter and 65-cm-long capillary column operated at 5,000 p.s.i.
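The confidence levels quoted above rest on a reversed database evaluation. One common way to use such a search is to treat hits against the reversed sequences as an estimate of the false positives among the forward hits at the same score threshold. The sketch below illustrates only that general idea; it is not necessarily the exact procedure of ref. 32, and the function names and score values are hypothetical.

```python
from typing import Optional, Sequence


def confidence_at(threshold: float, forward_scores: Sequence[float],
                  reversed_scores: Sequence[float]) -> float:
    """Estimate the fraction of correct forward hits at a score threshold.

    Assumes the number of reversed-database hits passing the threshold
    approximates the number of false positives among the forward hits.
    """
    fwd = sum(s >= threshold for s in forward_scores)
    rev = sum(s >= threshold for s in reversed_scores)
    if fwd == 0:
        return 0.0
    return max(0.0, 1.0 - rev / fwd)


def lowest_threshold(forward_scores: Sequence[float], reversed_scores: Sequence[float],
                     target: float = 0.95) -> Optional[float]:
    """Return the lowest observed forward score whose estimated confidence meets the target."""
    for t in sorted(set(forward_scores)):
        if confidence_at(t, forward_scores, reversed_scores) >= target:
            return t
    return None


# Hypothetical search scores (arbitrary units), for illustration only.
forward = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
reverse = [1.0, 1.2, 1.5, 2.0]

cutoff = lowest_threshold(forward, reverse, target=0.95)
print(cutoff, confidence_at(2.0, forward, reverse))
```

With these made-up scores, a cutoff of 2.0 leaves one reversed hit against six forward hits (about 83% estimated confidence), so the filter has to move up to the next score before the 95% target is met.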
One key area of recent advances in LC-MS technologies is the improvement associated with capillary LC instrumentation that provides enhanced peak capacities and dynamic range of detection needed to analyze clinical samples. These improvements have been achieved primarily through the use of very high pressure (10–20 kp.s.i.), very small porous particles (3 µm or less), smaller inner diameter columns (50-µm inner diameter or less), nanoelectrospray interfaces, and relatively long columns and long gradients for separations (33–35). For example, high efficiency separations with peak capacities of ~1,000 have been achieved by using 15–75-µm-inner diameter and 85-cm-long capillary columns packed with 3-µm C18-bonded silica particles operated at 10 kp.s.i. By using smaller inner diameter columns (e.g. 15 µm) (34), the sensitivity of the system continues to increase inversely as the mobile phase flow rates drop to as low as 20 nl/min, demonstrating the advantages of ESI-MS analyses at very low liquid flow rates (36, 37). More recently, the use of 20 kp.s.i. capillary LC columns packed with 1.4–3-µm porous C18-bonded silica particles has been demonstrated to provide chromatographic peak capacities of 1,000–1,500 for complex peptide and metabolite mixtures (35). Although these very high pressure systems present technical challenges for robust automated operations, the recently commercialized Waters nanoACQUITY UPLC System that takes advantage of 1.7-µm sized particles and operates at ~10 kp.s.i. demonstrates the feasibility of such high performance systems for routine applications. With further improvements in robustness, these “ultraperformance” systems may become a powerful component for separating complex mixtures such as human biofluids while concurrently providing the high dynamic range needed for candidate biomarker discovery applications.

² V. A. Petyuk, W. J. Qian, M. H. Chin, H. Wang, E. A. Livesay, M. E. Monroe, J. N. Adkins, N. Jaitly, D. J. Anderson, D. G. Camp, D. J. Smith, and R. D. Smith, manuscript submitted.

TABLE II
The proteome coverage and estimated dynamic range offered by current LC-MS technologies

A pooled reference plasma sample from healthy individuals was used for this evaluation. A prepacked 4.6 × 50-mm (loading capacity, 15 µl of plasma) MARS affinity column (Agilent, Palo Alto, CA) and a 7 × 52-mm (loading capacity, 25 µl of plasma) ProteomeLab IgY-12 affinity column (Beckman Coulter, Fullerton, CA) were used for the depletion of high abundance proteins. For each method, the samples were processed in triplicate and individually analyzed using a 150-µm-inner diameter and 65-cm-long column coupled with either a Finnigan LTQ system (MS/MS) or a Bruker 9.4-tesla FTICR instrument. 10 and 5 µg of peptide samples were loaded for each LC-MS/MS and LC-FTICR analysis, respectively. 300 µg of peptides were used for each SCX fractionation. The LC and SCX operations were the same as described previously (31). Peptides were filtered with a confidence level >95% based on reversed database evaluation (32), and proteins were identified with at least two different peptides. ALS, acid-labile subunit; vWF, von Willebrand factor; SAA, serum amyloid A; CRP, C-reactive protein; HGFA, hepatocyte growth factor activator; MSF, megakaryocyte-stimulating factor; EGFR, epidermal growth factor receptor; APOC2, apolipoprotein C-II; B2M, β2-microglobulin; NAP1L1, nucleosome assembly protein 1-like 1; MMP2, matrix metallopeptidase 2; 1D, one-dimensional. We note that more relaxed identification criteria would considerably expand the numbers of peptides and proteins identified by all approaches.

| Method | Level | Replicate 1 | Replicate 2 | Replicate 3 | Overlap | Identified low abundance proteins | Estimated dynamic range of coverage |
|---|---|---|---|---|---|---|---|
| Non-depleted plasma and 1D LC-MS/MS | Peptides | 1,398 | 1,213 | 1,466 | 972 | ALS, 25 µg/ml; Factor XII, 30 µg/ml; APOC2, 35 µg/ml | ~10^3 |
| | Proteins | 99 | 97 | 102 | 96 | | |
| MARS depletion and 1D LC-MS/MS | Peptides | 1,723 | 1,732 | 1,692 | 1,250 | B2M, 1.1 µg/ml; vWF, 1.3 µg/ml; SAA, 10 µg/ml | ~10^4 |
| | Proteins | 119 | 118 | 115 | 111 | | |
| IgY-12 depletion and 1D LC-MS/MS | Peptides | 1,869 | 1,912 | 1,999 | 1,309 | Myoglobin, 90 ng/ml; CRP, 500 ng/ml; HGFA, 500 ng/ml; CD14, 1.4 µg/ml | ~10^5 |
| | Proteins | 130 | 141 | 130 | 122 | | |
| IgY-12 depletion and 1D LC-FTICR | Peptides | 2,800 | 2,840 | 2,630 | 2,070 | Myoglobin, 90 ng/ml; CRP, 500 ng/ml; HGFA, 500 ng/ml; CD14, 1.4 µg/ml | ~10^5 |
| | Proteins | 174 | 172 | 167 | 162 | | |
| IgY-12 depletion and SCX-LC-MS/MS | Peptides | 5,196 | 6,148 | 5,687 | 3,391 | MSF, 1 ng/ml; Leptin, 5 ng/ml; NAP1L1, 7 ng/ml; MMP2, 9 ng/ml; Cathepsin D, 9 ng/ml; EGFR, 11 ng/ml | ~10^6–10^7 |
| | Proteins | 498 | 474 | 476 | 369 | | |
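The replicate and overlap counts in Table II also give a rough view of run-to-run reproducibility. The sketch below expresses the peptide overlap as a fraction of the mean replicate count for each method. The counts are taken from Table II; the metric itself (overlap divided by the mean of the three replicates) is an illustrative choice made here, not one used in the review.

```python
from typing import Sequence

# Peptide counts from Table II: (replicate 1, replicate 2, replicate 3), overlap.
TABLE_II_PEPTIDES = {
    "Non-depleted, 1D LC-MS/MS": ((1398, 1213, 1466), 972),
    "MARS, 1D LC-MS/MS": ((1723, 1732, 1692), 1250),
    "IgY-12, 1D LC-MS/MS": ((1869, 1912, 1999), 1309),
    "IgY-12, 1D LC-FTICR": ((2800, 2840, 2630), 2070),
    "IgY-12, SCX-LC-MS/MS": ((5196, 6148, 5687), 3391),
}


def overlap_fraction(replicates: Sequence[int], overlap: int) -> float:
    """Return the overlap as a fraction of the mean replicate count (assumed metric)."""
    mean_count = sum(replicates) / len(replicates)
    return overlap / mean_count


for method, (replicates, overlap) in TABLE_II_PEPTIDES.items():
    print(f"{method}: {overlap_fraction(replicates, overlap):.2f}")
```

By this measure the SCX-fractionated analysis shows the lowest peptide overlap of the five methods, in line with the point made elsewhere in the review that multistep sample processing adds variation.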
MULTIDIMENSIONAL FRACTIONATION STRATEGIES COUPLED WITH LC-MS FOR IMPROVED PROTEOME COVERAGE
Given the tremendous dynamic range of protein abundances and the extraordinary complexity of human biofluid proteomes, many different fractionation techniques have been developed and applied in a multidimensional fashion to enhance dynamic range of detection and improve proteome coverage (13). Multicomponent immunoaffinity removal of highly abundant proteins in human plasma/serum (38, 39) has increasingly become the method of choice for prefractionating human plasma samples due to the high specificity, efficacy, and ease of coupling to other fractionation techniques. As shown in Table II, coupling the immunoaffinity depletion step to LC-MS provides an additional 1–2 orders of magnitude increase in dynamic range, allowing for detection of more low abundance proteins by effectively increasing the sample loading; similar improvements were reported in other studies (40, 41). Good reproducibility was demonstrated by performing immunoaffinity depletion with an automated LC system; however, some of the nontarget low abundance proteins have also been observed to bind to the columns but in a reproducible fashion (42). A possible approach to counter this effect is to analyze both the flow-through and bound fractions in more of a “partitioning” method instead of a pure “depletion” approach (39) with the accompanying trade-off of an increased number of required analyses. A further enhancement to the platform dynamic range will stem from the continuous improvement of antibody-based microbead technologies that will allow for removal of more highly to moderately abundant proteins.
Several different techniques for protein-level fractionation have been applied to human plasma/serum proteome profiling, including common gel-based techniques (43, 44), PF2D automated chromatofocusing/reversed phase LC (RPLC) (45) and other liquid chromatography-based separations (46), free-flow electrophoresis (41, 47), and IEF (46, 48–51). IEF is a common fractionation technique that has been applied to plasma profiling at both peptide and protein levels. Various forms of liquid phase IEF techniques have been developed, including off-gel electrophoresis (48), Rotofor (49) or Mini-Rotofor (46), microscale solution IEF (ZOOM) (50), and a preparative multichannel electrolyte system (51). A common feature of these systems is the multiple tandem electrode chambers used to partition complex protein samples. IPG IEF followed by in-gel digestion has also been used for plasma protein fractionation prior to LC-MS/MS (52). A number of recent large scale proteome profiling studies have combined different protein- and peptide-level fractionation techniques (e.g. PF2D (45), SCX/RPLC (54), free flow electrophoresis-IEF/RPLC (47), ZOOM/SDS-PAGE (50), and Rotofor/RPLC/SDS-PAGE (49) protein fractionation) with peptide-level LC-MS/MS analyses to achieve more comprehensive coverage of the plasma proteome.
An alternative to plasma protein fractionation is to specifically enrich functional “subproteomes” such as the glycoproteome or the cysteinyl subproteome by using chemical tagging or capture agents; this significantly reduces overall sample complexity and enhances detection of low abundance proteins. For example, we have recently demonstrated a simple procedure for effectively enriching cysteinyl peptides from complex proteomes (including human biofluids (55)) that provides significantly improved proteome coverage when used as a peptide-level fractionation technique (27). Additionally hydrazine chemistry can be applied to specifically enrich N-linked glycopeptides (56, 57), and multilectin affinity chromatography can be used to isolate and characterize glycoproteins from human plasma and serum samples (58). Our laboratory has recently developed a strategy that combines immunoaffinity depletion and subsequent chemical fractionation based on cysteinyl peptide and N-glycoprotein captures with 2D LC-MS/MS for in depth plasma profiling (Fig. 3) (59). Application of this “divide-and-conquer” strategy to trauma patient plasma samples resulted in confident identification of ~1,500 different proteins (with a minimum of two peptides per protein; >99.5% confidence level based on reversed database evaluation) and illustrated an overall dynamic range of detection of ~10^7 (low ng/ml concentrations for six identified low abundance proteins were verified by ELISA).
ANALYSIS THROUGHPUT
Although integration of extensive multidimensional fractionation/separations with MS greatly increases the overall proteomics analysis dynamic range and the extent of proteome coverage, this general approach suffers from the limitation of very low throughput. To date, most reports involving extensive fractionation have been limited to small scale studies of one or two pooled clinical samples rather than larger scale quantitative studies. The development of more effective depletion/fractionation strategies and improved LC-MS platforms will most likely reduce the total number of fractions necessary for the detection of low abundance and clinically relevant proteins and thus provide higher throughput.
Several recent technology developments hold potential for greatly enhancing the overall analysis throughput of clinical samples. The first is the development of very fast LC separations for proteomics analyses. Current automated LC-MS proteomics platforms typically involve LC separations with gradients of 100 min or longer, which limits throughput to ~10 sample analyses per day per MS instrument. Several reports have explored the use of smaller particle-packed columns or monolithic columns for fast LC separations (10 min or less) as well as multiplex column systems to significantly improve the throughput (60, 61). However, it is unclear whether sufficient separation power can be achieved with these fast liquid phase separations because the increase in the solvent gradient speed can degrade the separation peak capacity (60), which in turn reduces the overall dynamic range of detection. Other strategies for achieving robust fast separations include liquid phase chromatographic and electrophoretic separations on a microfluidic chip platform (62–64). Such chip-based separation devices also have the advantage of providing better robustness, reliability, and ease of operation.
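The throughput figures above follow from simple cycle-time arithmetic. The sketch below is a back-of-the-envelope illustration: the 100-min and 10-min gradients come from the text, whereas the per-run overhead (column re-equilibration, sample loading) and the number of fractions per sample are hypothetical values assumed here.

```python
MINUTES_PER_DAY = 24 * 60


def analyses_per_day(gradient_min: float, overhead_min: float = 0.0) -> float:
    """Return LC-MS runs per day per instrument for a given cycle time."""
    cycle = gradient_min + overhead_min
    if cycle <= 0:
        raise ValueError("cycle time must be positive")
    return MINUTES_PER_DAY / cycle


def samples_per_day(gradient_min: float, overhead_min: float = 0.0,
                    fractions_per_sample: int = 1) -> float:
    """Return clinical samples per day when each sample is split into several fractions."""
    if fractions_per_sample < 1:
        raise ValueError("fractions_per_sample must be at least 1")
    return analyses_per_day(gradient_min, overhead_min) / fractions_per_sample


# 100-min gradient with an assumed 40-min overhead per run.
print(round(analyses_per_day(100, 40), 1))

# Same runs, but each sample split into an assumed 25 SCX fractions.
print(round(samples_per_day(100, 40, fractions_per_sample=25), 2))

# 10-min fast LC with an assumed 5-min overhead per run.
print(round(analyses_per_day(10, 5), 1))
```

Under these assumptions a conventional gradient gives roughly 10 runs per day, extensive fractionation reduces that to well under one sample per day, and a fast separation raises the run count by nearly an order of magnitude.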
Very fast (millisecond scale) gas phase separations based on ion mobility spectrometry (IMS; a separation method that is somewhat analogous to electrophoresis in the gas phase) are another powerful alternative to liquid phase separations for significant improvement in throughput. At its simplest, an IMS stage consists of a drift tube filled with a non-reactive gas (commonly helium or N2) and a uniform electric field established along the axis of separation. Mixtures of peptides, proteins, or small molecules are separated by their gas phase cross-sections (size) in addition to charge, and knowledge of their mobility provides another separation dimension to aid in identification.
FIG. 3. Schematic representation of a chemical fractionation strategy applied to the plasma proteome characterization. High abundance proteins were first removed using immunoaffinity subtraction. The resulting less abundant proteins were split and subjected to solid phase cysteinyl peptide and N-glycoprotein captures independently. Non-cysteinyl peptides and non-glycopeptides generated at the same time were also collected. All four different peptide populations were then fractionated by SCX, and each fraction was analyzed by capillary LC-MS/MS. PNGase F, peptide-N-glycosidase F (59).
The power of IMS has been advanced by several recent technical developments. IMS coupled with a TOF MS platform and combinatorial libraries (65) has been recently demonstrated for analysis of proteolytic digests (66). Because an IMS separation typically requires 1–100 ms and has a resolving power of 50–200, a single species IMS peak exits the drift tube over a ~0.1–1-ms period. Generation of a typical TOF MS spectrum requires ~30–100 μs, which allows multiple mass spectra to be obtained during the “elution” of an IMS peak. More recently, LC has been coupled to IMS-TOF MS via an ESI interface, providing 2D separations prior to MS analysis (67). Despite enormous potential for high throughput analyses of complex samples, the application of IMS-TOF MS has been limited by low sensitivity due to ion losses at the IMS-MS interface; however, the recent implementation of electrodynamic ion funnels at both the ESI-IMS and IMS-TOF MS interfaces has significantly improved the sensitivity of the overall LC-ESI-IMS-TOF MS platform (Fig. 4) (68) such that the sensitivity is now comparable to that of a commercial ESI-MS instrument. Although still in the development stage, the very fast separation speed and potential high dynamic range of measurements offered by the 2D liquid phase-gas phase separations make LC-ESI-IMS-TOF MS an attractive and practical platform for high throughput clinical applications.
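The nesting of TOF spectra inside an IMS peak follows from simple timing arithmetic. The sketch below assumes the usual definition of IMS resolving power as drift time divided by peak width; the function names and the example values are illustrative choices, not parameters of the cited instruments.

```python
import math


def ims_peak_width_ms(drift_time_ms, resolving_power):
    """Peak width implied by resolving power R = drift time / peak width."""
    return drift_time_ms / resolving_power


def tof_spectra_per_peak(peak_width_ms, tof_spectrum_us):
    """Whole TOF spectra that fit inside one IMS peak."""
    return math.floor(peak_width_ms * 1000.0 / tof_spectrum_us)


if __name__ == "__main__":
    # Assumed example: 50-ms drift time, resolving power 100, 50-us TOF spectrum.
    width = ims_peak_width_ms(50.0, 100.0)
    print(width, tof_spectra_per_peak(width, 50.0))
```

With these example values the IMS peak is 0.5 ms wide and spans ten TOF spectra, consistent with the statement that multiple mass spectra are acquired during the elution of a single IMS peak.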
CONFIDENCE OF PEPTIDE/PROTEIN IDENTIFICATIONS
One of the challenges associated with MS/MS-based proteome profiling is how to assess the confidence levels of peptide and protein identifications that result from automated database searching. It is recognized that a significant portion of the protein identifications in previously published proteomics datasets of human plasma likely consists of false positive identifications (32, 69–71). For example, four different plasma proteomics datasets that originated from different methodologies were combined into a list that included 1,175 non-redundant proteins; however, only 46 of these non-redundant proteins (~4%) were observed across all four studies (70). This surprisingly low overlap suggests the potential for a very large number of false protein identifications. In a plasma profiling study using nanoscale LC-MS/MS, Shen et al. (69) reported a nearly 2-fold difference in the number of identified proteins (ranging from 800 to 1,600) depending on which set of previously published criteria was used to filter the data. This criteria-dependent difference illustrates the need for more detailed statistical evaluations to ensure high confidence protein identifications.
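The overlap statistic quoted above (46 of 1,175 proteins) is a ratio of the intersection to the union of the protein lists. A minimal sketch follows; the function name and the toy protein identifiers are hypothetical and are not taken from the four datasets.

```python
def overlap_summary(protein_sets):
    """Return (n_common, n_union, fraction) for a list of protein ID sets."""
    sets = [set(s) for s in protein_sets]
    union = set().union(*sets)
    common = set.intersection(*sets)
    fraction = len(common) / len(union) if union else 0.0
    return len(common), len(union), fraction


if __name__ == "__main__":
    # Toy identifiers for illustration only.
    studies = [
        {"ALB", "APOA1", "TF", "CRP"},
        {"ALB", "APOA1", "HP"},
        {"ALB", "APOA1", "C3"},
        {"ALB", "APOA1", "FGA"},
    ]
    print(overlap_summary(studies))
    # The figure reported in the text: 46 shared out of 1,175 combined.
    print(round(46 / 1175 * 100, 1))
```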
FIG. 4. Schematic diagram of a prototype ESI-IMS-Q-TOF instrumentation platform that uses electrodynamic ion funnel interfaces at both ends of the IMS drift tube and, as a result, provides very high sensitivity from high speed analyses. Reproduced with permission from Ref. 68, copyright 2005 Am. Chem. Soc.
FIG. 5. Relative frequency of different peptides identified from the normal human protein database (solid line) and the reversed human protein database (dashed line) at different Xcorr values. Data shown are for the 2+ charge state fully tryptic peptides identified from human plasma and filtered with ΔCn ≥ 0.1. Reproduced with permission from Ref. 32, copyright 2005 Am. Chem. Soc.
To address the issue of false peptide identifications, we recently performed a probability-based evaluation of peptide identifications derived from LC-MS/MS and SEQUEST analysis in which selected human proteomes, including human plasma, were searched against a sequence-reversed human protein database (32) similar to a previous report applying the reversed database strategy to the yeast proteome (72). The reversed protein database was created by reversing the order of amino acid sequences for each protein (the carboxyl terminus becomes the amino terminus and vice versa) in the original human protein database. This approach assumes that the numbers of false positives that arise from “random” hits should be the same for both the normal database and the reversed database because the reversed database is identical in number of protein entries, protein size, and distribution of amino acids to the normal database. Fig. 5 shows a histogram of Xcorr distribution for unique peptides (charge state 2+; fully tryptic) from a human plasma sample identified by searching the normal (solid line) and reversed (dashed line) databases. The Xcorr distribution allows an estimated confidence level for any given Xcorr bin as well as the overall false positive rate for a given Xcorr cutoff to be calculated by dividing the area beneath the dashed line (reversed database hits) by the area beneath the solid line (normal database hits) for a given Xcorr range. This study also revealed the high false positive rates for plasma/serum peptide/protein identifications in several previously published studies (10, 69, 70, 73, 74). For example, ~30% false positives were observed when the often cited Washburn et al. (75) filtering criteria were applied to human plasma. Thus, filtering criteria that provided overall ≥95% confidence at the unique peptide level for both human cell lines and human plasma were proposed. When identical filtering criteria were used, the observed false positive rates of peptide identifications for human plasma were significantly higher than those for the human cell lines, suggesting that the false positive rates are significantly dependent upon sample characteristics, particularly the number of proteins found within the detectable dynamic range for different samples. Additionally, Xie and Griffin (76) reported the increased potential for false positive identifications for the 2D linear ion trap (LTQ) when compared with a traditional three-dimensional ion trap (LCQ) instrument, and more stringent filtering criteria are required for LTQ compared with LCQ to minimize false positive identifications. These results suggest that peptide/protein identification confidence levels depend not only on sample characteristics but also on components of the LC-MS platform.
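The reversed database evaluation described above can be sketched as follows. This is a simplified illustration, not the published procedure: the `REV_` prefix, the function names, and the toy Xcorr values are assumptions, and hit counts above a cutoff stand in for the areas under the two histogram curves.

```python
def reverse_database(proteins):
    """Build decoy entries by reversing each sequence (C terminus becomes N terminus).

    The "REV_" prefix is a hypothetical naming convention used here.
    """
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def false_positive_rate(normal_xcorr, reversed_xcorr, cutoff):
    """Reversed-database hits divided by normal-database hits at or above a cutoff."""
    n_normal = sum(1 for s in normal_xcorr if s >= cutoff)
    n_reversed = sum(1 for s in reversed_xcorr if s >= cutoff)
    return n_reversed / n_normal if n_normal else 0.0


def lowest_cutoff_for_confidence(normal_xcorr, reversed_xcorr, candidates, min_conf=0.95):
    """Smallest candidate cutoff whose estimated confidence (1 - FPR) meets min_conf."""
    for cutoff in sorted(candidates):
        has_hits = any(s >= cutoff for s in normal_xcorr)
        fpr = false_positive_rate(normal_xcorr, reversed_xcorr, cutoff)
        if has_hits and 1.0 - fpr >= min_conf:
            return cutoff
    return None


if __name__ == "__main__":
    print(reverse_database({"P1": "MKTAYR"}))
    normal = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]   # toy scores
    decoy = [1.0, 1.2, 1.5, 2.0]                        # toy scores
    print(false_positive_rate(normal, decoy, 2.0))
    print(lowest_cutoff_for_confidence(normal, decoy, [1.0, 1.5, 2.0, 2.5, 3.0]))
```

In practice the cutoffs would be evaluated separately for each charge state and tryptic state, as the Fig. 5 data (2+ fully tryptic peptides) suggest.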
Table III illustrates differences in filtering criteria stringency by comparing peptide/protein identification results from the same plasma MS/MS dataset (obtained from a recent profiling study using trauma patient plasma samples (59)) that was filtered using three different sets of criteria (77, 78). As shown, the reversed database filtering criteria generated the smallest number of peptide and protein identifications, consistent with the significantly lower percentage of false positive identifications (~4%), whereas the Human Proteome Organization (HUPO) plasma proteome project-recommended criteria (77) and the criteria recently reported by Hood et al. (78) generated approximately 25 and 66% false positives at the peptide level, respectively. The comparison shows that the number of peptide/protein identifications from an individual protein profiling study could be easily inflated if a statistical evaluation of false positives was not performed.
A similar observation was recently reported for proteins identified from data acquired on different instruments from 18 laboratories as part of the large scale HUPO plasma proteome collaborative study (77). Application of a rigorous statistical approach that used multiple hypothesis-testing techniques and took into account the length of coding regions in genes reduced the initial list of 9,504 proteins (of which 3,020 were identified with two or more peptides) to 889 proteins (containing both multipeptide and single peptide protein identifications) identified with a confidence level of at least 95% (71). Interestingly, this length-dependent statistical approach was applied to reanalyze one of our previously published datasets (69) and resulted in 1,073 proteins using the HUPO criteria and 433 proteins using the ≥95% confidence length-dependent statistics (71). Similarly, a ~2-fold difference in protein identifications between the reversed database filtering results and the HUPO criteria (Table III) was observed, suggesting similar performance between the length-dependent statistical approach and reversed database filtering with ≥95% confidence.
TABLE III. Comparison of peptide and protein identifications from a plasma proteome profiling dataset analyzed using different criteria (59)

Filtering criteria | Difference in stringency | Peptides identified | Proteins identifiedᵃ | Multipeptide proteins | Average peptides per protein | Estimated false positive rateᵇ (%)
Reversed database (32) | ≥95% confidence at the unique peptide level based on statistical evaluation. Only fully and partially tryptic peptides are considered. | 22,267 | 3,654 | 1,494 (40.9%) | 6.1 | ~4
HUPO Plasma Proteome Project (77) | Inclusion of partially tryptic peptides with relatively low cutoffs. | 30,524 | 7,928 | 2,850 (35.9%) | 3.9 | ~25
Hood et al. (78) | Inclusion of partially tryptic and other enzymatically cleaved peptides as well as peptides without protease constraints with relatively low cutoffs. | 66,839 | 18,958 | 11,653 (61.5%) | 3.5 | ~66

ᵃ Non-redundant protein identifications generated by ProteinProphet (80).
ᵇ False positive rate for each set of filtering criteria was calculated at the unique peptide level based on reversed database evaluation (32). The reversed protein database was created by reversing the order of amino acid sequences for each protein (the carboxyl terminus becomes the amino terminus and vice versa) in the original protein database.

PeptideProphet provides another independent statistical model for evaluating potential false positive peptide identifications. The model utilizes the expectation maximization algorithm to derive a mixture of correct and incorrect peptide assignments from the data (79). This approach has been directly compared with the reversed database approach for analyzing the same dataset derived from human plasma (59). Following filtering with reversed database criteria, 6,279 unique peptides were identified from this dataset with ≥95% confidence, whereas 6,341 unique peptides were identified by PeptideProphet using a minimum computed probability of 0.95. Approximately 95% of peptides were common between the two datasets, suggesting comparable results from these two statistical approaches. The use of ProteinProphet, another statistical model that computes the probability of the presence of proteins, addresses the issue of whether peptides are present in more than one entry in the protein database (protein redundancy problem) (80). The list of identified peptides from both the PeptideProphet and the reversed database filtering approaches can serve as input for ProteinProphet to generate a list of non-redundant protein identifications. Several other statistical methods have been recently described for evaluating peptide assignments from MS/MS spectra (81–83). Ideally, universal acceptance of a statistical model that optimizes both sensitivity and specificity for confident peptide identifications from MS/MS spectra will allow cross-comparison of protein profiling results from different laboratories, which currently remains an unresolved challenge.
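The idea of using expectation maximization to separate correct from incorrect assignments can be illustrated with a generic two-component mixture fitted to one score per peptide. This sketch is not PeptideProphet's actual model: it assumes both score populations are Gaussian, and all names and toy scores are hypothetical.

```python
import math


def _normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def _posteriors(scores, w, mean_c, sd_c, mean_i, sd_i):
    """E-step: probability that each score belongs to the 'correct' component."""
    out = []
    for x in scores:
        c = w * _normal_pdf(x, mean_c, sd_c)
        i = (1.0 - w) * _normal_pdf(x, mean_i, sd_i)
        out.append(c / (c + i) if c + i > 0.0 else 0.5)
    return out


def _weighted_mean_sd(scores, weights):
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, scores)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, scores)) / total
    return mean, max(math.sqrt(var), 1e-3)  # floor keeps the density finite


def fit_two_component_mixture(scores, n_iter=50):
    """Return the posterior probability of 'correct' for each score."""
    lo, hi = min(scores), max(scores)
    w = 0.5
    mean_c, mean_i = hi, lo              # start the components at the extremes
    sd_c = sd_i = (hi - lo) / 4.0 or 1.0
    for _ in range(n_iter):
        post = _posteriors(scores, w, mean_c, sd_c, mean_i, sd_i)
        total_c = sum(post)
        if total_c < 1e-9 or len(scores) - total_c < 1e-9:
            break                        # one component has collapsed
        # M-step: re-estimate each component from its weighted scores.
        mean_c, sd_c = _weighted_mean_sd(scores, post)
        mean_i, sd_i = _weighted_mean_sd(scores, [1.0 - p for p in post])
        w = total_c / len(scores)
    return _posteriors(scores, w, mean_c, sd_c, mean_i, sd_i)


if __name__ == "__main__":
    toy_scores = [0.1, 0.2, 0.3, 0.0, -0.1, 3.0, 3.1, 2.9, 3.2, 2.8]
    probs = fit_two_component_mixture(toy_scores)
    accepted = [s for s, p in zip(toy_scores, probs) if p >= 0.95]
    print(accepted)
```

Filtering on a minimum posterior probability of 0.95, as in the last lines, mirrors the threshold quoted in the text for the PeptideProphet comparison.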
FIG. 6. A, mass error histograms of features detected from a single LC-FTICR dataset of a human plasma sample that matched to a human plasma AMT tag database using different levels of NET constraints. The LC separation time is normalized to a 0–1 scale in NET. B, mass error histograms for features from the same dataset matching to a normal AMT tag database (gray circles) and to a shifted AMT tag database (black squares). Note, the black squares represent random matches to the 11 Da shifted AMT tag database.
Similar challenges exist for evaluating false positive identifications from MS-only approaches that utilize accurate mass measurements for peptide/protein identifications. The utility of accurate mass measurements initially was demonstrated in the “peptide mass fingerprinting” approach for protein identification in which a set of peptide fragments unique to each protein is created by digestion, and the mass of these peptide fragments is used as a “fingerprint” to identify the original protein (84–86). Thus far, this approach has been limited to simple protein mixtures or single proteins. The more recently reported AMT tag approach utilizes accurate LC retention time measurements in addition to accurate mass measurements to identify peptides and has been successfully applied to global proteome profiling, including the human plasma proteome (31, 87). With the AMT tag approach, peptides are identified by matching LC-MS observed mass and normalized elution time (NET) features to AMT tags in the pre-established reference database (look-up table of peptides) with a given mass error and NET error tolerances (typically 1–5 ppm for mass and 1–3% for NET). The potential false positive identifications resulting from random matching of features to the reference database are indicated on histograms of mass error (the difference between observed mass and calculated mass for the matched peptide in the database) exemplified in Fig. 6A for a human plasma dataset analyzed by LC-FTICR. Note that the use of the NET constraint significantly reduces the level of random matches as indicated by the background level for each histogram. Similar to the reversed database approach for MS/MS, we have recently applied a shifted database approach for evaluating the false positive rate in the AMT tag process.² As shown in Fig. 6B, an ~3% false positive rate for this human plasma dataset was estimated as the ratio of the area beneath the curve that represents matches to the shifted database (black squares) and the area beneath the curve that represents matches to the normal database within a ±2 ppm window (gray circles). In addition to being used for direct identification in the MS-only approach, the accurate mass information also has been utilized for improving the confidence of peptide identifications by MS/MS through application of the new generation of LTQ-FT and LTQ-Orbitrap mass spectrometers (88, 89).
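The feature matching and shifted database estimate described above can be sketched as follows. This is an illustration under stated assumptions: features and AMT tags are reduced to (mass, NET) tuples, match counts stand in for the areas under the mass error histograms, the default tolerances are example values within the ranges quoted in the text, and all names are hypothetical.

```python
def ppm_error(observed_mass, reference_mass):
    """Mass error in parts per million relative to the reference mass."""
    return (observed_mass - reference_mass) / reference_mass * 1e6


def count_matches(features, amt_tags, ppm_tol=2.0, net_tol=0.02, mass_shift=0.0):
    """Count features that match at least one AMT tag in both mass and NET.

    mass_shift adds a constant offset (Da) to every tag mass, which turns
    the reference list into a shifted database whose matches are random.
    """
    matched = 0
    for mass, net in features:
        for tag_mass, tag_net in amt_tags:
            mass_ok = abs(ppm_error(mass, tag_mass + mass_shift)) <= ppm_tol
            net_ok = abs(net - tag_net) <= net_tol
            if mass_ok and net_ok:
                matched += 1
                break
    return matched


def shifted_database_fpr(features, amt_tags, shift_da=11.0, ppm_tol=2.0, net_tol=0.02):
    """Estimated false positive rate: shifted-database matches / normal matches."""
    normal = count_matches(features, amt_tags, ppm_tol, net_tol)
    shifted = count_matches(features, amt_tags, ppm_tol, net_tol, mass_shift=shift_da)
    return shifted / normal if normal else 0.0


if __name__ == "__main__":
    # Toy (mass in Da, NET on a 0-1 scale) values for illustration only.
    tags = [(1000.0, 0.20), (1011.0, 0.50), (1500.0, 0.70)]
    feats = [(1000.001, 0.205), (1011.0005, 0.50), (1500.0, 0.90), (1011.0005, 0.21)]
    print(count_matches(feats, tags))
    print(shifted_database_fpr(feats, tags))
```

The third toy feature shows the effect of the NET constraint noted in the text: its mass matches a tag exactly, but the elution time does not, so it is not counted.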
QUANTITATION STRATEGIES
The ability to quantitatively measure relative protein abundance differences between different clinical samples is essential for identifying candidate protein biomarkers; however, the vast majority of proteomics work related to biomarker discovery published to date has been qualitative, highlighting the need for more robust quantitative approaches for such applications. Our initial application for comparative proteome analysis of human plasma following lipopolysaccharide (LPS) administration involved a semiquantitative strategy based on the total number of peptide identifications per protein (peptide hits or spectrum count) (74). In this study, standard SCX-LC-MS/MS analysis was performed at the 0-h time point (control) and a 9-h time point following LPS administration, and peptide hits were used to obtain a relative quantitative measure between the control and 9-h time point. Several known inflammatory response and acute phase proteins were observed to be up-regulated upon LPS administration. Several other studies have shown that this peptide hits approach can be used as a semiquantitative approach for initial screening when applied with proper controls and with adequate thresholds (90–93).
FIG. 7. A, a partial 2D display of the detected ¹⁸O/¹⁶O-labeled peptide pairs from an LC-FTICR analysis. The elution time is shown as a normalized scale between 0 and 1. Observed peaks (represented by spots) correspond to various eluting peptides. The heavy and light isotope-labeled pairs are easily visualized with a 4-Da mass difference. B, normalized -fold changes for the 429 quantified proteins following LPS administration. The abundance ratio for each protein shown was normalized to zero (R − 1) (53). For ratios smaller than 1, normalized inverted ratios were calculated as 1 − (1/R). The error bar for each protein indicates the S.D. for the abundance ratios from multiple peptides. Proteins without error bars were identified with single peptides.
More recently, we have demonstrated ¹⁶O/¹⁸O labeling combined with the AMT tag strategy as an effective global quantitative approach for quantifying relative protein abundance differences in human plasma (31). By incubating tryptic peptides in ¹⁸O water (55, 94) in the presence of trypsin, the ¹⁸O atoms are incorporated into the carboxyl terminus of tryptically cleaved peptides via a postdigestion trypsin-catalyzed oxygen exchange reaction. The ¹⁶O/¹⁸O-labeled peptide pairs provide a 4-Da mass difference (Fig. 7A), which allows a high resolution mass spectrometer such as FTICR or TOF to effectively resolve the ¹⁶O- and ¹⁸O-labeled peptide pairs and accurately measure the relative abundances. The advantage is that all types of samples (e.g. tissues, cells, and biological fluids) can be effectively labeled using this simple and specific enzyme-catalyzed reaction. Fig. 7A shows a partial 2D display of detected peptide pairs in mass versus time dimensions. The ¹⁸O/¹⁶O-labeled peptides are readily visualized as co-eluting pairs (4 Da apart), and the abundance ratio can be precisely calculated for each ¹⁸O/¹⁶O pair. In this initial comparative analysis demonstration of two human plasma samples obtained from a healthy individual prior to (control) and following LPS administration, relative abundance differences between the two plasma samples were quantified for a total of 429 plasma proteins. Fig. 7B shows the normalized -fold changes in 429 quantified proteins and demonstrates the significant changes in abundance for a set of proteins following LPS administration. The combined ¹⁶O/¹⁸O labeling-AMT tag strategy can also be easily coupled with subsequent peptide-level fractionation approaches such as cysteinyl peptide enrichment (55) and SCX fractionation.
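Pairing co-eluting light and heavy features and normalizing the resulting ratios, as described in the text and in the Fig. 7 legend, can be sketched as below. The assumptions are labeled in the code: features are (mass, NET, intensity) tuples, the tolerances are arbitrary example values, the ratio is taken as heavy over light, and partially labeled (single ¹⁸O) species are ignored.

```python
# Mass shift for replacing two 16O atoms with two 18O atoms (about 4.0085 Da).
O18_PAIR_SHIFT_DA = 2 * (17.9991610 - 15.9949146)


def find_labeled_pairs(features, mass_tol_da=0.01, net_tol=0.01):
    """Return (light, heavy, heavy/light ratio) for co-eluting pairs ~4 Da apart.

    Tolerances are hypothetical example values, not published settings.
    """
    pairs = []
    for light in features:
        for heavy in features:
            mass_ok = abs((heavy[0] - light[0]) - O18_PAIR_SHIFT_DA) <= mass_tol_da
            net_ok = abs(heavy[1] - light[1]) <= net_tol
            if mass_ok and net_ok:
                pairs.append((light, heavy, heavy[2] / light[2]))
    return pairs


def normalized_fold_change(ratio):
    """Normalize a ratio R to zero: R - 1 for R >= 1, and 1 - 1/R for R < 1."""
    return ratio - 1.0 if ratio >= 1.0 else 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Toy (mass in Da, NET, intensity) features for illustration only.
    feats = [(1200.600, 0.35, 1000.0), (1204.6085, 0.35, 2000.0), (1500.700, 0.60, 500.0)]
    for light, heavy, ratio in find_labeled_pairs(feats):
        print(light[0], heavy[0], ratio, normalized_fold_change(ratio))
```

The normalization makes increases and decreases symmetric about zero, so a 2-fold increase maps to +1 and a 2-fold decrease maps to −1.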
Other stable isotope labeling methods based on relative peptide/protein abundance measurements, including metabolic labeling (95–97) and chemical labeling of specific functional groups using reagents such as ICAT (98) and iTRAQ (isobaric tags for relative and absolute quantitation) (99, 100), have been routinely used for quantitative proteomics analysis. In clinical proteomics applications, these stable isotope labeling techniques are well suited for accurately detecting changes in pairwise comparisons provided the samples can be effectively labeled; however, it is often challenging to compare across a large number of clinical samples. One alternative is the use of a labeled reference sample (often a pooled composite) that is spiked into each normally processed individual clinical sample; this allows relative quantitation between each clinical sample and the reference sample and cross-comparison among the entire set of clinical samples. The ¹⁸O labeling strategy is well suited for generating such a labeled reference sample as all other clinical samples can be processed with natural ¹⁶O on the carboxyl termini without labeling; ¹⁶O/¹⁸O peptide pairs are formed after spiking the samples with the ¹⁸O-labeled reference.
Alternatively, “label-free” direct quantitation approaches hold interest because of greater flexibility for comparative analyses and simpler sample processing procedures compared with labeling approaches. The isotope labeling and label-free approaches are complementary, and each approach has different sources of variation. Several initial studies suggest that normalized LC-MS peak intensities for detected peptides can be used to compare relative abundances between similar complex samples (101–103). It has been demonstrated that abundance ratios of separate model proteins may be predicted to within ±20% in complex proteome digests by using measured peptide ion intensities obtained in LC-MS analyses (101). Among the main challenges for label-free quantitation are the multiple issues that affect the usefulness of peptide peak intensities for relative quantitation, such as differences in electrospray ionization efficiencies among different peptides and different samples (37), differences in the amount of sample injected in each analysis, and sample preparation reproducibility. These issues are often peptide-dependent, leading to observed disparity among relative abundances of different peptides originating from the same protein. The significant bias and ion suppression effects caused by charge competition (ionization bias) during ESI (104) are often considered a major limitation for accurate label-free quantitation. Recent studies have demonstrated substantial advantages for ESI-MS analyses at nanoflow regimes (<100 nl/min) afforded by narrower inner diameter capillary columns for separations (36, 37). It is well demonstrated that smaller inner diameter columns with lower flow rates provide significantly higher sensitivity than larger inner diameter columns with higher flow rates (34) because of the significant improvements in both ionization and MS sampling efficiencies. Reversed phase packed nanoscale LC and monolithic nanoscale LC separations have been developed and coupled to ESI for improved ionization and quantitation (34, 105). As ionization efficiencies are increased for nanoelectrospray, detection biases are decreased because undesired matrix effects and/or ion suppression effects are either reduced or eliminated (104–106), providing the basis for improved quantitation. With further improvements to the robustness of these nano-LC-ESI-MS systems, label-free quantitation may be widely applied in clinical applications.
Another challenge for quantitative clinical proteomics applications is the variability introduced during multiple steps of sample processing. With continued development of cleanup products for more consistent performance and automated sample processing, such reproducibility issues may be minimized, leading to further improvements in quantitation when applying either the stable isotope labeling or label-free approaches.
IMPLICATIONS OF HUMAN HETEROGENEITY IN CLINICAL PROTEOMICS STUDIES
The ability to identify disease-specific differences by using a proteomics approach relies on multiple factors integral to the overall analysis pipeline. For example, when performing peptide-level measurements, achieving high peptide identification quality is a prerequisite for assuring confidence in all other downstream parameters (i.e. confidence in both protein identification and quantitation), whereas the ability to quantify differences between any two samples largely depends on the reproducibility of the overall platform. Due to inherent variations that stem from sample preparation and instrument analysis, technical replicates are often performed to evaluate and minimize technical variability arising from the overall analysis pipeline. Technical variability will be minimized as technologies continue to mature, and platforms will likely become more robust and reproducible; however, biological variability within the same comparative groups remains a challenge for identifying real differences between different conditions. Although ideally one would like to either control or minimize such biological variability by utilizing more controlled model systems such as cell cultures, an in vitro model system, or even inbred mouse strains, this is not always possible. Most clinical studies are based on “real world” human clinical samples where inherent human individual heterogeneity makes discovery efforts more difficult. The human heterogeneity challenge in proteomics studies stems from the high probability that two equally “healthy” individuals will have overall significantly different individual protein abundance levels when sampled at any given time. This heterogeneity can be due to individual genetic variability (i.e. gender, race, etc.) and/or to contributing environmental factors such as diet, overall health, detrimental environmental exposures, etc. The complexity of human diseases presents another degree of challenge. For example, in human cancer, each tumor type typically consists of a number of subtypes that differ with regard to their spectrum of genetic alterations (107). Therefore, a potential candidate biomarker of disease may be elevated only in a certain percentage of the pool of disease patients.
The implications of human heterogeneity in the context of LC-MS-based proteomics experiments center mostly on the measured quantitative values for peptide/protein identifications. Fig. 8 shows an initial evaluation of the technical variation and biological variations of human and mouse plasma samples based on the Pearson correlation of the identified peptide intensities between any two individual samples. The technical replicate results (Fig. 8A; nine individually processed samples from one pooled reference plasma) show overall good correlation (0.94 ± 0.02), which suggests relatively good reproducibility of the overall analytical platform. The increased variation among human subjects (Fig. 8B) appears obvious on the basis of significantly reduced average correlation coefficients (0.85 ± 0.06) compared with the technical replicate results, whereas mouse plasma samples (Fig. 8C) show only slightly reduced correlation (0.92 ± 0.05), which suggests relatively small biological variation in these inbred mouse models. Such large variations observed among different healthy control subjects present a challenge for identifying disease-specific differences. To address these challenges and increase the confidence of discovery results, it is essential for the discovery platform to be able to analyze a relatively large number of clinical samples in a high throughput manner to obtain sufficient statistical power.
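The mean ± S.D. correlation values quoted above summarize all pairwise Pearson correlations within a group of samples. A minimal sketch follows, assuming each sample is a list of intensities for the same peptides in the same order; whether the original analysis transformed the intensities (for example to a log scale) is not stated, so none is applied here. The function names and toy values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlation_summary(samples):
    """Mean and sample S.D. of Pearson r over every pair of samples."""
    rs = [pearson(a, b) for a, b in combinations(samples, 2)]
    mean_r = sum(rs) / len(rs)
    if len(rs) > 1:
        sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (len(rs) - 1))
    else:
        sd_r = 0.0
    return mean_r, sd_r


if __name__ == "__main__":
    # Toy intensities for four peptides measured in three samples.
    group = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 5.0]]
    print(pairwise_correlation_summary(group))
```

Nine samples give 36 pairwise comparisons per group, so the spread of the coefficients is itself informative about heterogeneity within that group.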
Other proteomics studies have also described the effects of human heterogeneity in specific model systems. Hu et al. (15) performed a limited study that compared both intra- and interindividual variability of human cerebrospinal fluid samples obtained from six individuals. Specific proteins were observed to fluctuate over time within the same individual, but overall there was a higher concordance of intraindividual results than across individuals. Interestingly, results from measuring intraindividual protein levels suggested that certain proteins tended to fluctuate more than others, calling into question the effectiveness of using these proteins as potential disease markers. Other studies include a report by Zhan and Desiderio (108) that showed the heterogeneity in 2D gel electrophoresis human pituitary proteome analysis and an interesting review by Mann et al. (109) that overviewed the effects of genotypic and phenotypic variations in evaluations of the hemostatic proteome. They reported that “normal” pro- and anticoagulant concentrations were observed to vary significantly and influence downstream responses, demonstrating how heterogeneity in individual phenotypes should influence diagnosis and therapy for hemorrhagic and thrombotic diseases.
FIG. 8. Pearson correlation plot comparing peptide intensities of LC-FTICR analyses of plasma samples. A, nine technical replicates for a pooled reference human plasma sample from multiple healthy subjects. B, nine human plasma samples from individual healthy subjects with ages ranging from 18 to 26. C, nine mouse plasma samples isolated from individual C57BL6 mice. Each sample including the technical replicate was separately processed by ProteomeLab IgY-12 (for human) or IgY-R7 (for mouse) depletion, and the flow-through portions were digested with trypsin prior to LC-MS analyses.
Designing experiments to minimize biological variability is imperative for clinical studies. One example is to analyze a serial sample set, i.e. plasma or biopsy tissue samples, from the same individual over a time course or disease progression; this in theory will alleviate a majority of heterogeneity effects, but such samples are traditionally more difficult to obtain, and most patients do not have a “control” blood or tissue sample in storage for comparison against a possible disease diagnosis. For most studies that use cross-sectional approaches, it is desirable to match the patients and controls in terms of age, sex, race, weight, and even diet if possible. A recent study reported the potential utility of pooling for reducing the effects of biological variation in microarray studies while retaining the accuracy of identifying differentially expressed genes when biological replicates are retained in the study design; pooling provides the additional benefit of a great reduction in the total number of samples to be analyzed (110). Such a strategy might be explored and extended to clinical proteomics studies.
A further implication of heterogeneity is the presence of protein isoforms, splice variants, specific amino acid mutations, proteolytic products, and other post-translational modifications that are likely present in individual samples but are most often not explicitly included as sequences in the searchable protein database. This exclusion makes it challenging for traditional LC-MS/MS-based bottom-up approaches to identify such modified proteins and is possibly one of the main reasons that a large percentage of MS/MS spectra in clinical analyses remain unidentified. The identification of amino acid-specific post-translational modifications (e.g. phosphorylation, glycosylation, glycation, nitration, oxidation, and deamination) challenges MS/MS-based approaches due to the vast variety of possible modifications and the potential high false positive rates that originate from database searching. Because it is recognized that many protein biomarkers may be specific protein isoforms or modified proteins, further technical developments for more effective identification and quantitation of protein isoforms and modifications would be greatly desirable.
As an alternative to identifying protein isoforms and modifications, intact protein-level separations can be used to separate different protein isoforms on the basis of their different masses or other properties. The ability to use 2D gel electrophoresis for resolving different isoforms and monitoring their abundance changes has been well documented (111). The recently developed multidimensional intact protein analysis system (IPAS) separates intact proteins on the basis of charge, hydrophobicity, and molecular mass; quantitation is achieved by protein tagging with fluorophores (43). The potential for revealing different protein isoforms and specific protein cleavage products in human plasma/serum also has been demonstrated (49). The advantages offered by intact protein analysis complement the bottom-up proteomics approaches, and better integration of these two approaches may lead to more effective biomarker discovery.
TARGETED PROTEOMICS APPROACHES
The majority of proteomics applications in the search for candidate biomarkers to date have focused on global proteome characterization aimed at identifying multiple protein differences (candidate biomarkers) that correlate with specific human diseases; however, as discussed previously, there are many challenges associated with applying such a strategy to the discovery of low abundance candidate marker proteins. An alternative strategy for biomarker discovery that complements global profiling is the targeted proteomics approach that involves quantitative MS to measure a hypothesis-generated list of candidates (112). The targeted proteomics strategy often provides greater sensitivity and allows for detection of low abundance candidate proteins. Anderson and Hunter (113) recently demonstrated the use of peptide multiple reaction monitoring (MRM) for quantitative assaying of major plasma proteins. Such MRM assays provide great specificity for peptide/protein identifications and relatively good precision for quantitation. Additionally, MRM can provide a rapid and specific platform for biomarker validation, particularly when coupled with specific enrichment techniques such as the recently published SISCAPA (Stable Isotope Standards and Capture by Anti-Peptide Antibodies) method for enriching target peptides using anti-peptide antibodies (114). Activity-based protein profiling is another strategy that uses chemical probes for tagging, enriching, and isolating a specific subset of physiologically important proteins on the basis of enzymatic activity (115, 116). Coupling such strategies with LC-MS holds potential for eliminating many issues related to the dynamic range of protein abundance.
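As a rough illustration of what an MRM assay monitors, the sketch below computes precursor/fragment (Q1/Q3) m/z pairs for a peptide from standard monoisotopic residue masses. The peptide sequence `AGLVEK`, the function names, and the choice of the three largest singly charged y ions are assumptions made for illustration; they are not the transitions used in the cited studies.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum of an intact peptide
PROTON = 1.007276   # mass of the charge-carrying proton


def precursor_mz(peptide, charge=2):
    """m/z of the protonated, unmodified peptide at the given charge."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y_n ion (the last n residues)."""
    return sum(MONO[aa] for aa in peptide[-n:]) + WATER + PROTON


def mrm_transitions(peptide, charge=2, n_fragments=3):
    """Return (Q1 m/z, fragment label, Q3 m/z) for the largest y ions,
    excluding the full-length fragment. The selection rule is hypothetical."""
    q1 = precursor_mz(peptide, charge)
    largest = len(peptide) - 1
    smallest = max(largest - n_fragments, 0)
    return [(q1, f"y{n}", y_ion_mz(peptide, n))
            for n in range(largest, smallest, -1)]


if __name__ == "__main__":
    for q1, label, q3 in mrm_transitions("AGLVEK"):
        print(f"{q1:.4f} -> {label} {q3:.4f}")
```

In practice, transitions are usually chosen empirically from observed fragment intensities and checked for interferences; the fixed rule here only shows the bookkeeping involved.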
A continuing issue for current LC-MS-based profiling approaches is that many of the detected species or features from LC-MS and LC-MS/MS analyses remain unidentified. Based on our experience, ~80% of MS/MS spectra on average are not confidently identified via database searching, and more than 50% of LC-FTICR-detected features remain unidentified by the AMT tag approach. Present informatics tools and statistical algorithms have been able to utilize intensity information of these unidentified features to identify “interesting” features as potential biomarkers for specific diseases; effectively targeting these interesting features using data-directed or targeted MS/MS approaches is of current interest. One of the informatics challenges associated with identifying these features concerns different post-translational modifications. Current commercial mass spectrometers such as the LTQ offer a targeted MS/MS capability based on the selection of a list of m/z values. Developing an advanced targeted MS/MS approach (117) that incorporates “smart selection” of the targets and different but complementary fragmentation techniques will be an integral component of an effective LC-MS profiling platform suitable for clinical applications.
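One simple way to picture a data-directed targeted MS/MS experiment is as the construction of an inclusion list from unidentified features that differ between sample groups. The following Python sketch is a hypothetical example: the `Feature` fields, the fold-change threshold, and the cap on the number of targets are all assumptions, and real instrument software expects its own list format.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    mz: float            # observed m/z of the LC-MS feature
    rt_min: float        # elution time in minutes
    fold_change: float   # disease/control abundance ratio (must be > 0)
    identified: bool     # True if already matched to a peptide


def build_inclusion_list(features, min_abs_log2_fc=1.0, max_targets=50):
    """Pick unidentified features with a large abundance change and return
    their (m/z, retention time) pairs ordered by elution time."""
    candidates = [
        f for f in features
        if not f.identified
        and abs(math.log2(f.fold_change)) >= min_abs_log2_fc
    ]
    # Keep the features with the largest change when the list must be capped.
    candidates.sort(key=lambda f: abs(math.log2(f.fold_change)), reverse=True)
    selected = candidates[:max_targets]
    return sorted(((f.mz, f.rt_min) for f in selected), key=lambda t: t[1])


if __name__ == "__main__":
    demo = [
        Feature(500.25, 30.0, 4.0, False),
        Feature(600.30, 10.0, 0.2, False),
        Feature(700.40, 20.0, 1.1, False),
        Feature(800.50, 15.0, 8.0, True),
    ]
    print(build_inclusion_list(demo))
```

A “smart selection” scheme of the kind described above would presumably add further criteria, such as charge state or statistical significance, which this sketch leaves out.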
CONCLUSIONS AND PERSPECTIVES
The amount of effort placed into the development and application of effective proteomics profiling of serum/plasma and other clinical samples has increased tremendously over the last several years. With the emergence of more effective LC-MS technologies and the variety of fractionation approaches, the number of proteins detectable in human plasma by global profiling has been greatly expanded (e.g. 889 proteins with ≥95% confidence reported in the recent HUPO study and 1,494 proteins with ≥99% confidence, including confident identification of many low ng/ml level plasma proteins, in our recent study (59)). Although this level of detection still falls short of the 10 orders of magnitude in dynamic range that encompasses plasma protein abundances, it nevertheless offers significant potential for the discovery of novel candidate biomarkers from clinical plasma/serum samples.
Currently there is no single platform that represents the “best” technology for such discovery applications, and integration of multiple technologies is often required for detection and quantitation of low abundance proteins. The need for improved reproducibility, throughput, dynamic range, and quantitation will continue to drive technology development and improvement efforts. Importantly, several new technological developments such as fast LC separations, gas phase IMS separations, and high efficiency nano-ESI interfaces presently appear promising for future discovery platforms and applications. With improvements in quantitation accuracy, throughput, and robustness, the LC-MS protein profiling platform may eventually become a powerful tool for clinical diagnostic testing that provides simultaneous measurements of a large number of clinically relevant analytes.
An important component of any integrated profiling platform not previously discussed is the informatics and statistical analysis. The development of more effective software packages will be essential for processing the large number of LC-MS datasets, which may include peak (or feature) detection, run-to-run feature alignment, intensity normalization, feature matching to the database, and statistical analysis to generate a list of high confidence potential candidates.
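Of the processing steps listed above, intensity normalization is the easiest to show in a few lines. The sketch below applies a simple median centering on the log2 scale so that all runs share a common median intensity; this is one common choice rather than the method of any specific package, and the function name and input layout are assumptions.

```python
import math
from statistics import median


def median_normalize(runs):
    """Scale each run so that all runs share the same median log2 intensity.

    `runs` maps a run name to a list of positive feature intensities
    (a hypothetical layout chosen for this example).
    """
    log_medians = {
        name: median(math.log2(x) for x in values)
        for name, values in runs.items()
    }
    # Use the mean of the per-run medians as the common target.
    target = sum(log_medians.values()) / len(log_medians)
    return {
        name: [x * 2 ** (target - log_medians[name]) for x in values]
        for name, values in runs.items()
    }


if __name__ == "__main__":
    raw = {"run_a": [100.0, 200.0, 400.0], "run_b": [200.0, 400.0, 800.0]}
    for name, values in median_normalize(raw).items():
        print(name, [round(v, 1) for v in values])
```

Within-run intensity ratios are unchanged by this scaling, so it corrects only for global differences such as sample loading; run-to-run alignment and feature matching would be separate steps.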
Finally, due to the complexity of large scale clinical proteomics studies, collaborative efforts from multiple laboratories with different platforms may be required for benchmarking and better cross-validation of the discovery results and for eliminating potential biases introduced by any given platform. This implies that a common set of standards is needed so that platform performance in different laboratories may be readily compared and large scale proteomics datasets can be effectively exchanged and shared.
Acknowledgments—The contributions of Marina Gritsenko, Hongliang Jiang, Matt Monroe, Ron Moore, Tom Metz, Angela Norbeck, Sam Purvine, and Yufeng Shen to the work reviewed here are gratefully acknowledged.
* Portions of the reviewed research were supported by the United States Department of Energy (DOE) Office of Biological and Environmental Research; the National Institutes of Health through the National Center for Research Resources Grant RR018522, NIGMS Large Scale Collaborative Research Grant U54 GM-62119-02, NIDDK Grant R21 DK070146, and NIDA Grant 1P30DA01562501; the Entertainment Industry Foundation (EIF) and the EIF Women’s Cancer Research Fund; and the Laboratory Directed Research Development program at Pacific Northwest National Laboratory. Our laboratories are located in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the DOE and located at Pacific Northwest National Laboratory, which is operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RL01830. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ To whom correspondence should be addressed: Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P. O. Box 999, MSIN: K8-98, Richland, WA 99352. E-mail: [email protected].
REFERENCES
1. Aebersold, R., and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 422, 198–207
2. Hanash, S. (2003) Disease proteomics. Nature 422, 226–232
3. Etzioni, R., Urban, N., Ramsey, S., McIntosh, M., Schwartz, S., Reid, B., Radich, J., Anderson, G., and Hartwell, L. (2003) The case for early detection. Nat. Rev. Cancer 3, 243–252
4. Ludwig, J. A., and Weinstein, J. N. (2005) Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer 5, 845–856
5. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867
6. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M., Gillespie, J. W., Hu, N., Taylor, P. R., Emmert-Buck, M. R., Liotta, L. A., Petricoin, E. F., III, and Zhao, Y. (2002) 2D differential in-gel electrophoresis for the identification of esophageal scans cell cancer-specific protein markers. Mol. Cell. Proteomics 1, 117–124
7. Zangar, R. C., Varnum, S. M., and Bollinger, N. (2005) Studying cellular processes and detecting disease with protein microarrays. Drug Metab. Rev. 37, 473–487
8. Janzi, M., Odling, J., Pan-Hammarstrom, Q., Sundberg, M., Lundeberg, J., Uhlen, M., Hammarstrom, L., and Nilsson, P. (2005) Serum microarrays for large scale screening of protein levels. Mol. Cell. Proteomics 4, 1942–1947
9. Uhlen, M., Bjorling, E., Agaton, C., Szigyarto, C. A., Amini, B., Andersen, E., Andersson, A. C., Angelidou, P., Asplund, A., Asplund, C., Berglund, L., Bergstrom, K., Brumer, H., Cerjan, D., Ekstrom, M., Elobeid, A., Eriksson, C., Fagerberg, L., Falk, R., Fall, J., Forsberg, M., Bjorklund, M. G., Gumbel, K., Halimi, A., Hallin, I., Hamsten, C., Hansson, M., Hedhammar, M., Hercules, G., Kampf, C., Larsson, K., Lindskog, M., Lodewyckx, W., Lund, J., Lundeberg, J., Magnusson, K., Malm, E., Nilsson, P., Odling, J., Oksvold, P., Olsson, I., Oster, E., Ottosson, J., Paavilainen, L., Persson, A., Rimini, R., Rockberg, J., Runeson, M., Sivertsson, A., Skollermo, A., Steen, J., Stenvall, M., Sterky, F., Stromberg, S., Sundberg, M., Tegel, H., Tourle, S., Wahlund, E., Walden, A., Wan, J., Wernerus, H., Westberg, J., Wester, K., Wrethagen, U., Xu, L. L., Hober, S., and Ponten, F. (2005) A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 4, 1920–1932
10. Adkins, J. N., Varnum, S. M., Auberry, K. J., Moore, R. J., Angell, N. H., Smith, R. D., Springer, D. L., and Pounds, J. G. (2002) Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol. Cell. Proteomics 1, 947–955
11. Jacobs, J. M., Adkins, J. N., Qian, W. J., Liu, T., Shen, Y., Camp, D. G., II, and Smith, R. D. (2005) Utilizing human blood plasma for proteomic biomarker discovery. J. Proteome Res. 4, 1073–1085
12. Veenstra, T. D., Conrads, T. P., Hood, B. L., Avellino, A. M., Ellenbogen, R. G., and Morrison, R. S. (2005) Biomarkers: mining the biofluid proteome. Mol. Cell. Proteomics 4, 409–418
13. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery from the plasma proteome using multidimensional fractionation proteomics. Curr. Opin. Chem. Biol. 10, 42–49
14. Wright, M. E., Han, D. K., and Aebersold, R. (2005) Mass spectrometry-based expression profiling of clinical prostate cancer. Mol. Cell. Proteomics 4, 545–554
15. Hu, Y., Malone, J. P., Fagan, A. M., Townsend, R. R., and Holtzman, D. M. (2005) Comparative proteomic analysis of intra- and interindividual variation in human cerebrospinal fluid. Mol. Cell. Proteomics 4, 2000–2009
16. Wattiez, R., and Falmagne, P. (2005) Proteomics of bronchoalveolar lavage fluid. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 815, 169–178
17. Liao, H., Wu, J., Kuhn, E., Chin, W., Chang, B., Jones, M. D., O’Neil, S., Clauser, K. R., Karl, J., Hasler, F., Roubenoff, R., Zolg, W., and Guild, B. C. (2004) Use of mass spectrometry to identify protein biomarkers of disease severity in the synovial fluid and serum of patients with rheumatoid arthritis. Arthritis Rheum. 50, 3792–3803
18. Varnum, S. M., Covington, C. C., Woodbury, R. L., Petritis, K., Kangas, L. J., Abdullah, M. S., Pounds, J. G., Smith, R. D., and Zangar, R. C. (2003) Proteomic characterization of nipple aspirate fluid: identification of potential biomarkers of breast cancer. Breast Cancer Res. Treat. 80, 87–97
19. Xie, H., Rhodus, N. L., Griffin, R. J., Carlis, J. V., and Griffin, T. J. (2005) A catalogue of human saliva proteins identified by free flow electrophoresis-based peptide separation and tandem mass spectrometry. Mol. Cell. Proteomics 4, 1826–1830
20. Theodorescu, D., Wittke, S., Ross, M. M., Walden, M., Conaway, M., Just, I., Mischak, H., and Frierson, H. F. (2006) Discovery and validation of new protein biomarkers for urothelial cancer: a prospective analysis. Lancet Oncol. 7, 230–240
21. Celis, J. E., Gromov, P., Cabezon, T., Moreira, J. M., Ambartsumian, N., Sandelin, K., Rank, F., and Gromova, I. (2004) Proteomic characterization of the interstitial fluid perfusing the breast tumor microenvironment: a novel resource for biomarker and therapeutic target discovery. Mol. Cell. Proteomics 3, 327–344
22. Yates, J. R., III, Eng, J. K., and McCormack, A. L. (1995) Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67, 3202–3210
23. Perkins, D. N., Pappin, D. J., Creasy, D. M., and Cottrell, J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567
24. Craig, R., and Beavis, R. C. (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467
25. Mayya, V., Rezaul, K., Cong, Y. S., and Han, D. (2005) Systematic comparison of a two-dimensional ion trap and a three-dimensional ion trap mass spectrometer in proteomics. Mol. Cell. Proteomics 4, 214–223
26. Wolters, D. A., Washburn, M. P., and Yates, J. R. (2001) An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73, 5683–5690
27. Wang, H., Qian, W. J., Chin, M. H., Petyuk, V. A., Barry, R. C., Liu, T., Gritsenko, M. A., Mottaz, H. M., Moore, R. J., Camp, D. G., II, Khan, A. H., Smith, D. J., and Smith, R. D. (2006) Characterization of the mouse brain proteome using global proteomic analysis complemented with cysteinyl-peptide enrichment. J. Proteome Res. 5, 361–369
28. Tabb, D. L., MacCoss, M. J., Wu, C. C., Anderson, S. D., and Yates, J. R. (2003) Similarity among tandem mass spectra from proteomic experiments: detection, significance, and utility. Anal. Chem. 75, 2470–2477
29. Smith, R. D., Anderson, G. A., Lipton, M. S., Pasa-Tolic, L., Shen, Y., Conrads, T. P., Veenstra, T. D., and Udseth, H. R. (2002) An accurate mass tag strategy for quantitative and high throughput proteome measurements. Proteomics 2, 513–523
30. Qian, W. J., Camp, D. G., and Smith, R. D. (2004) High throughput proteomics using Fourier transform ion cyclotron resonance (FTICR) mass spectrometry. Expert Rev. Proteomics 1, 89–97
31. Qian, W. J., Monroe, M. E., Liu, T., Jacobs, J. M., Anderson, G. A., Shen, Y., Moore, R. J., Anderson, D. J., Zhang, R., Calvano, S. E., Lowry, S. F., Xiao, W., Moldawer, L. L., Davis, R. W., Tompkins, R. G., Camp, D. G., and Smith, R. D. (2005) Quantitative proteome analysis of human plasma following in vivo lipopolysaccharide administration using 16O/18O labeling and the accurate mass and time tag approach. Mol. Cell. Proteomics 4, 700–709
32. Qian, W. J., Liu, T., Monroe, M. E., Strittmatter, E. F., Jacobs, J. M., Kangas, L. J., Petritis, K., Camp, D. G., and Smith, R. D. (2005) Probability-based evaluation of peptide and protein identifications from tandem mass spectrometry and SEQUEST analysis: the human proteome. J. Proteome Res. 4, 53–62
33. Tolley, L., Jorgenson, J. W., and Moseley, M. A. (2001) Very high pressure gradient LC/MS/MS. Anal. Chem. 73, 2985–2991
34. Shen, Y., Zhao, R., Berger, S. J., Anderson, G. A., Rodriguez, N., and Smith, R. D. (2002) High-efficiency nanoscale liquid chromatography coupled on-line with mass spectrometry using nanoelectrospray ionization for proteomics. Anal. Chem. 74, 4235–4249
35. Shen, Y., Zhang, R., Moore, R. J., Kim, J., Metz, T. O., Hixson, K. K., Zhao, R., Livesay, E. A., Udseth, H. R., and Smith, R. D. (2005) Automated 20 kpsi RPLC-MS and MS/MS with chromatographic peak capacities of 1000–1500 and capabilities in proteomics and metabolomics. Anal. Chem. 77, 3090–3100
36. Wilm, M. S., and Mann, M. (1994) Electrospray and Taylor-Cone theory, Dole’s beam of macromolecules at last? Int. J. Mass Spectrom. Ion Process. 136, 167–180
37. Smith, R. D., Shen, Y., and Tang, K. (2004) Ultrasensitive and quantitative analyses from combined separations-mass spectrometry for the characterization of proteomes. Acc. Chem. Res. 37, 269–278
38. Zolotarjova, N., Martosella, J., Nicol, G., Bailey, J., Boyes, B. E., and Barrett, W. C. (2005) Differences among techniques for high-abundant protein depletion. Proteomics 5, 3304–3313
39. Huang, L., Harvie, G., Feitelson, J. S., Gramatikoff, K., Herold, D. A., Allen, D. L., Amunugama, R., Hagler, R. A., Pisano, M. R., Zhang, W. W., and Fang, X. (2005) Immunoaffinity separation of plasma proteins by IgY microbeads: meeting the needs of proteomic sample preparation and analysis. Proteomics 5, 3314–3328
40. Echan, L. A., Tang, H. Y., Ali-Khan, N., Lee, K., and Speicher, D. W. (2005) Depletion of multiple high-abundance proteins improves protein profiling capacities of human serum and plasma. Proteomics 5, 3292–3303
41. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K., Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396
42. Liu, T., Qian, W. J., Mottaz, H. M., Gritsenko, M. A., Norbeck, A. D., Moore, R. J., Purvine, S. O., Camp, D. G., II, and Smith, R. D. (July 19, 2006) Evaluation of multiprotein immunoaffinity subtraction for plasma proteomics and candidate biomarker discovery using mass spectrometry. Mol. Cell. Proteomics 10.1074/mcp.T600039-MCP200
43. Wang, H., Clouthier, S. G., Galchev, V., Misek, D. E., Duffner, U., Min, C. K., Zhao, R., Tra, J., Omenn, G. S., Ferrara, J. L., and Hanash, S. M. (2005) Intact-protein-based high-resolution three-dimensional quantitative analysis system for proteome profiling of biological fluids. Mol. Cell. Proteomics 4, 618–625
44. Wang, H., and Hanash, S. (2005) Intact-protein based sample preparation strategies for proteome analysis in combination with mass spectrometry. Mass Spectrom. Rev. 24, 413–426
45. Sheng, S., Chen, D., and Van Eyk, J. E. (2006) Multidimensional liquid chromatography separation of intact proteins by chromatographic focusing and reversed phase of the human serum proteome: optimization and protein database. Mol. Cell. Proteomics 5, 26–34
46. Barnea, E., Sorkin, R., Ziv, T., Beer, I., and Admon, A. (2005) Evaluation of prefractionation methods as a preparatory step for multidimensional based chromatography of serum proteins. Proteomics 5, 3367–3375
47. Moritz, R. L., Clippingdale, A. B., Kapp, E. A., Eddes, J. S., Ji, H., Gilbert, S., Connolly, L. M., and Simpson, R. J. (2005) Application of 2-D free-flow electrophoresis/RP-HPLC for proteomic analysis of human plasma depleted of multi high-abundance proteins. Proteomics 5, 3402–3413
48. Heller, M., Michel, P. E., Morier, P., Crettaz, D., Wenz, C., Tissot, J. D., Reymond, F., and Rossier, J. S. (2005) Two-stage Off-Gel isoelectric focusing: protein followed by peptide fractionation and application to proteome analysis of human plasma. Electrophoresis 26, 1174–1188
49. Misek, D. E., Kuick, R., Wang, H., Galchev, V., Deng, B., Zhao, R., Tra, J., Pisano, M. R., Amunugama, R., Allen, D., Walker, A. K., Strahler, J. R., Andrews, P., Omenn, G. S., and Hanash, S. M. (2005) A wide range of protein isoforms in serum and plasma uncovered by a quantitative intact protein analysis system. Proteomics 5, 3343–3352
50. Tang, H. Y., Ali-Khan, N., Echan, L. A., Levenkova, N., Rux, J. J., and Speicher, D. W. (2005) A novel four-dimensional strategy combining protein and peptide separation methods enables detection of low-abundance proteins in human plasma and serum proteomes. Proteomics 5, 3329–3342
51. Herbert, B., and Righetti, P. G. (2000) A turning point in proteome analysis: sample prefractionation via multicompartment electrolyzers with isoelectric membranes. Electrophoresis 21, 3639–3648
52. Tu, C. J., Dai, J., Li, S. J., Sheng, Q. H., Deng, W. J., Xia, Q. C., and Zeng, R. (2005) High-sensitivity analysis of human plasma proteome by immobilized isoelectric focusing fractionation coupled to mass spectrometry identification. J. Proteome Res. 4, 1265–1273
53. Andersen, J. S., Lam, Y. W., Leung, A. K., Ong, S. E., Lyon, C. E., Lamond, A. I., and Mann, M. (2005) Nucleolar proteome dynamics. Nature 433, 77–83
54. Jin, W. H., Dai, J., Li, S. J., Xia, Q. C., Zou, H. F., and Zeng, R. (2005) Human plasma proteome analysis by multidimensional chromatography prefractionation and linear ion trap mass spectrometry identification. J. Proteome Res. 4, 613–619
55. Liu, T., Qian, W. J., Strittmatter, E. F., Camp, D. G., Anderson, G. A., Thrall, B. D., and Smith, R. D. (2004) High throughput comparative proteome analysis using a quantitative cysteinyl-peptide enrichment technology. Anal. Chem. 76, 5345–5353
56. Zhang, H., Li, X.-j., Martin, D. B., and Aebersold, R. (2003) Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat. Biotechnol. 21, 660–665
57. Liu, T., Qian, W. J., Gritsenko, M. A., Camp, D. G., II, Monroe, M. E., Moore, R. J., and Smith, R. D. (2005) Human plasma N-glycoproteome analysis by immunoaffinity subtraction, hydrazide chemistry, and mass spectrometry. J. Proteome Res. 4, 2070–2080
58. Yang, Z. P., Hancock, W. S., Chew, T. R., and Bonilla, L. (2005) A study of glycoproteins in human serum and plasma reference standards (HUPO) using multilectin affinity chromatography coupled with RPLC-MS/MS. Proteomics 5, 3353–3366
59. Liu, T., Qian, W. J., Gritsenko, M. A., Xiao, W., Moldawer, L. L., Kaushal, A., Monroe, M. E., Varnum, S. M., Moore, R. J., Purvine, S. O., Maier, R. V., Davis, R. W., Tompkins, R. G., Camp, D. G., II, and Smith, R. D. (June 8, 2006) High dynamic range characterization of the trauma patient plasma proteome. Mol. Cell. Proteomics 10.1074/mcp.M600068-MCP200
60. Shen, Y., Smith, R. D., Unger, K. K., Kumar, D., and Lubda, D. (2005) Ultrahigh-throughput proteomics using fast RPLC separations with ESI-MS/MS. Anal. Chem. 77, 6692–6701
61. Chen, H. S., Rejtar, T., Andreev, V., Moskovets, E., and Karger, B. L. (2005) High-speed, high-resolution monolithic capillary LC-MALDI MS using an off-line continuous deposition interface for proteomic analysis. Anal. Chem. 77, 2323–2331
62. Xie, J., Miao, Y., Shih, J., Tai, Y. C., and Lee, T. D. (2005) Microfluidic platform for liquid chromatography-tandem mass spectrometry analyses of complex peptide mixtures. Anal. Chem. 77, 6947–6953
63. He, B., and Regnier, F. (1998) Microfabricated liquid chromatography columns based on collocated monolith support structures. J. Pharm. Biomed. Anal. 17, 925–932
64. Li, J., LeRiche, T., Tremblay, T. L., Wang, C., Bonneil, E., Harrison, D. J., and Thibault, P. (2002) Application of microfluidic devices to proteomics research: identification of trace-level protein digests and affinity capture of target peptides. Mol. Cell. Proteomics 1, 157–168
65. Srebalus, C. A., Li, J., Marshall, W. S., and Clemmer, D. E. (2000) Determining synthetic failures in combinatorial libraries by hybrid gas-phase separation methods. J. Am. Soc. Mass Spectrom. 11, 352–355
66. Henderson, S. C., Valentine, S. J., Counterman, A. E., and Clemmer, D. E. (1999) ESI/ion trap/ion mobility/time-of-flight mass spectrometry for rapid and sensitive analysis of biomolecular mixtures. Anal. Chem. 71, 291–301
67. Valentine, S. J., Kulchania, M., Srebalus Barnes, C. A., and Clemmer, D. E. (2001) Multidimensional separations of complex peptide mixtures: a combined high-performance liquid chromatography/ion mobility/time-of-flight mass spectrometry approach. Int. J. Mass Spectrom. 212, 97–109
68. Tang, K., Shvartsburg, A. A., Lee, H. N., Prior, D. C., Buschbach, M. A., Li, F., Tolmachev, A. V., Anderson, G. A., and Smith, R. D. (2005) High-sensitivity ion mobility spectrometry/mass spectrometry using electrodynamic ion funnel interfaces. Anal. Chem. 77, 3330–3339
69. Shen, Y., Jacobs, J. M., Camp, D. G., Fang, R., Moore, R. J., Smith, R. D., Xiao, W., Davis, R. W., and Tompkins, R. G. (2004) High efficiency SCXLC/RPLC/MS/MS for high dynamic range characterization of the human plasma proteome. Anal. Chem. 76, 1134–1144
70. Anderson, N. L., Polanski, M., Pieper, R., Gatlin, T., Tirumalai, R. S., Conrads, T. P., Veenstra, T. D., Adkins, J. N., Pounds, J. G., Fagan, R., and Lobley, A. (2004) The human plasma proteome: a nonredundant list developed by combination of four separate sources. Mol. Cell. Proteomics 3, 311–316
71. States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D., Eng, J., Speicher, D. W., and Hanash, S. M. (2006) Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat. Biotechnol. 24, 333–338
72. Peng, J., Elias, J. E., Thoreen, C. C., Licklider, L. J., and Gygi, S. P. (2003) Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2, 43–50
73. Tirumalai, R. S., Chan, K. C., Prieto, D. A., Issaq, H. J., Conrads, T. P., and Veenstra, T. D. (2003) Characterization of the low molecular weight human serum proteome. Mol. Cell. Proteomics 2, 1096–1103
74. Qian, W. J., Jacobs, J. M., Camp, D. G., II, Monroe, M. E., Moore, R. J., Gritsenko, M. A., Calvano, S. E., Lowry, S. F., Xiao, W., Moldawer, L. L., Davis, R. W., Tompkins, R. G., and Smith, R. D. (2005) Comparative proteome analyses of human plasma following in vivo lipopolysaccharide administration using multidimensional separations coupled with tandem mass spectrometry. Proteomics 5, 572–584
75. Washburn, M. P., Wolters, D., and Yates, J. R. (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247
76. Xie, H., and Griffin, T. J. (2006) Trade-off between high sensitivity and increased potential for false positive peptide sequence matches using a two-dimensional linear ion trap for tandem mass spectrometry-based proteomics. J. Proteome Res. 5, 1003–1009
77. Omenn, G. S., States, D. J., Adamski, M., Blackwell, T. W., Menon, R., Hermjakob, H., Apweiler, R., Haab, B. B., Simpson, R. J., Eddes, J. S., Kapp, E. A., Moritz, R. L., Chan, D. W., Rai, A. J., Admon, A., Aebersold, R., Eng, J., Hancock, W. S., Hefta, S. A., Meyer, H., Paik, Y. K., Yoo, J. S., Ping, P., Pounds, J., Adkins, J., Qian, X., Wang, R., Wasinger, V., Wu, C. Y., Zhao, X., Zeng, R., Archakov, A., Tsugita, A., Beer, I., Pandey, A., Pisano, M., Andrews, P., Tammen, H., Speicher, D. W., and Hanash, S. M. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245
78. Hood, B. L., Zhou, M., Chan, K. C., Lucas, D. A., Kim, G. J., Issaq, H. J., Veenstra, T. D., and Conrads, T. P. (2005) Investigation of the mouse serum proteome. J. Proteome Res. 4, 1561–1568
79. Keller, A., Nesvizhskii, A. I., Kolker, E., and Aebersold, R. (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392
80. Nesvizhskii, A. I., Keller, A., Kolker, E., and Aebersold, R. (2003) A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658
81. MacCoss, M. J., Wu, C. C., and Yates, J. R. (2002) Probability-based validation of protein identifications using a modified SEQUEST algorithm. Anal. Chem. 74, 5593–5599
82. Anderson, D. C., Li, W., Payan, D. G., and Noble, W. S. (2003) A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2, 137–146
83. Fenyo, D., and Beavis, R. C. (2003) A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 75, 768–774
84. Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., and Watanabe, C. (1993) Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc. Natl. Acad. Sci. U. S. A. 90, 5011–5015
85. Pappin, D. J., Hojrup, P., and Bleasby, A. J. (1993) Rapid identification ofproteins by peptide-mass fingerprinting. Curr. Biol. 3, 327–332
86. Yates, J. R., Speicher, S., Griffin, P. R., and Hunkapiller, T. (1993) Peptidemass maps: a highly informative approach to protein identification.Analytical Biochemistry 214, 397–408
87. Zimmer, J. S., Monroe, M. E., Qian, W. J., and Smith, R. D. (2006)Advances in proteomics data analysis and display using an accuratemass and time tag approach. Mass Spectrom. Rev. 25, 450–482
88. Olsen, J. V., and Mann, M. (2004) Improved peptide identification inproteomics by two consecutive stages of mass spectrometric fragmen-tation. Proc. Natl. Acad. Sci. U. S. A. 101, 13417–13422
89. Dieguez-Acuna, F. J., Gerber, S. A., Kodama, S., Elias, J. E., Beausoleil,S. A., Faustman, D., and Gygi, S. P. (2005) Characterization of mousespleen cells by subtractive proteomics. Mol. Cell. Proteomics 4,1459–1470
90. Gao, J., Opiteck, G. J., Friedrichs, M. S., Dongre, A. R., and Hefta, S. A.(2003) Changes in the protein expression of yeast as a function ofcarbon source. J. Proteome Res. 2, 643–649
91. Liu, H., Sadygov, R. G., and Yates, J. R. (2004) A model for randomsampling and estimation of relative protein abundance in shotgun pro-teomics. Anal. Chem. 76, 4193–4201
92. Jacobs, J. M., Diamond, D. L., Chan, E. Y., Gritsenko, M. A., Qian, W. J.,Stastna, M., Camp, D. G., Rice, C. M., Carithers, R. L., Katze, M. G., andSmith, R. D. (2005) Proteome analysis of Huh-7.5 cells containingfull-length hepatitis C virus replicon and application to HCV infectedliver biopsy samples. J. Virol. 79, 7558–7569
93. Zybailov, B., Coleman, M. K., Florens, L., and Washburn, M. P. (2005)Correlation of relative abundance ratios derived from peptide ion chro-matograms and spectrum counting for quantitative proteomic analysisusing stable isotope labeling. Anal. Chem. 77, 6218–6224
94. Heller, M., Mattou, H., Menzel, C., and Yao, X. (2003) Trypsin catalyzed16O-to-18O exchange for comparative proteomics: tandem mass spec-trometry comparison using MALDI-TOF, ESI-QTOF, and ESI-ion trapmass spectrometers. J. Am. Soc. Mass Spectrom. 14, 704–718
95. Pasa-Tolic, L., Jensen, P. K., Anderson, G. A., Lipton, M. S., Peden, K. K.,Martinovic, S., Tolic, N., Bruce, J. E., and Smith, R. D. (1999) Highthroughput proteome-wide precision measurements of protein expres-sion using mass spectrometry. J. Am. Chem. Soc. 121, 7949–7950
96. Oda, Y., Huang, K., Cross, F. R., Cowburn, D., and Chait, B. T. (1999)Accurate quantitation of protein expression and site-specific phospho-rylation. Proc. Natl. Acad. Sci. U. S. A. 96, 6591–6596
97. Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B., Steen, H.,Pandey, A., and Mann, M. (2002) Stable isotope labeling by amino acidsin cell culture, SILAC, as a simple and accurate approach to expressionproteomics. Mol. Cell. Proteomics 1, 376–386
98. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold,R. (1999) Quantitative analysis of complex protein mixtures using iso-tope-coded affinity tags. Nat. Biotechnol. 17, 994–999
99. Zhang, Y., Wolf-Yadlin, A., Ross, P. L., Pappin, D. J., Rush, J., Lauffen-burger, D. A., and White, F. M. (2005) Time-resolved mass spectrometryof tyrosine phosphorylation sites in the epidermal growth factor recep-tor signaling network reveals dynamic modules. Mol. Cell. Proteomics 4,1240–1250
100. DeSouza, L., Diehl, G., Rodrigues, M. J., Guo, J., Romaschin, A. D., Colgan, T. J., and Siu, K. W. (2005) Search for cancer markers from endometrial tissues using differentially labeled tags iTRAQ and cICAT with multidimensional liquid chromatography and tandem mass spectrometry. J. Proteome Res. 4, 377–386
101. Wang, W., Zhou, H., Lin, H., Roy, S., Shaler, T. A., Hill, L. R., Norton, S., Kumar, P., Anderle, M., and Becker, C. H. (2003) Quantification of proteins and metabolites by mass spectrometry without isotope labeling or spiked standards. Anal. Chem. 75, 4818–4826
102. Chelius, D., and Bondarenko, P. V. (2002) Quantitative profiling of proteins in complex mixtures using liquid chromatography and mass spectrometry. J. Proteome Res. 1, 317–323
103. Fang, R., Elias, D. A., Monroe, M. E., Shen, Y., McIntosh, M., Wang, P., Goddard, C. D., Callister, S. J., Moore, R. J., Gorby, Y. A., Adkins, J. N., Fredrickson, J. K., Lipton, M. S., and Smith, R. D. (2006) Differential label-free quantitative proteomic analysis of Shewanella oneidensis cultured under aerobic and suboxic conditions by accurate mass and time tag approach. Mol. Cell. Proteomics 5, 714–725
104. Tang, K., Page, J. S., and Smith, R. D. (2004) Charge competition and the linear dynamic range of detection in electrospray ionization mass spectrometry. J. Am. Soc. Mass Spectrom. 15, 1416–1423
105. Luo, Q., Shen, Y., Hixson, K. K., Zhao, R., Yang, F., Moore, R. J., Mottaz, H. M., and Smith, R. D. (2005) Preparation of 20-μm-i.d. silica-based monolithic columns and their performance for proteomics analyses. Anal. Chem. 77, 5028–5035
106. Juraschek, R., Dulcks, T., and Karas, M. (1999) Nanoelectrospray—more than just a minimized-flow electrospray ionization source. J. Am. Soc. Mass Spectrom. 10, 300–308
107. Alaiya, A., Al-Mohanna, M., and Linder, S. (2005) Clinical cancer proteomics: promises and pitfalls. J. Proteome Res. 4, 1213–1222
108. Zhan, X., and Desiderio, D. M. (2003) Heterogeneity analysis of the human pituitary proteome. Clin. Chem. 49, 1740–1751
109. Mann, K. G., Brummel-Ziedins, K., Undas, A., and Butenas, S. (2004) Does the genotype predict the phenotype? Evaluations of the hemostatic proteome. J. Thromb. Haemostasis 2, 1727–1734
110. Kendziorski, C., Irizarry, R. A., Chen, K. S., Haag, J. D., and Gould, M. N. (2005) On the utility of pooling biological samples in microarray experiments. Proc. Natl. Acad. Sci. U. S. A. 102, 4252–4257
111. Sickmann, A., Marcus, K., Schafer, H., Butt-Dorje, E., Lehr, S., Herkner, A., Suer, S., Bahr, I., and Meyer, H. E. (2001) Identification of post-translationally modified proteins in proteome studies. Electrophoresis 22, 1669–1676
112. Anderson, L. (2005) Candidate-based proteomics in the search for biomarkers of cardiovascular disease. J. Physiol. 563, 23–60
113. Anderson, L., and Hunter, C. L. (2006) Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 5, 573–588
114. Anderson, N. L., Anderson, N. G., Haines, L. R., Hardie, D. B., Olafson, R. W., and Pearson, T. W. (2004) Mass spectrometric quantitation of peptides and proteins using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). J. Proteome Res. 3, 235–244
115. Berger, A. B., Vitorino, P. M., and Bogyo, M. (2004) Activity-based protein profiling: applications to biomarker discovery, in vivo imaging and drug discovery. Am. J. Pharmacogenomics 4, 371–381
116. Speers, A. E., and Cravatt, B. F. (2004) Chemical strategies for activity-based proteomics. Chembiochem 5, 41–47
117. Masselon, C., Pasa-Tolic, L., Tolic, N., Anderson, G. A., Bogdanov, B., Vilkov, A. N., Shen, Y., Zhao, R., Qian, W. J., Lipton, M. S., Camp, D. G., II, and Smith, R. D. (2005) Targeted comparative proteomics by liquid chromatography-tandem Fourier ion cyclotron resonance mass spectrometry. Anal. Chem. 77, 400–406