
Handbook of Molecular Microbial Ecology I (Metagenomics and Complementary Approaches) || Microbial Community Proteomics


Chapter 67

Microbial Community Proteomics

Paul Wilmes

67.1 INTRODUCTION

Within the biosphere, microorganisms represent the predominant mode of life. Microbial proteins thus constitute the primary catalytic entities that underlie the major biogeochemical cycles on our planet. Recent technological developments are facilitating the large-scale extraction, separation, and identification of proteins from various natural microbial communities. Most excitingly, microbial community proteomics is facilitating the high-resolution representation of community structure and function, a long-sought goal of microbial ecology.

Community genomic sequencing projects, which analyze genomic DNA directly from environmental samples, are providing opportunities to genetically characterize microbial ecosystems [Wilmes et al., 2009a]. These studies vastly expand our knowledge of the genetic diversity and the physiological potential within a rapidly increasing set of selected environments that include activated sludge [García-Martín et al., 2006], acid mine biofilms [Tyson et al., 2004], seawater [Venter et al., 2004], human guts [Gill et al., 2006], and termite hindguts [Warnecke et al., 2007]; see also Vol. II. The exponentially growing DNA sequence data (genomic and metagenomic) form the foundation for post-genomic approaches that provide functional insight into microbial ecosystems. Consequently, the field of microbial ecology is currently entering the era of Eco-Systems Biology, with community proteomics occupying a central role.

The study of protein expression within environmental samples is not an entirely new concept [Ogunseitan, 1993; Ehlers and Cloete, 1999]. However, the present chapter discusses recent community proteomic studies (from 2004 onwards), which have developed on the back of a vast accumulation of DNA sequences from a range of microbial habitats, improved protein separation techniques, and high-throughput protein identification by mass spectrometry.

Handbook of Molecular Microbial Ecology, Volume I: Metagenomics and Complementary Approaches, First Edition. Edited by Frans J. de Bruijn. © 2011 Wiley-Blackwell. Published 2011 by John Wiley & Sons, Inc.

67.2 COMMUNITY PROTEOMICS FOR NOVICES

The workflow for a community proteomic experiment is analogous to that utilized for a proteomic study of an axenic culture, apart from the sample preparation step, which needs to be more rigorous to ensure purity and representativeness. The procedural steps are briefly summarized as (i) sample preparation, (ii) protein extraction, (iii) separation of the proteins or peptides (usually in two dimensions), and (iv) mass spectrometry (MS) analysis followed by in silico spectral matching for identification of proteins (see Fig. 67.1 for a detailed workflow).

The separations of proteins or peptides are typically performed either by classical single- or two-dimensional polyacrylamide gel electrophoresis (2DE), or by liquid chromatography (LC), respectively. 2DE provides a tangible and comprehensive visual representation of the protein complement after staining. The comparison of protein spot densities is convenient for detection of differential protein expression in response to changes in environmental conditions or community structure and function [Wilmes et al., 2008a; see also Chapters 66, 68, 69, and 70, Vol. I]. Chosen spots are excised and digested with a protease (typically trypsin) prior to MS analysis. A major limitation associated with the classical 2DE approach for community proteomics is the co-migration of distinct proteins into the same discrete spots, which hampers accurate quantification and identification. Large and hydrophobic proteins are also not separated well by 2DE (e.g., membrane proteins [Santoni et al., 2000]), and the method suffers from limited automation potential, which, in turn, limits sample throughput. However, 2DE has good protein separation power for soluble proteins when combined with prefractionation procedures. Proteins of interest can be readily identified after proteolytic digestion, peptide mass fingerprint (PMF) generation by MS, and database searching. Although 2DE can be optimized substantially for community proteomics [Abram et al., 2009], LC-based approaches are rapidly gaining ground.

[Figure 67.1: workflow diagram. Stage 1, sample preparation (cell concentration, separation from environmental matrix, cell wash steps); stage 2, sample fractionation and community proteome extraction (extracellular, cell membrane, total cell lysate); stage 3, community proteome fractionation (isoelectric focusing, protein solubility, affinity column); stage 4, protein separation (2DE gel electrophoresis with spot quantification, spot excision, and trypsin digestion; or trypsin digestion followed by 2D-nano-LC with cation exchange and reverse phase); stage 5, mass spectrometry (various types of MS, MS/MS); stage 6, protein identification (peptide mass fingerprint with Mascot or SEQUEST database search; de novo peptide sequence with MSBLAST database search).]

Figure 67.1 Community proteomics sample preparation, extraction, separation, and identification routes. The workflow for a community proteomic analysis may consist of six stages. Sample preparation may be required in stage 1. Cells may need to be concentrated or purified away from interfering substances, for example, humic acids in soil. Protein extraction is performed next (stage 2), and fractions of interest may be targeted, for example, extracellular, membrane, soluble, and whole-cell fractions [Ram et al., 2005]. Cell lysis may involve French press lysis, sonication, chemical lysis, or bead beating [Wilmes and Bond, 2004]. The procedures in these stages must have minimal effect on the protein expression itself and sufficiently preserve the extracted proteins. To assist later separations, the extracted protein complement may be fractionated (stage 3), for example, divided into fractions by preparative liquid isoelectric focusing or based on protein solubility. Protein or peptide separations may be performed by two-dimensional polyacrylamide gel electrophoresis (2DE) or by liquid chromatographic (LC) methods, respectively (stage 4). Following 2DE, gel images are analyzed and spots are quantified. Chosen spots are then excised and digested with a protease (trypsin) for mass spectrometry (MS) analysis [e.g., matrix-assisted laser desorption ionization time-of-flight (MALDI-ToF)]. For LC, the protein mixture is digested with trypsin prior to separation. Following LC, the separated peptides are directly introduced into the mass spectrometer (e.g., by electrospray ionization) and mass spectra are acquired. MS analysis may also involve the fragmentation of the peptides and recording of the MS/MS spectra. If required, de novo protein sequence data can be determined from the MS/MS data. Algorithms such as Mascot [Perkins et al., 1999] and SEQUEST [Eng et al., 1994] enable the MS or MS/MS peptide mass fingerprint data to be searched against sequence databases for protein identification. Acronyms: 2DE, two-dimensional polyacrylamide gel electrophoresis; 2D-nano-LC, two-dimensional nano liquid chromatography. (Reproduced from Wilmes and Bond [2009] with permission from Elsevier.)

In the LC-based “shotgun” proteomic approach, complex protein samples are digested and the resulting peptides are separated, subjected to MS analysis, and PMFs are generated (Fig. 67.1). The approach circumvents many limitations of the 2DE approach and allows high-throughput analysis, identifying thousands of proteins within 1–2 days [VerBerkmoes et al., 2009a]. In particular, it facilitates the high-throughput analysis of insoluble membrane proteins [Wu et al., 2003]; hence, membrane fractions are typically generated in LC-based community proteomics [Ram et al., 2005]. The proteomic approach of separating and analyzing at the peptide level, as opposed to the protein level, is referred to as “shotgun” or “bottom-up” proteomics. Different LC and MS systems are used. However, 2D nano-LC (strong cation exchange followed by reverse phase LC) coupled to an LTQ-Orbitrap mass spectrometer has proven well-suited for analysis of mixed microbial proteomes [Wilmes et al., 2008b; Denef et al., 2009; VerBerkmoes et al., 2009b; Denef et al., 2010].
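The digestion step at the heart of the bottom-up approach can be illustrated in silico. The following is a minimal Python sketch, not a description of any tool used in the studies cited here. It assumes the commonly applied rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P); the function name `tryptic_digest`, its `missed_cleavages` and `min_length` parameters, and the toy sequence are illustrative assumptions.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return tryptic peptides of a protein sequence (one-letter codes).

    Assumed cleavage rule: cut after K or R unless the next residue is P.
    """
    sequence = sequence.upper()
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Optionally join neighbouring fragments to model missed cleavages.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(tryptic_digest("MKTAYIAKQRPLGR"))
    print(tryptic_digest("MKTAYIAKQRPLGR", missed_cleavages=1))
```

Each protein yields several peptides in such a digest, which is why peptide-level separation increases sample complexity relative to protein-level separation.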

Parent proteins can be identified by sequence database matching of the PMFs using search algorithms such as Mascot [Perkins et al., 1999] and SEQUEST [Eng et al., 1994], and this is most easily achieved if representative genomic sequences are available. However, community proteomics is complicated by strain variation of protein species (Fig. 67.2), and precise identifications require high-mass-accuracy peptide measurements and tandem MS (MS/MS) involving fragmentation of the peptides [Hunt et al., 1986].
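The idea behind peptide mass fingerprint matching can be sketched as follows. This is a deliberately simplified illustration: it counts observed masses that fall within a parts-per-million tolerance of theoretical peptide masses, whereas Mascot and SEQUEST use their own scoring schemes that are not reproduced here. The residue masses are the standard monoisotopic values; the function names, the 10 ppm default, and the candidate proteins are assumptions made for the example.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added to the summed residues of a free peptide


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def ppm_error(observed, theoretical):
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed_masses, peptides, tolerance_ppm=10.0):
    """Number of observed masses explained by any theoretical peptide."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(
        any(abs(ppm_error(obs, theo)) <= tolerance_ppm for theo in theoretical)
        for obs in observed_masses
    )


def best_candidate(observed_masses, candidates, tolerance_ppm=10.0):
    """candidates maps a protein name to its list of tryptic peptides."""
    return max(
        candidates,
        key=lambda name: count_matches(observed_masses, candidates[name], tolerance_ppm),
    )


if __name__ == "__main__":
    # Hypothetical candidate digests and simulated measurements.
    candidates = {"protA": ["MK", "TAYIAK"], "protB": ["GGR", "AAK"]}
    observed = [peptide_mass("MK"), peptide_mass("TAYIAK") + 0.001]
    print(best_candidate(observed, candidates))
```

Tightening `tolerance_ppm` reduces chance matches, which is one reason high-mass-accuracy measurements matter when many closely related sequences are present.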

Figure 67.2 Strain-resolved proteomics in “Candidatus Accumulibacter phosphatis” (A. phosphatis)-dominated activated sludge. (A) Self-organizing map based on tetranucleotide frequency of genomic scaffold fragments, with genomic fragments encoding identified proteins highlighted according to taxonomic groups (the A. phosphatis fragment cluster, which includes numerous unclassified genomic fragments, is delimited by a yellow line). Information in the panel is color-coded according to concatenated metagenomic binning information [Martín et al., 2006; McHardy et al., 2007]: bright yellow, A. phosphatis; dark yellow, Accumulibacter; orange, Betaproteobacteria; medium purple, Gammaproteobacteria; bright green, Thiothrix; light blue, Actinobacteria; dark blue, Alphaproteobacteria; light brown, Sphingobacteria; dark pink, Deinococci; black, Deltaproteobacteria; dark green, Clostridia; medium pink, Planctomycetacia; dark brown, Spirochaetes; dark purple, Methanomicrobia; light purple, Chromadorea; light pink, Insecta; aquamarine, Mollicutes; grey, unclassified. (Reproduced from Wilmes et al. [2008b] with permission from Nature Publishing Group.) (B) Alignment of variant genomic fragments against the A. phosphatis composite genome (outer concentric ring in blue, USJ scaffold numbers indicated, and locations of inserted genes in the aligned genomic fragments highlighted in red). Relative A. phosphatis protein abundances are highlighted according to each locus on the first inner concentric ring. Aligned variant genomic fragments with corresponding protein abundance are shown in the following concentric rings. Gray color indicates no protein identification. The image was generated with Circos (M. Krzywinski, http://1477mkweb.bcgsc.ca/circos/). (Reproduced from Wilmes and Bond [2009] with permission from Elsevier.)


Importantly, the generation of characteristic tandem MS fragmentation patterns allows more precise spectral matching than single MS alone (for a detailed description of strain-resolved proteomics using LC-MS/MS, see VerBerkmoes et al. [2009a]). Additionally, the fragmentation pattern can be employed to generate de novo peptide sequences and to search for homologous sequences using MSBLAST [Shevchenko et al., 2001]. De novo peptide sequencing is especially useful for protein identification when corresponding sequence data are unavailable [Wilmes and Bond, 2004; Lacerda et al., 2007]. High-mass-accuracy spectral matching can be used to positively identify proteins from closely related organisms, but the ability to do so decreases rapidly with amino acid sequence divergence [Denef et al., 2007]. Consequently, de novo peptide sequencing may prove indispensable in diverse samples in which the contribution of distinct microbial populations has to be assessed but for which full DNA sequence coverage is unavailable or unobtainable. With the advent of powerful de novo peptide sequencing algorithms, the implementation of the approach may become routine. For example, the de novo peptide sequencing approach was used to identify more than 100 proteins that were differentially expressed following exposure of bacterial communities to cadmium [Lacerda et al., 2007].
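The principle underlying de novo sequencing is that mass differences between consecutive fragment ions of one series correspond to amino acid residue masses. The sketch below illustrates only this principle and is not one of the algorithms used in the cited studies. It assumes an idealized, complete, noise-free ladder of singly charged b-ions, which real spectra rarely provide; the function names and the 0.01 Da tolerance are illustrative. Leucine and isoleucine have identical masses and are therefore reported together.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # mass of the charge-carrying proton


def b_ion_ladder(peptide):
    """Idealized m/z values of singly charged b-ions, b1 to bn."""
    ions, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def read_ladder(b_ions, tolerance=0.01):
    """Translate consecutive b-ion mass differences into residues.

    Returns one entry per ion: a residue letter, several letters joined
    by '/' when masses are indistinguishable, or '?' when nothing fits.
    """
    sequence, previous = [], PROTON
    for mz in sorted(b_ions):
        delta = mz - previous
        hits = sorted(
            aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tolerance
        )
        sequence.append("/".join(hits) if hits else "?")
        previous = mz
    return sequence


if __name__ == "__main__":
    # Hypothetical peptide used to simulate a perfect spectrum.
    print(read_ladder(b_ion_ladder("PEPTIDEK")))
```

A sequence read in this way can then be compared with database entries by homology rather than exact identity, which is the role MSBLAST plays in the workflow described above.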

All community proteomic methods face challenges due to the large complexity of protein species and the large dynamic range of protein levels (see Chapters 66 and 70, Vol. I). This complexity increases for the LC approach, as tryptic digestion produces dozens of peptides per protein. A technique gaining popularity uses “off-gel” separation by isoelectric focusing (IEF) to divide the peptide mix into many fractions prior to LC-MS/MS [Ye et al., 2007]. The approach has additional benefit if the IEF is performed on whole proteins instead of peptides, and this variant has recently led to the identification of around 8000 proteins from a single microbial community sample [Wilmes et al., unpublished data]. The technique relies on the fact that the protein mixture is simplified in each fraction, which increases the chance of protein identification by subsequent LC-MS/MS analysis.

67.3 THE STORY SO FAR

The application of community proteomics is expanding rapidly, and recent studies directly examine community functional information in a phylogenetic context. As discussed, protein expression profiles based on gel separation provide overviews for the targeted identification of interesting proteins. Apart from activated sludge [Wilmes and Bond, 2004, 2006a; Wilmes et al., 2008a; Wexler et al., 2009], this approach has been applied to an estuary transect [Kan et al., 2005; discussed further in Chapter 68, Vol. I], contaminated soil and groundwater [Lacerda et al., 2007], Riftia pachyptila endosymbionts [Markert et al., 2007], infant faecal samples [Klaassens et al., 2007], freshwater samples following exposure to heavy metals [Maron et al., 2007a], lake water [Maron et al., 2007b], proteins associated with exopolysaccharides in activated sludge [Park and Helm, 2008; Park et al., 2008], sheep rumen [Toyoda et al., 2009], and an anaerobic wastewater digestor [Abram et al., 2009]. Other studies have employed LC or combinations of gel and LC separation. Proteins identified within dissolved organic matter (DOM) in soil and water have been used to determine the presence of broad taxonomic groups of microorganisms and to highlight interesting functional differences between the microbial communities in forest soil (high abundance of peroxidases) and a peat bog (proteins involved in methanogenesis) [Schulze et al., 2005]. Similarly, proteins that form major constituents of DOM in seawater were identifiable due to the homology of identified peptides [Powell et al., 2005]. Due to a distinct lack of comprehensive genomic sequences in the majority of these studies, the numbers of proteins identified were rather limited. Albeit limited, all of the positive protein identifications did provide interesting new functional insights.

The availability of extensive genomic sequences from specific organisms or environments of interest greatly enhances both the number and quality of protein identifications. For example, proteins of pronounced biogeochemical importance (proteorhodopsins) were readily identifiable in environmental samples due to the availability of extensive gene sequences [Giovannoni et al., 2005]. A comprehensive community proteogenomic approach involving deep genomic and proteomic sampling was pioneered on acid mine biofilms [Ram et al., 2005; Lo et al., 2007; Denef et al., 2009, 2010] (see below for more information). This approach has since been applied to the enhanced biological phosphorus removal (EBPR) activated sludge system that is typically dominated by as yet uncultured organisms putatively named “Candidatus Accumulibacter phosphatis” [Wilmes et al., 2008b] (see below for more information), to human fecal samples [VerBerkmoes et al., 2009b], and to a Geobacter-dominated subsurface microbial community involved in uranium bioremediation [Wilkins et al., 2009]. An analogous shotgun proteomic approach has been applied to microbial communities inhabiting the euphotic zone of the Sargasso Sea and has led to the identification of several proteins linked to the dominant organisms SAR11, Prochlorococcus, and Synechococcus that are reflective of their lifestyle in this nutrient-limited environment [Sowell et al., 2009]. The power of combining genomics and proteomics on communities of immediate biotechnological interest (bioenergy) was highlighted by an integrative study of termite hindgut symbiotic bacteria involved in lignocellulose degradation [Warnecke et al., 2007; see also Chapter 22, Vol. II]. Furthermore, an integrative high-throughput genomic and proteomic approach has recently been applied to natural bacterial phyllosphere communities and has highlighted specific proteomic traits essential for success and coexistence of two specific bacterial taxa (Sphingomonas and Methylobacterium) on plant leaves [Delmotte et al., 2009].

A full-cycle community proteomic investigation including comprehensive extraction, purification, separation, and identification following MS was first applied to a laboratory-scale activated sludge reactor operated in the United Kingdom [Wilmes and Bond, 2004]. In this study, comparisons of proteome profiles, generated by 2DE, were made to determine metabolic details of the EBPR wastewater treatment process. Protein expression was compared between the two operational stages (anaerobic and aerobic; Fig. 67.3) of EBPR. However, only minor differences in protein levels were detected using 2DE (Fig. 67.3A) [Wilmes and Bond, 2004, 2006b], a finding that has recently been corroborated using a more sensitive ³⁵S-Met radiolabeling 2DE approach [Wexler et al., 2009] as well as a shotgun proteomics approach [Wilmes et al., 2008b] (Fig. 67.3B). Because of the lack of differential protein expression, the initial community proteomic study focused on the identification of highly expressed proteins by de novo peptide sequencing [Wilmes and Bond, 2004]. When metagenomic sequences of EBPR sludges became available [García-Martín et al., 2006], the sequence data facilitated protein identification by analysis of PMF patterns, and over 30% of highly expressed proteins chosen from 2DE gels were matched to the dominant organism, that is, A. phosphatis [Wilmes et al., 2008a]. Importantly, the majority of identified proteins are involved in major EBPR carbon transformations. In a further study, using the reference metagenomic sequences from the EBPR sludges cultured in the United States and Australia [García-Martín et al., 2006], the application of shotgun proteomics provided much deeper insight into the metabolic transformations of A. phosphatis in the UK sludge (10% genomically encoded protein coverage [Wilmes et al., 2008b]). Interesting findings from placing identified proteins into metabolic context include the importance of denitrification, fatty acid cycling, and the glyoxylate shunt [Wilmes et al., 2008b]. This study also used strain-resolved proteomics to differentiate the expression of co-occurring protein variants within the EBPR sludge dominated by an A. phosphatis population [Wilmes et al., 2008b] (Fig. 67.2). This revealed that 59% of the most abundant protein variants derived from flanking A. phosphatis populations and not from the dominant A. phosphatis strain in the sequenced sludges. A significant subset of these was involved in core-metabolism and EBPR-specific pathways, suggesting an essential role for genetic diversity in maintaining the stable performance of microbial-mediated wastewater treatment. Importantly, Wexler et al. [2009] demonstrated that anaerobic versus aerobic tricarboxylic acid cycle protein expression differs according to A. phosphatis strain-level community composition. Consequently, this study highlights the importance of being able to functionally resolve by proteomics the extensive genetic heterogeneity that is a hallmark of natural microbial communities [Wilmes et al., 2009a].

[Figure 67.3: (A) 2DE gels for the anaerobic and aerobic phases, pH 4–7, molecular mass markers at 75, 50, 37, 25, and 20 kDa, with protein density scale. (B) Relative protein abundance profiles for the anaerobic and aerobic phases.]

Figure 67.3 Time-resolved community proteomics highlighting subtle differences in protein abundance at the end of the anaerobic (120 min) and aerobic (330 min) phases within activated sludge performing enhanced biological phosphorus removal and dominated by A. phosphatis. (A) Two-dimensional polyacrylamide gel electrophoresis protein expression profiles. (B) Gene expression dynamics inspector [Eichler et al., 2003] profiles of LC-MS/MS normalized spectral abundance factor data. Interesting proteins that are slightly more abundant in the anaerobic phase are highlighted. Proteins in the highlighted cluster include a phosphate transport regulator (distant homologue of PhoU), an inorganic pyrophosphatase, an acyl-CoA dehydrogenase (fatty acid beta oxidation), and a nucleoside diphosphate kinase (governs the relative levels of GTP and ATP in cells). (Reproduced from Wilmes and Bond [2009] with permission from Elsevier.)

The most extensive community proteomic analyses to date have been performed on acid mine biofilms that exhibit comparatively low diversity [Ram et al., 2005; Lo et al., 2007; Denef et al., 2009, 2010]. The Richmond Mine is characterized by low pH (∼0.8) and microbially mediated iron oxidation that contributes to the acid mine drainage production. Here, an initial shotgun proteomic approach allowed the identification of over 2000 proteins [Ram et al., 2005]. High protein coverage (48%) was obtained for the dominant microorganism (Leptospirillum group II). One highly abundant protein, annotated as hypothetical, was further investigated and found to be an iron-oxidizing cytochrome (Cyt579), a key component of energy generation in the biofilms [Ram et al., 2005]. Thus, the proteomic results were instrumental in guiding the ensuing biochemical investigations [Jeans et al., 2008]. Lo et al. demonstrated that high-resolution tandem mass spectrometry could differentiate between peptides originating from discrete populations within the mixed microbial community [Lo et al., 2007]. By assigning peptides to two different sequenced Leptospirillum group II populations, they inferred the genome architecture of a third unsequenced Leptospirillum group II population and demonstrated extensive interpopulation recombination [Lo et al., 2007]. This approach was expanded to conduct an extensive Leptospirillum group II proteomic genotype survey from 27 distinct biofilm samples [Denef et al., 2009]. The protein expression patterns suggest selection for particular recombinant types and revealed that recombination is a mechanism for fine-scale adaptation [Denef et al., 2009] and divergence [Denef et al., 2010], demonstrating the power of integrated genomics and proteomics to contribute extensively to our understanding of microbial ecology and evolution.
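The logic of assigning peptides to closely related populations can be sketched in a few lines. The example below is a simplified stand-in for the strain-resolved analyses cited above, not their actual method: exact string matching against in silico tryptic digests replaces spectral matching, and the strain names and sequences are hypothetical. A peptide is informative for a strain only if it occurs in the digest of that strain's variant and in no other.

```python
import re


def tryptic_peptides(sequence):
    """Set of fully cleaved tryptic peptides (cut after K/R, not before P)."""
    return {p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p}


def classify_peptides(identified, variants):
    """Label each identified peptide by the variant(s) that can produce it.

    variants maps a strain name to the sequence of its protein variant.
    Labels are a strain name (unique peptide), 'shared', or 'unassigned'.
    """
    digests = {strain: tryptic_peptides(seq) for strain, seq in variants.items()}
    labels = {}
    for peptide in identified:
        owners = sorted(s for s, peps in digests.items() if peptide in peps)
        if not owners:
            labels[peptide] = "unassigned"
        elif len(owners) == 1:
            labels[peptide] = owners[0]
        else:
            labels[peptide] = "shared"
    return labels


if __name__ == "__main__":
    # Hypothetical variants differing by one substitution (I to V).
    variants = {"strain_1": "MKTAYIAKQLGR", "strain_2": "MKTAYVAKQLGR"}
    identified = ["TAYIAK", "TAYVAK", "QLGR", "AAAK"]
    for peptide, label in classify_peptides(identified, variants).items():
        print(peptide, label)
```

Only the peptides spanning the substitution discriminate between the two variants, which illustrates why deep peptide coverage is needed to resolve populations that differ at few positions.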

67.4 THE PATH AHEAD

Microbial community proteomics has already provided unprecedented functional insight into microbial communities with limited diversity and/or that are enriched in a particular organismal group. Detection limits for community proteomics suggest that each organism for which a protein is identified must be present at an abundance of at least a few percent of the total community. This represents a major limitation in systems that harbor extensive species richness, for example, 10⁶ taxa in a gram of soil, which may express 3 × 10⁹ unique proteins (without considering strain-level variation; Table 67.1). Further complications include the unevenness of species distribution within samples and the fact that protein expression levels within a cell may differ by at least six orders of magnitude [Tyers and Mann, 2003]. Consequently, comprehensive proteomic analyses of mixed communities are challenging, and with current technology a typical analysis may only resolve far less than 1% of the protein complement within diverse samples [Wilmes and Bond, 2006b].

The range of detectable proteins will improve with future technical developments in proteomics, especially advances in LC and MS. In addition, complex proteomes may be reduced in complexity prior to LC. Dividing protein complements into many fractions prior to LC, such as by IEF, holds promise to expand the dynamic range extensively [Ye et al., 2007]. Although DNA sequencing technology is currently advancing at an astonishing rate, the implementation of MS-based de novo peptide sequencing will diminish the requirement for comprehensive genomic (transcriptomic) foundations and will allow the identification of proteins from low abundance community members for which no genomic sequences are available.

Because taxonomic and functional information can be inferred from protein expression data, proteomics lends itself ideally to the monitoring of community structure and function over space and time (Fig. 67.3). For example, changes in protein abundance may be monitored between different natural conditions, such as diurnal or anaerobic/aerobic cycles. Rapid changes within the environmental conditions may be manifested at the post-translational level [Mann and Jensen, 2003], and MS and bioinformatic methodologies need to be refined to resolve these. Another useful approach to detect rapid changes in protein expression is to observe incorporation of stable isotope- or radio-label into newly synthesized proteins [Wexler et al., 2009].

Table 67.1 Microbial Richness and Community Proteomic Expression. Modified and extended from Wilmes and Bond [2006b] with permission from Elsevier.

Ecosystem                                     Estimated number     Estimated number          Reference
                                              of unique taxa (a)   of unique proteins (b)
Acid mine drainage biofilm                    6                    18,000                    Tyson et al. [2004]
Activated sludge                              17–268               51,000–800,000            Wagner and Loy [2002]
Ocean water                                   160                  480,000                   Curtis et al. [2002]
Sargasso Sea (combined sample, i.e., 1730 L)  1824–47,733          5,500,000–140,000,000     Venter et al. [2004]
Soil (1 g)                                    1,000,000            3,000,000,000             Gans et al. [2005]
Human
  Skin (n = 10)                               9–67                 27,000–201,000            Grice et al. [2009]
  Oral cavity (n = 3)                         500                  1,500,000                 Zaura et al. [2009]
  Throat (n = 6)                              204                  612,000                   Andersson et al. [2008]
  Healthy stomach (n = 3)                     375                  1,125,000                 Andersson et al. [2008]
  Feces (gut; n = 6)                          385                  1,155,000                 Andersson et al. [2008]

(a) As defined by the author of the referenced work.
(b) Number of unique proteins based on an average environmental microbial genome size of 3 Mbp and 1 kbp of sequence coding for one gene. Uniqueness does not reflect strain-level variation.
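The protein estimates in Table 67.1 follow from the assumptions stated in its footnote: an average genome of 3 Mbp and 1 kbp of sequence per gene, that is, about 3000 proteins per taxon. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000  # average genome size assumed in Table 67.1
BP_PER_GENE = 1_000         # sequence assumed to code for one gene


def estimated_unique_proteins(n_taxa):
    """Unique proteins expected for a number of taxa, ignoring strain variation."""
    proteins_per_taxon = GENOME_SIZE_BP // BP_PER_GENE
    return n_taxa * proteins_per_taxon


if __name__ == "__main__":
    # Taxon counts taken from Table 67.1.
    for ecosystem, taxa in [
        ("Acid mine drainage biofilm", 6),
        ("Ocean water", 160),
        ("Soil (1 g)", 1_000_000),
    ]:
        print(ecosystem, estimated_unique_proteins(taxa))
```

Applying the same calculation to the upper bounds of the ranges gives 804,000 for 268 taxa and about 143 million for 47,733 taxa, so the corresponding table entries appear to be rounded.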

Although 2DE is labor-intensive and has limitations regarding separation of proteins, it remains convenient for expression quantification and comparative studies. For example, multiple samples differentiated by fluorescent tags (known as DIGE) can be run on the same gels [Lilley and Dupree, 2006], or metabolically active portions of communities can be detected by incorporation of a labeled substrate [Jehmlich et al., 2008].

Given the potential superiority of LC-MS/MS approaches, it is highly desirable to obtain quantitative information to detect systems-level responses to change. However, this is not so readily obtained from MS data, mainly due to the large variation in individual peptide chemistry. Furthermore, quantification may be confounded by the complexity of peptides such that only subsets of proteins may be identified from a sample [Bantscheff et al., 2007]. Nonetheless, emerging techniques for quantifying proteins from LC-MS/MS data include isotope-coded affinity tags (ICAT), metabolic labeling of proteins (using ¹³C or ¹⁵N), and isobaric tags for quantification (iTRAQ) [Ong and Mann, 2005]. These labeling techniques allow simultaneous analysis of multiple samples for comparison of the differentially tagged peptide abundances. Alternative, so-called “label free” methods to quantify MS data are also commonly used [Bantscheff et al., 2007]. A semiquantitative method that is commonly used in community proteomics, normalized spectral abundance factor (NSAF) calculation [Florens et al., 2006], relates the number of peptides identified from a protein, normalized to the protein size, to the overall protein abundance. An alternative approach uses the peptide MS peak signal intensity, which is collected during the chromatographic spread for each peptide [Ong and Mann, 2005]. The resulting chromatogram peak area is then proportional to the peptide’s abundance. These “label free” methods are reportedly not as accurate as labeling approaches for quantification; however, there is much interest in using these simple approaches, and their application will thus increase as statistical treatment continues to develop [Bantscheff et al., 2007; Choi et al., 2008].
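The NSAF calculation can be written compactly. In its usual formulation, assumed here, the count for a protein is the number of MS/MS spectra assigned to it (its spectral count), which is divided by the protein length and then normalized to the sum of these length-corrected counts over all proteins in the sample. The function name and the example values in this Python sketch are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts maps protein name to the number of assigned spectra;
    lengths maps protein name to its length in amino acids.
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {name: spectral_counts[name] / lengths[name] for name in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # Normalize so that the values of one sample sum to 1.
    return {name: value / total for name, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths for three proteins.
    counts = {"protA": 10, "protB": 10, "protC": 0}
    lengths = {"protA": 100, "protB": 200, "protC": 300}
    for name, value in nsaf(counts, lengths).items():
        print(name, round(value, 3))
```

Because the values sum to 1 within a sample, NSAF profiles from different runs can be compared directly, as in the anaerobic versus aerobic comparison of Fig. 67.3B.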

So far, community proteomic studies have been carried out on bulk samples. However, microbial communities exhibit distinct organismal and functional organization [Wilmes et al., 2009b], and particular enzyme variants may be localized within distinct microniches [Wilmes et al., 2008b]. Hence, more fine-scale measurements will be necessary in the future to resolve the functional significance of protein localization within microbial communities. Mass spectrometry imaging techniques [Stoeckli et al., 2001] show great promise for resolving such fine-scale expression differences within microbial communities.

67.5 CONCLUSION

Community proteomics is providing unprecedented insight into genotypic and phenotypic traits within microbial consortia. Together with other systems-level data that include genomics (discussed in Chapters 14–31), transcriptomics (discussed in Chapters 62–65, Vol. I), and metabolomics (discussed in Chapters 71–74, Vol. I), proteomics is providing high-resolution molecular data that allow us to glean a more complete picture of microbial community composition, function, interaction, ecology, and evolution. Although numerous studies reviewed above have applied proteomics alone to specific mixed microbial communities of interest, the integration of different “omic” datasets from the same sample will become ever more important in the future to deconvolute and make sense of the intricate metabolic processes and interactions that occur within complex microbial communities [Wilmes et al., 2010]. Such fundamental knowledge is essential for our understanding of the Earth’s biogeochemical cycles, of biotechnologies that rely on microbial communities, and of human health.

Acknowledgments

PW is supported by a Luxembourg National Research Fund ATTRACT grant (FNR/A09/03).

REFERENCES

Abram F, Gunnigle E, O’Flaherty V. 2009. Optimisation of protein extraction and 2-DE for metaproteomics of microbial communities from anaerobic wastewater treatment biofilms. Electrophoresis 30:4149–4151.

Andersson AF, Lindberg M, Jakobsson H, Backhed F, Nyren P, Engstrand L. 2008. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE 3:e2836.

Bantscheff M, Schirle M, Sweetman G, Rick J, Kuster B. 2007. Quantitative mass spectrometry in proteomics: A critical review. Anal. Bioanal. Chem. 389:1017–1031.

Choi H, Fermin D, Nesvizhskii AI. 2008. Significance analysis of spectral count data in label-free shotgun proteomics. Mol. Cell Proteom. 7:2373–2385.

Curtis TP, Sloan WT, Scannell JW. 2002. Estimating prokaryotic diversity and its limits. Proc. Natl. Acad. Sci. USA 99:10494–10499.

Delmotte N, Knief C, Chaffron S, Innerebner G, Roschitzki B, Schlapbach R, von Mering C, Vorholt JA. 2009. Community proteogenomics reveals insights into the physiology of phyllosphere bacteria. Proc. Natl. Acad. Sci. USA 106:16428–16433.

Denef VJ, Shah MB, VerBerkmoes NC, Hettich RL, Banfield JF. 2007. Implications of strain- and species-level sequence divergence for community and isolate shotgun proteomic analysis. J. Proteome Res. 6:3152–3161.

Denef VJ, VerBerkmoes NC, Shah MB, Abraham P, Lefsrud M, Hettich RL, Banfield JF. 2009. Proteomics-inferred genome typing (PIGT) demonstrates inter-population recombination as a strategy for environmental adaptation. Environ. Microbiol. 11:313–325.

Denef VJ, Kalnejais LH, Mueller RS, Wilmes P, Baker BJ, Thomas BC, Verberkmoes NC, Hettich RL, Banfield JF. 2010. Proteogenomic basis for ecological divergence of closely related bacteria in natural acidophilic microbial communities. Proc. Natl. Acad. Sci. USA 107:2383–2390.

Ehlers M, Cloete T. 1999. Protein profiles of phosphorus- and nitrate-removing activated sludge systems. Water SA 25:351–356.

Eichler GS, Huang S, Ingber DE. 2003. Gene expression dynamics inspector (GEDI): For integrative analysis of expression profiles. Bioinformatics 19:2321–2322.

Eng JK, McCormack AL, Yates JR III. 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5:976–989.

Florens L, Carozza MJ, Swanson SK, Fournier M, Coleman MK, Workman JL, Washburn MP. 2006. Analyzing chromatin remodeling complexes using shotgun proteomics and normalized spectral abundance factors. Methods 40:303–311.

Gans J, Wolinsky M, Dunbar J. 2005. Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science 309:1387–1390.

García-Martín H, Ivanova N, Kunin V, Warnecke F, Barry KW, et al. 2006. Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat. Biotechnol. 24:1263–1269.

Gill SR, Pop M, DeBoy RT, Eckburg PB, Turnbaugh PJ, et al. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312:1355–1359.

Giovannoni SJ, Bibbs L, Cho J-C, Stapels MD, Desiderio R, et al. 2005. Proteorhodopsin in the ubiquitous marine bacterium SAR11. Nature 438:82–85.

Grice EA, Kong HH, Conlan S, Deming CB, Davis J, et al. 2009. Topographical and temporal diversity of the human skin microbiome. Science 324:1190–1192.

Hunt DF, Yates JR, Shabanowitz J, Winston S, Hauer CR. 1986. Protein sequencing by tandem mass spectrometry. Proc. Natl. Acad. Sci. USA 83:6233–6237.

Jeans C, Singer SW, Chan CS, Verberkmoes NC, Shah M, Hettich RL, Banfield JF, Thelen MP. 2008. Cytochrome 572 is a conspicuous membrane protein with iron oxidation activity purified directly from a natural acidophilic microbial community. ISME J. 2:542–550.

Jehmlich N, Schmidt F, von Bergen M, Richnow H-H, Vogt C. 2008. Protein-based stable isotope probing (Protein-SIP) reveals active species within anoxic mixed cultures. ISME J. 2:1122–1133.

Kan J, Hanson TE, Ginter JM, Wang K, Chen F. 2005. Metaproteomic analysis of Chesapeake Bay microbial communities. Saline Systems 1:7.

Klaassens ES, de Vos WM, Vaughan EE. 2007. Metaproteomics approach to study the functionality of the microbiota in the human infant gastrointestinal tract. Appl. Environ. Microbiol. 73:1388–1392.

Lacerda CMR, Choe LH, Reardon KF. 2007. Metaproteomic analysis of a bacterial community response to cadmium exposure. J. Proteome Res. 6:1145–1152.

Lilley KS, Dupree P. 2006. Methods of quantitative proteomics and their application to plant organelle characterization. J. Exp. Bot. 57:1493–1499.

Lo I, Denef VJ, VerBerkmoes NC, Shah MB, Goltsman D, et al. 2007. Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature 446:537–541.

Mann M, Jensen ON. 2003. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21:255–261.

Markert S, Arndt C, Felbeck H, Becher D, Sievert SM, et al. 2007. Physiological proteomics of the uncultured endosymbiont of Riftia pachyptila. Science 315:247–250.

Maron P-A, Mougel C, Siblot S, Abbas H, Lemanceau P, Ranjard L. 2007a. Protein extraction and fingerprinting optimization of bacterial communities in natural environment. Microb. Ecol. 53:426–434.

Maron P-A, Ranjard L, Mougel C, Lemanceau P. 2007b. Metaproteomics: A new approach for studying functional microbial ecology. Microb. Ecol. 53:486–493.


García-Martín H, Ivanova N, Kunin V, Warnecke F, Barry KW, McHardy AC, Yeates C, He S, Salamov AA, Szeto E, Dalin E, Putnam NH, Rigoutsos I, Kyrpides NC, Blackall LL, McMahon KD, Hugenholtz P. 2006. Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat. Biotechnol. 24:1263–1269.

McHardy AC, García-Martín HG, Tsirigos A, Hugenholtz P, Rigoutsos I. 2007. Accurate phylogenetic classification of variable-length DNA fragments. Nat. Methods 4:63–72.

Ogunseitan OA. 1993. Direct extraction of proteins from environmental samples. J. Microbiol. Methods 17:273–281.

Ong S-E, Mann M. 2005. Mass spectrometry-based proteomics turns quantitative. Nat. Chem. Biol. 1:252–262.

Park C, Helm R. 2008. Application of metaproteomic analysis for studying extracellular polymeric substances (EPS) in activated sludge flocs and their fate in sludge digestion. Water Sci. Technol. 57:2009–2015.

Park C, Novak JT, Helm RF, Ahn Y-O, Esen A. 2008. Evaluation of the extracellular proteins in full-scale activated sludges. Water Res. 42:3879–3889.

Perkins DN, Pappin DJC, Creasy DM, Cottrell JS. 1999. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551–3567.

Powell MJ, Sutton JN, Del Castillo CE, Timperman AT. 2005. Marine proteomics: Generation of sequence tags for dissolved proteins in seawater using tandem mass spectrometry. Mar. Chem. 95:183–198.

Ram RJ, VerBerkmoes NC, Thelen MP, Tyson GW, Baker BJ, Blake RC II, Shah M, Hettich RL, Banfield JF. 2005. Community proteomics of a natural microbial biofilm. Science 308:1915–1920.

Santoni V, Molloy M, Rabilloud T. 2000. Membrane proteins and proteomics: Un amour impossible? Electrophoresis 21:1054–1070.

Schulze W, Gleixner G, Kaiser K, Guggenberger G, Mann M, Schulze ED. 2005. A proteomic fingerprint of dissolved organic carbon and soil particles. Oecologia 142:335–343.

Shevchenko A, Sunyaev S, Loboda A, Shevchenko A, Bork P, Ens W, Standing KG. 2001. Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. Anal. Chem. 73:1917–1926.

Sowell SM, Wilhelm LJ, Norbeck AD, Lipton MS, Nicora CD, et al. 2009. Transport functions dominate the SAR11 metaproteome at low-nutrient extremes in the Sargasso Sea. ISME J. 3:93–105.

Stoeckli M, Chaurand P, Hallahan DE, Caprioli RM. 2001. Imaging mass spectrometry: A new technology for the analysis of protein expression in mammalian tissues. Nat. Med. 7:493–496.

Toyoda A, Iio W, Mitsumori M, Minato H. 2009. Isolation and identification of cellulose-binding proteins from sheep rumen contents. Appl. Environ. Microbiol. 75:1667–1673.

Tyers M, Mann M. 2003. From genomics to proteomics. Nature 422:193–197.

Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, et al. 2004. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37–43.

Ultsch A, Moerchen F. 2005. ESOM-Maps: Tools for clustering, visualization, and classification with emergent SOM. In Technical Report of the Department of Mathematics and Computer Science, University of Marburg, Germany, Vol. 46, pp. 1–7.

Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66–74.

VerBerkmoes NC, Denef VJ, Hettich RL, Banfield JF. 2009a. Systems biology: Functional analysis of natural microbial consortia using community proteomics. Nat. Rev. Microbiol. 7:196–205.

Verberkmoes NC, Russell AL, Shah M, Godzik A, Rosenquist M, et al. 2009b. Shotgun metaproteomics of the human distal gut microbiota. ISME J. 3:179–189.

Wagner M, Loy A. 2002. Bacterial community composition and function in sewage treatment systems. Curr. Opin. Biotechnol. 13:218–227.

Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, et al. 2007. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450:560–565.

Wexler M, Richardson DJ, Bond PL. 2009. Radiolabelled proteomics to determine differential functioning of Accumulibacter during the anaerobic and aerobic phases of a bioreactor operating for enhanced biological phosphorus removal. Environ. Microbiol. 11:3029–3044.

Wilkins MJ, Verberkmoes NC, Williams KH, Callister SJ, Mouser PJ, et al. 2009. Proteogenomic monitoring of Geobacter physiology during stimulated uranium bioremediation. Appl. Environ. Microbiol. 75:6591–6599.

Wilmes P, Bond PL. 2004. The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms. Environ. Microbiol. 6:911–920.

Wilmes P, Bond PL. 2006a. Towards exposure of elusive metabolic mixed-culture processes: The application of metaproteomic analyses to activated sludge. Water Sci. Technol. 54:217–226.

Wilmes P, Bond PL. 2006b. Metaproteomics: Studying functional gene expression in microbial ecosystems. Trends Microbiol. 14:92–97.

Wilmes P, Bond PL. 2009. Microbial community proteomics: Elucidating the catalysts and metabolic mechanisms that drive the Earth’s biogeochemical cycles. Curr. Opin. Microbiol. 12:310–317.

Wilmes P, Wexler M, Bond PL. 2008a. Metaproteomics provides functional insight into activated sludge wastewater treatment. PLoS ONE 3:e1778.

Wilmes P, Andersson AF, Lefsrud MG, Wexler M, Shah M, et al. 2008b. Community proteogenomics highlights microbial strain-variant protein expression within activated sludge performing enhanced biological phosphorus removal. ISME J. 2:853–864.

Wilmes P, Simmons SL, Denef VJ, Banfield JF. 2009a. The dynamic genetic repertoire of microbial communities. FEMS Microbiol. Rev. 33:109–132.

Wilmes P, Remis JP, Hwang M, Auer M, Thelen MP, Banfield JF. 2009b. Natural acidophilic biofilm communities reflect distinct organismal and functional organization. ISME J. 3:266–270.

Wu CC, MacCoss MJ, Howell KE, Yates JR. 2003. A method for the comprehensive proteomic analysis of membrane proteins. Nat. Biotechnol. 21:532–538.

Ye M, Jiang X, Feng S, Tian R, Zou H. 2007. Advances in chromatographic techniques and methods in shotgun proteome analysis. Trends Anal. Chem. 26:80–84.

Zaura E, Keijser B, Huse S, Crielaard W. 2009. Defining the healthy “core microbiome” of oral microbial communities. BMC Microbiol. 9:259.