6
Comparative systems biology across an evolutionary gradient within the Shewanella genus Konstantinos T. Konstantinidis a,1,2 , Margrethe H. Serres b,1 , Margaret F. Romine c,1 , Jorge L. M. Rodrigues d,1 , Jennifer Auchtung e , Lee-Ann McCue c , Mary S. Lipton c , Anna Obraztsova f , Carol S. Giometti g , Kenneth H. Nealson f , James K. Fredrickson c , and James M. Tiedje e,2 a School of Civil and Environmental Engineering and School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, GA 30332; b Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543; c Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999 , Richland, WA 99352; d Department of Biology, University of Texas, P.O. Box 19498, Arlington, TX 76019; e Center for Microbial Ecology, Michigan State University, 540 Plant and Soil Sciences Building, East Lansing, MI 48824; f Department of Earth Sciences, University of Southern California, University Park Campus, Los Angeles, CA 90089; and g Biosciences Division, Argonne National Laboratory, 9700 South Cass Avenue, Building 202, Argonne, IL 60439 Edited by Jeffrey I. Gordon, Washington University School of Medicine, St. Louis, MO, and approved July 29, 2009 (received for review February 25, 2009) To what extent genotypic differences translate to phenotypic variation remains a poorly understood issue of paramount impor- tance for several cornerstone concepts of microbiology including the species definition. Here, we take advantage of the completed genomic sequences, expressed proteomic profiles, and physiolog- ical studies of 10 closely related Shewanella strains and species to provide quantitative insights into this issue. Our analyses revealed that, despite extensive horizontal gene transfer within these genomes, the genotypic and phenotypic similarities among the organisms were generally predictable from their evolutionary relatedness. The power of the predictions depended on the degree of ecological specialization of the organisms evaluated. Using the gradient of evolutionary relatedness formed by these genomes, we were able to partly isolate the effect of ecology from that of evolutionary divergence and to rank the different cellular func- tions in terms of their rates of evolution. Our ranking also revealed that whole-cell protein expression differences among these organ- isms, when the organisms were grown under identical conditions, were relatively larger than differences at the genome level, sug- gesting that similarity in gene regulation and expression should constitute another important parameter for (new) species descrip- tion. Collectively, our results provide important new information toward beginning a systems-level understanding of bacterial spe- cies and genera. comparative genomics evolution proteomics speciation phenotype P redicting the phenotype of newly isolated organisms based upon the existing knowledge of previously characterized organisms constitutes one of the most fundamental goals of microbiology. Organisms isolated from diverse environments and habitats often have their phenotypic and physiological properties inferred from their evolutionary relatedness, mea- sured by (mainly) the 16S rRNA gene sequence identity or other means (1, 2), to the type strains of known species. Although this practice has been broadly applied in studies of microbial com- munities, contributing greatly toward advancing microbiology knowledge, its use in this manner is rooted in rather low- resolution experimental methods and procedures (1, 3). The powerful genomic tools now available provide the opportunity for a much more detailed and informative evaluation of the relationship between genetic and phenotypic similarity. Simple questions that remain unanswered or only partially explored, such as to what degree do microorganisms encode and express the same metabolic pathways when grown under identical con- ditions, and to what extent are the similarities in expressed pathways determined by the genetic relatedness and/or the (distinct) ecological adaptations of the microorganisms, can now be answered accurately and quantitatively. Addressing such questions will provide long-needed information to better under- stand and to model the enormous microbial biodiversity that exists on the planet. To this end, we have analyzed and compared, both at the whole-genome and the whole-proteome levels, 10 isolates be- longing to the genus Shewanella, an important genus in cycling of organic and inorganic materials in the environment (4). These isolates originated from diverse geographic locations and habi- tats, including fresh and marine water columns, sediments, and subsurface environments (Fig. 1A and Table S1), and carry out a diverse range of metabolic processes (4). Although precise ecological information, e.g., in situ abundance and persistence in time, about each isolate is typically not available, the procedure used to isolate these strains, i.e., enrichment cultures from a variety of environmental samples for the phenotype or genotype of interest, is similar to common microbiology practice. Accord- ingly, our analyses with the Shewanella strains should be relevant for the questions described above and for broadening our understanding of the interrelationship between genotype, phe- notype, environment, and evolution. Our results represent the first thorough and system-level assessment of an environmental representative of Proteobacteria, an enormously diverse and important group, that can be compared and contrasted to previous assessments of the heavily sampled human pathogens or the ecologically specialized organisms such as the photosyn- thetic Prochlorococcus (5). Such comparisons identified several trends that may apply to other environmentally versatile bacteria besides Shewanella. A Continuous Genetic Gradient Within a Genus. Phylogenetic analysis of the 16S rRNA gene sequences revealed that the 10 Shewanella isolates formed a tight cluster, with the intra-cluster sequence identity ranging from 92 to 100% (Fig. 1B). Hence, these genomes belong justifiably to the same genus according to the most frequently used standards of bacterial taxonomy (2, 6). To gain further insight into the diversity of this group, the average nucleotide identity (ANI) of all pairwise conserved genes be- tween (any) two genomes, a more sensitive parameter for measuring evolutionary relatedness among closely related ge- Author contributions: K.T.K., M.H.S., M.F.R., J.L.M.R., K.H.N., J.K.F., and J.M.T. designed research; K.T.K., M.H.S., M.F.R., J.L.M.R., J.A., L.-A.M., M.S.L., A.O., and C.S.G. performed research; A.O., C.S.G., and K.H.N. contributed new reagents/analytic tools; K.T.K., M.H.S., M.F.R., J.L.M.R., J.A., L.-A.M., and M.S.L. analyzed data; and K.T.K., M.H.S., M.F.R., J.L.M.R., J.K.F., and J.M.T. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. 1 K.T.K, M.H.S., M.F.R., and J.L.M.R contributed equally to this work. 2 To whom correspondence may be addressed. E-mail: [email protected] or [email protected]. This article contains supporting information online at www.pnas.org/cgi/content/full/ 0902000106/DCSupplemental. www.pnas.orgcgidoi10.1073pnas.0902000106 PNAS September 15, 2009 vol. 106 no. 37 15909 –15914 MICROBIOLOGY Downloaded by guest on November 30, 2021

Comparative systems biology across an evolutionary

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Comparative systems biology across an evolutionary

Comparative systems biology across an evolutionarygradient within the Shewanella genusKonstantinos T. Konstantinidisa,1,2, Margrethe H. Serresb,1, Margaret F. Rominec,1, Jorge L. M. Rodriguesd,1,Jennifer Auchtunge, Lee-Ann McCuec, Mary S. Liptonc, Anna Obraztsovaf, Carol S. Giomettig, Kenneth H. Nealsonf,James K. Fredricksonc, and James M. Tiedjee,2

aSchool of Civil and Environmental Engineering and School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, GA 30332; bJosephine BayPaul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543; cBiological SciencesDivision, Pacific Northwest National Laboratory, P.O. Box 999 , Richland, WA 99352; dDepartment of Biology, University of Texas, P.O. Box 19498,Arlington, TX 76019; eCenter for Microbial Ecology, Michigan State University, 540 Plant and Soil Sciences Building, East Lansing, MI 48824;fDepartment of Earth Sciences, University of Southern California, University Park Campus, Los Angeles, CA 90089; and gBiosciences Division,Argonne National Laboratory, 9700 South Cass Avenue, Building 202, Argonne, IL 60439

Edited by Jeffrey I. Gordon, Washington University School of Medicine, St. Louis, MO, and approved July 29, 2009 (received for review February 25, 2009)

To what extent genotypic differences translate to phenotypicvariation remains a poorly understood issue of paramount impor-tance for several cornerstone concepts of microbiology includingthe species definition. Here, we take advantage of the completedgenomic sequences, expressed proteomic profiles, and physiolog-ical studies of 10 closely related Shewanella strains and species toprovide quantitative insights into this issue. Our analyses revealedthat, despite extensive horizontal gene transfer within thesegenomes, the genotypic and phenotypic similarities among theorganisms were generally predictable from their evolutionaryrelatedness. The power of the predictions depended on the degreeof ecological specialization of the organisms evaluated. Using thegradient of evolutionary relatedness formed by these genomes,we were able to partly isolate the effect of ecology from that ofevolutionary divergence and to rank the different cellular func-tions in terms of their rates of evolution. Our ranking also revealedthat whole-cell protein expression differences among these organ-isms, when the organisms were grown under identical conditions,were relatively larger than differences at the genome level, sug-gesting that similarity in gene regulation and expression shouldconstitute another important parameter for (new) species descrip-tion. Collectively, our results provide important new informationtoward beginning a systems-level understanding of bacterial spe-cies and genera.

comparative genomics � evolution � proteomics � speciation � phenotype

Predicting the phenotype of newly isolated organisms basedupon the existing knowledge of previously characterized

organisms constitutes one of the most fundamental goals ofmicrobiology. Organisms isolated from diverse environmentsand habitats often have their phenotypic and physiologicalproperties inferred from their evolutionary relatedness, mea-sured by (mainly) the 16S rRNA gene sequence identity or othermeans (1, 2), to the type strains of known species. Although thispractice has been broadly applied in studies of microbial com-munities, contributing greatly toward advancing microbiologyknowledge, its use in this manner is rooted in rather low-resolution experimental methods and procedures (1, 3). Thepowerful genomic tools now available provide the opportunityfor a much more detailed and informative evaluation of therelationship between genetic and phenotypic similarity. Simplequestions that remain unanswered or only partially explored,such as to what degree do microorganisms encode and expressthe same metabolic pathways when grown under identical con-ditions, and to what extent are the similarities in expressedpathways determined by the genetic relatedness and/or the(distinct) ecological adaptations of the microorganisms, can nowbe answered accurately and quantitatively. Addressing suchquestions will provide long-needed information to better under-

stand and to model the enormous microbial biodiversity thatexists on the planet.

To this end, we have analyzed and compared, both at thewhole-genome and the whole-proteome levels, 10 isolates be-longing to the genus Shewanella, an important genus in cyclingof organic and inorganic materials in the environment (4). Theseisolates originated from diverse geographic locations and habi-tats, including fresh and marine water columns, sediments, andsubsurface environments (Fig. 1A and Table S1), and carry outa diverse range of metabolic processes (4). Although preciseecological information, e.g., in situ abundance and persistence intime, about each isolate is typically not available, the procedureused to isolate these strains, i.e., enrichment cultures from avariety of environmental samples for the phenotype or genotypeof interest, is similar to common microbiology practice. Accord-ingly, our analyses with the Shewanella strains should be relevantfor the questions described above and for broadening ourunderstanding of the interrelationship between genotype, phe-notype, environment, and evolution. Our results represent thefirst thorough and system-level assessment of an environmentalrepresentative of Proteobacteria, an enormously diverse andimportant group, that can be compared and contrasted toprevious assessments of the heavily sampled human pathogensor the ecologically specialized organisms such as the photosyn-thetic Prochlorococcus (5). Such comparisons identified severaltrends that may apply to other environmentally versatile bacteriabesides Shewanella.

A Continuous Genetic Gradient Within a Genus. Phylogenetic analysisof the 16S rRNA gene sequences revealed that the 10 Shewanellaisolates formed a tight cluster, with the intra-cluster sequenceidentity ranging from 92 to �100% (Fig. 1B). Hence, thesegenomes belong justifiably to the same genus according to themost frequently used standards of bacterial taxonomy (2, 6). Togain further insight into the diversity of this group, the averagenucleotide identity (ANI) of all pairwise conserved genes be-tween (any) two genomes, a more sensitive parameter formeasuring evolutionary relatedness among closely related ge-

Author contributions: K.T.K., M.H.S., M.F.R., J.L.M.R., K.H.N., J.K.F., and J.M.T. designedresearch; K.T.K., M.H.S., M.F.R., J.L.M.R., J.A., L.-A.M., M.S.L., A.O., and C.S.G. performedresearch; A.O., C.S.G., and K.H.N. contributed new reagents/analytic tools; K.T.K., M.H.S.,M.F.R., J.L.M.R., J.A., L.-A.M., and M.S.L. analyzed data; and K.T.K., M.H.S., M.F.R., J.L.M.R.,J.K.F., and J.M.T. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1K.T.K, M.H.S., M.F.R., and J.L.M.R contributed equally to this work.

2To whom correspondence may be addressed. E-mail: [email protected] [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0902000106/DCSupplemental.

www.pnas.org�cgi�doi�10.1073�pnas.0902000106 PNAS � September 15, 2009 � vol. 106 � no. 37 � 15909–15914

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

Nov

embe

r 30

, 202

1

Page 2: Comparative systems biology across an evolutionary

nomes than the 16S rRNA gene (7), was used. The ANI analysisrevealed that these genomes form a continuing gradient ofgenetic relatedness, which was not readily apparent from the 16SrRNA gene analysis (Fig. 1C). In particular, S. putrefaciensstrains W3–18-1 and CN-32 as well as Shewanella sp. MR-4 andMR-7 are the most closely related pairs, showing ANI values of�96.5% and �98.4%, respectively. These values are well abovethe 95% ANI that corresponds to the 70% DNA-DNA hybrid-ization (DDH) standard frequently used for species demarca-tion, which is consistent with the experimentally derived DDHvalues for these organisms (6). Hence, these pairs of genomessample the subspecies level. The MR-4 and MR-7 genomes show�92%, �85%, and �79% ANI to Shewanella sp. ANA-3, S.putrefaciens CN-32, and S. oneidensis MR-1 genomes, respec-tively. Thus, these genome pairs represent varied levels of geneticrelatedness within the Shewanella genus. Finally, all of thepreviously mentioned genomes show �69.7–72% ANI to S.frigidimarina NCIMB400, S. denitrificans OS217, S. loihica PV-4,and S. amazonensis SB2B strains, which represent the four mostdivergent species sampled within the genus. This gradient pro-vided the opportunity to precisely estimate the number of

changes in the genes, pathways, and subsystems of the cell overtime and as a result of environmental adaptations and selectionpressures.

Gene Content Variation as a Function of Evolutionary Time andEcology. The 10 Shewanella isolates have similar genome sizes,varying from 4.3 to 5.3 Mbp (Table S1). Comparative analysisrevealed extensive gene content diversity among the genomes.From the 9782 predicted nonredundant (orthologous genesremoved) protein-coding sequences (CDS) annotated in the 10genomic sequences (the pangenome), only �2128 (22%, consti-tuting �54% of the total genes in the genome, on average) werepresent in all genomes (core CDS set); �2965 (30%) were foundin at least two genomes (variable CDSs), whereas the remainingCDS (4689 or 48%) were strain specific (Fig. 2B, Fig. S1, andTable S2). Nonetheless, the majority of the variable CDSs werefound to be specific to clades, i.e., the MR or S. putrefaciensclades (Fig. 1B), whereas a smaller fraction had a more sporadicdistribution among the strains (note the similarity between thegene content tree and the phylogeny of the genomes in Fig. 3).Accordingly, the overall extent of CDS content similarity showed

Fig. 1. The 10 Shewanella genomes used in this study and their evolutionary gradient. The geographic origin (A) and the 16S rRNA-based phylogenetic tree(B) of the 10 genomes (in boldface type) are shown. The scale represents the number of substitutions per position, and the numbers above and below the nodesrepresent the bootstrap support from 1000 resamplings using parsimony and maximum likelihood methods, respectively. Bootstrap values �50 were omitted.A continuous genetic gradient was formed (C) when the fraction of the total genes in the genome shared between two genomes (y axis) was plotted againstthe ANI of the shared genes between the two genomes (45 comparisons in total are shown). Dashed blue lines represent the 90% prediction intervals of theregression line; open squares identify the outlier pairs of genomes observed (discussed in the text).

Fig. 2. The Shewanella pangenome. (A) Contribution of different categories of genes to the pangenome as a function of ANI. The genes that differed in allpairwise whole-genome comparisons among the 10 Shewanella genomes (45 comparisons in total) were assigned to five major functional categories (graphlegend). The number of genes in each category, expressed as a fraction of the total genes that differed between the two genomes (y axis), is plotted againstthe genomic ANI value of the two genomes compared. Individual data points representing each comparison have been removed for clarity; trendlinesrepresenting the mean, and bars representing one standard deviation from the mean, are shown instead. (B) Comparisons with the enterics pangenome. Thenumber of genes that remained conserved (y axis) with the inclusion of more genomes in the analysis is plotted against the number of genomes (x axis) used(light colors). The total number of nonredundant unique genes in all genomes used is also shown (dark colors). Bars represent one standard deviation based on10 random combinations in adding the genomes to the analysis.

15910 � www.pnas.org�cgi�doi�10.1073�pnas.0902000106 Konstantinidis et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 30

, 202

1

Page 3: Comparative systems biology across an evolutionary

a very strong linear decrease with increasing evolutionary dis-tance between the genomes compared (R2 � 0.9; Fig. 1C), whichis consistent with results reported previously based on otherbacterial groups (7). The strong linear trend suggests that,despite the extensive gene diversity and apparent genome flu-idity, the genotypic similarity of bacteria may be generallypredictable from their evolutionary relatedness.

Although a tight relationship between shared CDS contentand evolutionary relatedness was observed, several significantdepartures (outliers) from this main trend were also noted andwere most likely attributable to ecological adaptations. Forinstance, the two most closely related genomes based on ANI,CN-32, and W3–18-1 (98.4% ANI) showed substantially moreCDS content differences compared with what was expectedbased on their small evolutionary divergence (see regressiontrendline in Fig. 1C) or compared with the more distantly related(96.4% ANI) pair of MR-7 and MR-4 (�530 vs. �430 CDSs,respectively, not counting CDS on mobile elements; Table S2).CN-32 and W3–18-1 were isolated from more diverse environ-ments (deep-subsurface sandstone vs. marine sediment, respec-tively) compared with MR-4 and MR-7 (5-m vs. 60-m depth inthe Black Sea, respectively). Hence, it is likely that geneticadaptations specific to these environments account for the largergene content differences observed in the former strains relativeto the latter ones. In agreement with the latter interpretation,CN-32-specific genes included several genes that might beimportant for survival in the subsurface environment, such as anarsenate reductase, copper resistance system, heavy metal eff luxpump, and a polysaccharide biosynthesis cluster.

Similarly, S. denitrificans strain OS217 is as divergent as threeother isolates (strains PV-4, NCIMB400, and SB2B) are from theremaining six Shewanella isolates in our collection (e.g., Fig. 3D).Yet, the OS217 genome contained substantially more strain-specific genes and showed the greatest loss of ‘‘core-like’’ CDSs(i.e., CDSs present in all other Shewanella genomes) comparedwith the genomes of PV-4, NCIMB400, or SB2B (Table S2). Forinstance, the core set increased by 265 genes when OS217 wasremoved from the analysis compared with �60 genes when PV-4,NCIMB400 or and SB2B were individually removed. Ourgenomic, physiological (e.g., Table S3), and proteomic datacollectively suggest that strain OS217 has undertaken a uniqueevolutionary path, possibly driven by the loss of the threemenaquinone biosynthetic gene clusters (menDHCE, menF,menB) common to the other Shewanella strains and resulting inan inability to exploit strictly anaerobic habitats. These resultsare consistent with previous findings suggesting that strainOS217 is a specialized denitrifier (4) and are also consistent withthe longstanding observation that respiratory denitrification isnot found in organisms that are strong fermentors (8). Thesefindings may indicate that more extensive genetic changes areinvolved for an organism to diverge to the opposite physiology.Finally, the (outlier) pairs of genomes with a higher percentageof shared genes than the average, i.e., CN-32 or W3–18-1 vs.MR-4 or MR-7 (Fig. 1C), are attributable to the substantiallysmaller size of these genomes (i.e., 4.6–4.7 Mbp) relative to thatof the rest of the genomes (i.e., �5.2 Mbp; Table S1) rather thanto more similar ecological adaptations (the number of sharedorthologs and mobile gene content in these pairs is comparableto that of other pairs).

Processes Contributing to Gene Content Variation. To provide fur-ther quantitative insights into the processes contributing to genecontent variation, the genes that differed in pairwise whole-genome comparisons were assigned to five major functionalcategories and the percentage of genes in each category wasevaluated against the genetic relatedness of the two genomescompared. The five categories were as follow: (i) pseudogenes,denoting genes predicted to encode insertions, deletions, orsequence alterations that would result in premature terminationof the encoded protein; (ii) IS/Tn, denoting insertion sequencesor transposons; (iii) mobile islands, denoting runs of neighboringgenes (genomic islands) that included integrase genes; (iv) other,denoting all other unique genes, including genomic islands thatdo not contain clear evidence of being mobile; and (v) hypo-thetical or conserved hypothetical, denoting the fraction of thegenes in category iv that had no detectable homolog in any of thefully sequenced genomes except in other Shewanella genomes(Table S4). Our results revealed that mobile islands and insertionelements dominated the gene content differences among ge-nomes of the same species, but their contribution graduallydecreased in comparisons among genomes of increasing evolu-tionary divergence at the expense of genes in the ‘‘other’’category (Fig. 2 A). These findings are consistent with rapidturnover of mobile islands over short evolutionary scales. Fur-thermore, the majority (�75%) of the genes in the ‘‘other’’category were typically found in clusters of �5–40 genes,reflecting presumably their ‘‘mobile island’’ origin. These find-ings are consistent with preferential deletion of the mobility/transposition genes (presumably because of negative selection)in the course of evolution and retention of only the potentiallyecologically important genes of mobile islands. Therefore, theShewanella organisms evaluated here appear to acquire most oftheir new functions as follows: acquisition of mobile islands,followed by selection for the islands carrying ecologically im-portant genes, and finally loss of the mobile and ecologicallyunimportant genes.

Fig. 3. Genome vs. proteome comparisons among nine Shewanella strains.The protein profiles of nine Shewanella strains were compared based on the2128 core genes (A) and the 4300 genes found in the genome of strain MR-1(B) for gene expression, and the nine strains were subsequently clusteredbased on their overall similarity in the expression patterns of these two genesets as follows. For each gene set, a full (all genes by all genomes) 0/1 matrixwas built, with ‘‘1’’ denoting expression (defined as the detection of at leasttwo unique peptides per protein) and ‘‘0’’ denoting no expression of thecorresponding protein; the derived matrices were clustered as described in theSI Materials and Methods and the resulting cladograms are shown. Similarly,the nine strains were also clustered based on the presence/absence of the 4300MR-1 gene orthologs in their genome (sequence comparisons, C). A maximum-likelihood phylogenetic tree of the concatenated alignment of 1507 single-copy core genes that had no detectable signal for recombination by Phi testanalysis (22) is also shown (D). Scale bars represent percent similarity in thederived matrices for (A), (B), and (C); and number of substitution per sitefor (D).

Konstantinidis et al. PNAS � September 15, 2009 � vol. 106 � no. 37 � 15911

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

Nov

embe

r 30

, 202

1

Page 4: Comparative systems biology across an evolutionary

The Shewanella Pangenome and Conserved Gene Core. Comparativeanalysis of the 10 Shewanella genomic sequences revealed thatsampling of the genus pangenome remained unsaturated (Fig.2B, blue bars); this result was attributable to the large number(468, on average) of strain-specific genes. Only 10–25% of thelatter genes, depending on the genome evaluated, found ahomolog in a genome outside the Shewanella genus whenqueried against all bacterial genomes available at the end of2008, indicating the great potential for discovering novel geneswith more Shewanella strains sequenced. The number of newgenes per genome is an order of magnitude higher than thosecalculated for highly specialized human pathogens (9) but sig-nificantly lower than that of the opportunistic pathogen Esche-richia coli (7). It must be pointed out, however, that thesepan-genome calculations are not directly comparable and shouldbe interpreted with caution. For instance, the average ANI valueamong all pairs of Shewanella genomes is �76%, which issignificantly lower than that within the E. coli group (�96%),and there appears to be a strong positive correlation between thenumber of novel genes carried in a genome and the (higher)degree of evolutionary divergence of the genome, regardless ofthe effect of ecology or environmental adaptation (Fig. 1C) (7).On the other hand, the prophage content of the E. coli genomesis substantially higher than that of the Shewanella genomes(10–20% vs. 0–5%, respectively), and this accounts for much ofthe difference observed. When the groups were adjusted forcomparable intragroup diversity, by including selected Salmo-nella (�82% ANI to E. coli) and Yersinia (�72% ANI to E. coli)genomes together with E. coli genomes and with prophagegenomes removed from the analysis, the gene diversity observedwithin the enterics was comparable to that of the Shewanella(Fig. 2B). Therefore, the evaluation of these two importantgroups suggests that sequencing of any new organism, as long asthe organism belongs to a versatile genus and has a differentecological history relative to the previously sequenced membersof the genus, should be expected to expand substantially thepangenome of the genus.

Both the Shewanella and the enterics core gene sets werehighly enriched in translational, transcriptional, DNA replica-tion, and central metabolism genes and overlapped extensively(�50% of the genes were shared between the two cores).Shewanella-specific core functions were associated mainly withmetabolic pathways as well as with chemotaxis and sensory-transduction processes. Using the BioCyc pathway schema (10),104 pathways were identified as being common to all Shewanellagenomes, including pathways for energy metabolism, synthesis ofbuilding blocks (amino acids, cofactors, fatty acids, and nucle-otides), and degradation or interconversion of metabolites andall but two amino acids and metabolites (Fig. S2 and Table S5).A common trait of the Shewanella strains appears to be the useof the pentose phosphate and Entner-Doudoroff pathways forhexose degradation. This is based on the lack of the enzyme6-phosphofructokinase (Pfk; the most important regulatoryenzyme of the canonical glycolysis pathway), initially observed inprevious gene expression studies of MR-1 cultures (11). Mem-bers of the Shewanella genus also have fewer phosphotransferasesystem (PTS) transporters than usually encountered in pro-teobacterial genomes. Whether there is a connection betweenthe reduced PTS and lack of Pfk is not clear, but it is possible thatthe lower level of phosphoenolpyruvate (PEP) synthesized as aresult of not using the glycolytic pathway may render thePEP-dependent PTS system inefficient.

When the core was defined as the genes present in all but oneof the 10 genomes, the dataset increased by 411 protein codinggenes (265 when OS217 was excluded from the analysis), cor-responding to, on average, 12–14% of the Shewanella genome(Table S2). These findings suggest that gene loss, including lossof genes that are apparently indispensable for the majority of the

strains of a species, might be a successful strategy for rapidevolution and environmental adaptation. A representative ex-ample of strain-specific adaptations related to a group corefunction, which involved considerable gene deletion and/or geneacquisition, is given below. All Shewanella strains except for S.denitrificans OS217, which shows limited anaerobic growth ca-pabilities presumably because of gene loss during the process ofecological specialization (discussed above), were able to reduceseveral metals and metalloids (Table S3), a well-known charac-teristic of the genus (12). The main metal reductase locus,encoded by mtrCAB genes, is virtually identical for the ninestrains but the adjacent loci vary, reflecting evolutionary historyand possibly metal respiratory specialization (4). These dissim-ilarities explained some, but not all, of the variation in metalrespiration among strains observed during our growth experi-ments. For example, although their mtr locus and flanking genesare identical, strain CN-32 was able to grow on lactate (20 mM)when six different metals or metalloids were used as electronacceptors, whereas strain W3–18-1 grew only with Fe, Mn, andSe, under the conditions tested. These results may reflectdifferences in the upstream pathways to metal reduction be-tween the two strains and underscore the need for more researchto understand better the details of the metal respiration cascade.

Gene Presence vs. Expression as a Function of Time and Ecology.Transcriptome comparisons have shown that gene expressionrather than gene content differences, occurring either at differ-ent times and/or in different tissues, are mainly responsible forthe differential development of eukaryotic organisms, e.g., hu-man and chimpanzees (13), and the adaptive evolution of naturalpopulations (14). It follows that, in addition to the number ofshared genes, gene expression constitutes an important factordetermining phenotypic similarity (or dissimilarity). Althoughthe latter applies presumably to bacteria as well, systematicassessments of the role of gene expression on the phenotypicdifferences observed among closely related organisms arelacking.

To begin exploring this issue, the 10 strains were grown underidentical batch-culture conditions to obtain their whole-cellproteome profiles and to contrast the profiles against theevolutionary relatedness among the strains. Overall, the degreeof similarity in proteome profiles was congruent with the evo-lutionary relatedness among the strains; i.e., the fraction oforthologous proteins detected to be expressed in the cultures washigher in closely related strains than in more divergent strains.However, the differences in expressed proteins among the strainswere consistently larger than their differences at the genecontent level when gene expression and gene content wereassessed for the same 4300 (reference) genes found in the MR-1genome (compare branch lengths in Fig. 3B and 3C), whichminimized the effect of gene- or strain- specific variations in themeasurements. More surprisingly, the same pattern was ob-served even when gene expression was assessed for the coregenes only (Fig. 3A and Fig. S2), which circumvented thedependency of the proteome profiles on the underlying genecontent differences in the previous comparisons. These resultswere attributable to a high number of proteins expressed by oneor a few, but not all, of the strains possessing the correspondinggene, with proportions that varied from 1.9 to 2.6 times morethan those proteins expressed by all strains possessing thecorresponding gene (Table S6). For instance, although 20% ofthe core proteins (556 genes) were expressed by all strains, asubstantially larger fraction of core proteins (36%, or 993 genes)were expressed by one or more (but not all) strains. Althoughsome of these differences may be caused by higher noise in theproteomics data relative to the genomics data, we believe thatmany of these differences are biologically relevant because of thehigh reproducibility (�80%) of proteomics measurements on

15912 � www.pnas.org�cgi�doi�10.1073�pnas.0902000106 Konstantinidis et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 30

, 202

1

Page 5: Comparative systems biology across an evolutionary

batch cultures such as those used in the present study (15), ourhigh stringency in processing and analyzing the proteomics data(see Materials and Methods), and the fact that very similar resultswere found when a subset of five specific regions of traditionaltwo-dimension protein gels were overlaid and compared forabsence or presence of protein spots (Fig. S3). Finally, proteinscharacteristic of the stationary growth phase, such as the RpoSsigma factor (16), were not detected in the expressed proteomes,suggesting that all of our cultures were sampled at their expo-nential growth phase.

Our findings revealed that although strains CN-32 and W3–18-1 are significantly more closely related than are strains MR-4and MR-7—e.g., a 2% higher ANI value translates to substan-tially higher gene content and evolutionary relatedness, as weand others have shown (7)—the former strains showed compa-rable differences in expressed proteins compared with the latterstrains for the same genes analyzed (Fig. 3). These findings couldtherefore be attributable to a higher degree of environmental/ecological adaptations (which may have altered metabolic andregulatory networks) in the CN-32/W3–18-1 pair relative to theMR-4/MR-7 pair. Similarly, S. denitrificans OS217, which ap-peared to be the most ecologically specialized organism of theset, also showed the most unique proteomic profile (Fig. 3). Thelarger gene expression differences observed for OS217 andCN-32/W3–18-1 than anticipated based on their evolutionarydivergence alone echoes the results described above based on thegene content analysis. Furthermore, the largest fraction (44%)of the proteins detected in the protein profiles was strain specificand included many nonhypothetical proteins such as outermembrane proteins, TonB-dependent receptors, proteases, re-striction-modification enzymes, glycosylases, and polysaccharidebiosynthesis enzymes. Most of these proteins can be linked tometabolic fitness or interaction with the environment, and hencecould possibly underlie important physiological and/or regula-tory differences among the strains. The extensive variability incore proteins and the high number of strain-specific proteinsexpressed under identical growth conditions indicates a multi-faceted and highly dynamic control of whole genome expression.Collectively, our proteomics analyses suggest that changing thiscontrol appears to represent a particularly important mecha-nism, in addition to gene acquisition or loss, for rapid adaptationin changing and diverse environments. Consistent with theseconclusions, the first mutations observed in experimentallyevolved E. coli strains for 20,000 generations under laboratoryconditions involved regulatory genes and networks (17).

Compartmentalized Microbial Evolution. To characterize which cel-lular functions evolve faster in the Shewanellae, the percentconservation of selected functional gene categories (see Mate-rials and Methods for details) was evaluated against the evolu-tionary relatedness among the strains compared (measured bypercent ANI). As evolutionary distance increased, the percentconservation of all categories decreased, but the extent ofdecline (i.e., the slope) differed, presumably reflecting the variedselection pressures on the corresponding genes. The analysisrevealed the following order: pathways were substantially moreconserved than individual orthologs; orthologs more conservedthan transcriptional regulators, sensing and respiration genes,and expressed proteins (Fig. 4). The most rapidly changingindividual functions, both in terms of gene presence/absence andsequence conservation, were TonB-dependent outer membranereceptors followed by methyl-accepting chemotaxis proteins,transcription regulators and cytochromes. These results areconsistent with our previous findings and suggest that genomicand regulatory changes in sensing mechanisms represent the firstline of adaptive response to different redox conditions. Exper-imentally determined anaerobic growth characteristics such asbiomass produced and electron acceptors used (Table S3) were

also very different among the Shewanella strains and rankedamong the fastest changing functional entities (Fig. 4). A growthphenotype encompasses the sensing of a substrate, expression ofrelevant regulators, transporters, and enzymes, in addition tophysiological parameters related to the change in growth con-ditions. These potential sources for additional variation amongthe strains may explain why the growth phenotype is significantlyless conserved compared with pathways, orthologs, and proteinexpression patterns.

Summary and Perspectives for the Future. Microbiologists havebeen primarily focused on comparisons among either veryclosely related strains of the same species or distantly relatedspecies to advance understanding of the microbial life on Earth.The 10 Shewanella genomes studied here were selected torepresent a range of evolutionary distances, providing for a moreunconstrained view of microbial diversity and evolution. Com-parisons among these genomes revealed that the Shewanellagenus is genomically and, more so proteomically, diverse. Al-though a high degree of variation in protein expression profileswas anticipated among distantly related species, the variationobserved among strains of the same species was comparativelymuch larger than expected, given also the single growth condi-tion used (Figs. 3 and 4). It also appears that, in some cases, thevariation in expressed proteomes correlated positively with theextent of environmental adaptation (specialization). These find-ings have important implications for the correspondence be-tween genotype and phenotype and, hence, for the bacterialspecies concept. The evolutionary and functional gradientsreported here also suggested that specialization might occur overa very short time span, much shorter compared with thatcorresponding to the current species standards. Specializationappeared to take place primarily through changes at the regu-

Fig. 4. Modeling bacterial genotypic and phenotypic conservation across anevolutionary gradient. The presence of orthologous proteins, TonB outermembrane receptors, cytochromes, methyl-accepting chemotaxis proteins(MCPs), transcriptional regulators, metabolic pathways, protein expressionpatterns, and reduction of metal or metalloids (anaerobic growth) was de-termined for the 10 Shewanella strains (see Materials and Methods). Each ofthe traits was compared among the Shewanella strains in a pairwise manner(45 comparisons in total). The fraction of shared traits was determined foreach pair of strains and plotted against the average nucleotide identity (ANI)of the respective strain pair. Inset graph depicts the relationships betweenconservation of the traits and evolutionary distance using linear regressiontrendlines adjusted to intersect with the x and y axis at 100%. The r2 values ofthe regressions are also shown.

Konstantinidis et al. PNAS � September 15, 2009 � vol. 106 � no. 37 � 15913

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

Nov

embe

r 30

, 202

1

Page 6: Comparative systems biology across an evolutionary

latory level and through the high plasticity and fluidity of theShewanella chromosomes (Fig. 4).

The power of ‘‘omics’’ compared with traditional approachesto unravel an organism’s environmental and ecological adapta-tions and make robust predictions about the similarity (ordifference) in phenotypic traits among organisms was alsohighlighted by our analyses. The published literature, as well asour experimentally derived physiological and growth data, couldnot easily distinguish between most of the strains used in thisstudy or even define general properties for the major cladesrepresented by these strains. This was also reflected in the verylow correlation obtained between anaerobic growth character-istics (Table S3) and the evolutionary relatedness of the strainscompared. In contrast, genomic and proteomic data correlatedwell with the phylogeny of the strains and identified congruentlystrain-specific adaptations that might be linked to speciation forseveral of the strains studied. These results further corroboratethe notion that it is time to start replacing the traditionalapproaches for defining diagnostic phenotypes for new species orclades with omics-based procedures.

Distinguishing the effect of ecological adaptation from that ofevolutionary divergence alone represents the most limitingfactor in increasing the power of our predictions on phenotypebased on the genotype. Toward this direction, studying the extentof variation among members of the same natural population, i.e.,among organisms with very similar environmental adaptations,and contrasting it to the levels of variation detected in this studywith diverse organisms will allow fruitful conclusions. Thetrendlines obtained in this study (Fig. 1C and Fig. 4) also providea reference for comparing organisms of narrower (or broader)metabolic versatility than the Shewanellae. Furthermore, al-though the growth conditions used in this study were verylimited, the conditions remain artificial compared with theenvironmentally relevant conditions and hence may representdifferent stresses for each isolate evaluated. Replicate experi-ments and experiments performed with continuous cultures(chemostats) are currently underway to provide further quanti-tative insights into the role of variation in gene expression.Finally, a major limitation remains in that, despite the dedicatedefforts of numerous laboratories, many of the genes in thegenome have not been experimentally characterized, and theirphysiological roles are unknown. Continuing the efforts toestablish the functions of as many genes in the genome aspossible is critical for a thorough understanding of a bacteriumthat could serve as a model for versatile environmental bacteria.

Regardless of these limitations, the results presented here

constitute important information toward better modeling thecorrespondence between genotype and phenotype, and providedirections and testable hypotheses that will bring us one stepcloser to systems-level understanding of microbial species andpopulations.

Materials and MethodsThe organisms used in this study, their genomic features, gene content, andaccession numbers of the versions of the genomic sequences used in the study areprovided in Table S1. Orthologs were identified for the 10 Shewanella genomesby a combination of three methods: (i) protein–protein pairwise reciprocal BLAST(blastp) (18); (ii) reciprocal protein-genomic sequence best match (tblastn); and(iii) Darwin pairwise best hit (19). Genes found in plasmids or mobile elementswere excluded from ortholog and proteome comparisons among the strains. Thedegree of conservation of cellular functions or traits between two strains (Fig. 4)was determined as follows. (i) For orthologs, transcriptional regulators, TonBreceptors, MCPs, and cytochromes: all genes in the genome assignable to each ofthese categories were determined based on the gene annotation and the num-ber of orthologous genes shared between two strains for each category (accord-ing to Table S2) was divided by the total number of genes assignable to thecategoryforeachstrain.Thetwovalueswereaveragedtoprovidethevaluesusedin Fig. 4. (ii) A total of 163 unique pathways were identified in the 10 Shewanellagenomes according to the BioCyc pathway schema. The number of shared path-ways between the strains, as a fraction of the total pathways carried by a strain,was determined based on the presence/absence of the corresponding pathwaygenes. (iii) For proteomes and anaerobic growth, the number of orthologousproteins expressed (Table S6) and metal/metalloids respired (Table S3) by bothstrains in a pair were divided by the total number of (nonredundant) proteinsexpressed and metal/metalloids respired by either strain, respectively. The use of‘‘total traits counted for both strains’’ as the denominator (as opposed to ‘‘countsfor one strain’’) provided also for more direct comparisons with the sequence-based traits (i and ii, above) because otherwise the latter traits would have beenpenalized relatively higher because of the high number of ‘‘auxiliary’’ genes,which remained unexpressed under the simple growth conditions tested. Forproteomicsanalysis, culturesweregrownaerobically inTrypticSoyBrothtoafinaloptical density (OD) of 0.5. Cells were lysed, proteins extracted and digested withtrypsin, and the resulting peptides analyzed by mass spectrometry as previouslydescribed (20), with the only exception that filtering of the data was performedas described in ref. 21. Two-dimensional proteomic gels were carried out asdescribed previously (15). A detailed description of materials and methods isincluded in the SI Materials and Methods.

ACKNOWLEDGMENTS. The authors thank the numerous members of theShewanella Federation for useful discussions during the course of theirgenomic investigations of Shewanella. The authors were supported by theU.S. Department of Energy through the Shewanella Federation consortiumand the Proteomics Application project. The Michigan State University workrelevant to speciation was also supported by the National Science Foundation(DEB 0516252). Portions of this research were performed in the EnvironmentalMolecular Sciences Laboratory, a U.S. Department of Energy national scientificuser facility located at the Pacific Northwest National Laboratory in Richland,Washington.

1. Stackebrandt E, et al. (2002) Report of the ad hoc committee for the re-evaluation ofthe species definition in bacteriology. Int J Syst Evol Microbiol 52:1043–1047.

2. Brenner D, Staley J, Krieg N (2001) Bergey’s Manual of Systematic Bacteriology(Springer-Verlag, New York), 2nd Ed, pp 27–31.

3. Vandamme P, et al. (1996) Polyphasic taxonomy, a consensus approach to bacterialsystematics. Microbiol Rev 60:407–438.

4. Fredrickson JK, et al. (2008) Towards environmental systems biology of Shewanella.Nat Rev Microbiol 6:592–603.

5. Kettler GC, et al. (2007) Patterns and implications of gene gain and loss in the evolutionof Prochlorococcus. PLoS Genet 3:e231.

6. Goris J, et al. (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol 57:81–91.

7. Konstantinidis KT, Ramette A, Tiedje JM (2006) The bacterial species definition in thegenomic era. Philos Trans R Soc Lond B Biol Sci 361:1929–1940.

8. Tiedje JM (1988) Ecology of Denitrification and Dissimilatory Nitrate Reduction toAmmonium (Wiley, New York) pp 179–244.

9. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R (2005) The microbial pan-genome. Curr Opin Genet Dev 15:589–594.

10. Caspi R, et al. (2008) The MetaCyc Database of metabolic pathways and enzymesand the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res36:D623–D631.

11. Driscoll ME, et al. (2007) Identification of diverse carbon utilization pathways inShewanella oneidensis MR-1 via expression profiling. Genome Inform 18:287–298.

12. Hau HH, Gralnick JA (2007) Ecology and biotechnology of the genus Shewanella. AnnuRev Microbiol 61:237–258.

13. Enard W, et al. (2002) Intra- and interspecific variation in primate gene expressionpatterns. Science 296:340–343.

14. Oleksiak MF, Churchill GA, Crawford DL (2002) Variation in gene expression within andamong natural populations. Nat Genet 32:261–266.

15. Elias DA, et al. (2008) The influence of cultivation methods on Shewanella oneidensisphysiology and proteome expression. Arch Microbiol 189:313–324.

16. Lange R, Hengge-Aronis R (1991) Identification of a central regulator of stationary-phase gene expression in Escherichia coli. Mol Microbiol 5:49–59.

17. Philippe N, Crozat E, Lenski RE, Schneider D (2007) Evolution of global regulatorynetworks during a long-term experiment with Escherichia coli. Bioessays 29:846–860.

18. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of proteindatabase search programs. Nucleic Acids Res 25:3389–3402.

19. Gonnet GH, Hallett MT, Korostensky C, Bernardin L (2000) Darwin v. 2.0: An interpretedcomputer language for the biosciences. Bioinformatics 16:101–103.

20. Fang R, et al. (2006) Differential label-free quantitative proteomic analysis of Sh-ewanella oneidensis cultured under aerobic and suboxic conditions by accurate massand time tag approach. Mol Cell Proteomics 5:714–725.

21. Washburn MP, Wolters D, Yates JR, 3rd (2001) Large-scale analysis of the yeast proteomeby multidimensional protein identification technology. Nat Biotechnol 19:242–247.

22. Bruen TC, Philippe H, Bryant D (2006) A simple and robust statistical test for detectingthe presence of recombination. Genetics 172:2665–2681.

15914 � www.pnas.org�cgi�doi�10.1073�pnas.0902000106 Konstantinidis et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 30

, 202

1