Biological Sciences/Evolution

Genome Evolution in Cyanobacteria: the Stable Core and the Variable Shell

Tuo Shi*† and Paul G. Falkowski*‡§

*Environmental Biophysics and Molecular Ecology Program, Institute of Marine and Coastal

Sciences, and ‡Department of Geological Sciences, Rutgers, The State University of New Jersey,

New Brunswick, NJ 08901, USA

26 text pages, 7 figures and 3 tables

188 words in the abstract and total 36,149 characters in paper

‡Author contribution: T.S. and P.G.F. designed research; T.S. performed research; T.S. and

P.G.F. analyzed data; and T.S. and P.G.F. wrote the paper.

The authors declare no conflict of interest.

Abbreviations: LGT, lateral gene transfer; NJ, neighbor joining; ML, maximum likelihood; MP,

maximum parsimony; PCoA, principal coordinates analysis.

†Present address: University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064

§To whom correspondence should be addressed. Paul G. Falkowski, 71 Dudley Road, New

Brunswick, NJ 08901. E-mail: [email protected], Telephone: (732)-932-6555 ext.370,

Fax: (732)-932-4083.

Abstract

Cyanobacteria are the only known prokaryotes that perform oxygenic

photosynthesis, the evolution of which transformed the biology and geochemistry of

Earth. The rapid increase in published genomic sequences of cyanobacteria provides the

first opportunity to reconstruct events in the evolution of oxygenic photosynthesis on the

scale of entire genomes. Here we demonstrate the overall phylogenetic incongruence

among 682 orthologous protein families from 13 genomes of cyanobacteria. However,

principal coordinates analyses of the evolutionary relationships reveal a core set of 323

genes with similar evolutionary trajectories. The core set shows very low protein

variability and consists largely of genes encoding the major components of the

photosynthetic and ribosomal apparatus. Many of the key proteins are encoded by

genome-wide conserved small gene clusters which are indicative of protein-protein,

protein-prosthetic group and protein-lipid interactions. We propose that the

macromolecular interactions in complex protein structures and metabolic pathways retard

the tempo of evolution of the core genes, and hence exert a selection pressure that

restricts piecemeal lateral gene transfer of components of the core. Identification of the

core establishes a foundation for reconstructing robust organismal phylogeny in genome

space.

Keywords: cyanobacteria | gene family | lateral gene transfer | genome evolution |

oxygenic photosynthesis

Introduction

Cyanobacteria are the only oxygenic photosynthetic prokaryotes. Approximately

2.4 billion years ago, the evolution of that energy transduction pathway transformed

Earth’s atmosphere and upper ocean, ultimately facilitating the development of complex

life forms that are dependent on aerobic metabolism (1-3). Molecular fossils of

cyanobacteria date back at least 3 Ga (4-8), and the clade has evolved into one of the

largest and most diverse groups of bacteria on this planet today (9). Cyanobacteria

contribute significantly to global primary production (10, 11), and diazotrophic taxa are

central to the global nitrogen cycle (12-14). Arguably, this group has had a greater

impact on the biogeochemistry and evolutionary trajectory of Earth than any other

prokaryotic phylum, yet its own evolutionary history is poorly understood.

The availability of complete sequences of genomes for clusters of related

organisms provides the first opportunity to reconstruct events of genomic evolution

through the analysis of entire functional classes (15). Currently, cyanobacteria represent

one of the densest clusters of fully sequenced genomes (Table 1). Comparisons of

genome sequences of closely related marine Prochlorococcus and Synechococcus species

have demonstrated an intimate link between genome divergence in specific strains and

their physiological adaptations to different oceanic niches (16, 17). This ecotypic

flexibility appears to be driven by myriad selective pressures that govern genome size,

GC content, gene gain and loss, and rate of evolution (18, 19). Moreover, phylogenetic

analyses of genes shared by all five known phyla of photosynthetic bacteria, including

cyanobacteria (the only oxygenic group), purple bacteria (Proteobacteria), green sulfur

bacteria (Chlorobi), green filamentous bacteria (Chloroflexi), and the gram-positive

heliobacteria (Firmicutes), have provided important insights into the origin and evolution

of photosynthesis, an intensively debated subject in the past decades (20-31). This

information has been substantially extended by genome-wide comparative informatics

(32-34). One major implication of the latter work is the considerable extent of lateral

gene transfer (LGT) among these photosynthetic bacteria. The observation that

cyanophages sometimes carry photosynthesis genes (35-38) provides one mechanism of

rapid LGT among these phyla. However, LGTs almost certainly do not occur with equal

probability for all genes. For example, informational genes (those involved in

transcription, translation, and related processes), which are thought to have more

macromolecular interactions than operational genes (those involved in housekeeping), are

seldom transferred (39, 40). The existence of a core of genes resistant to LGT has been

reported in recent studies using relatively intensive taxon sampling (41, 42), suggesting

that there exists a set of genes that remain closely associated, resistant to LGT, and with

an extremely slow evolutionary tempo. Identification of such core genes potentially

allows separation of true phylogenetic signals from “noise”. It is, therefore, of

considerable interest to transcribe all coherent genome data into pertinent phylogenetic

information (43) and to identify which genes are more susceptible to lateral transfer.

Here we report on identification and reconstruction of the phylogeny of 682

orthologous protein families from 13 genomes of cyanobacteria. Our primary goals are

twofold: a) to examine the impact of LGT on the evolution of photosynthesis and the

radiation of cyanobacterial lineages; and b) to identify a core set of genes that are

resistant to LGT and on which a robust organismal phylogeny can be reconstructed. Our results

reveal that >52% (359) of the orthologs are susceptible to LGT and hence are responsible

for the inconsistent phylogenetic signal of this taxon in genome space. In contrast, the

remaining 323 orthologs show broad phylogenetic agreement. This core set comprises

key photosynthetic genes as well as those coding for the ribosomal translational apparatus.

This observation suggests that the macromolecular interactions in complex protein

structures (e.g., ribosomal proteins) and metabolic pathways (e.g., oxygenic

photosynthesis) are strongly resistant to piecemeal lateral gene transfer. Transfer was

ultimately accomplished by wholesale incorporation of cyanobacteria into eukaryotic

host cells, giving rise to primary photosynthetic endosymbionts which retained both

photosynthetic genes and genes encoding their own ribosomes (44-46).

Results

Conserved protein families in genomes of cyanobacteria

Our pair-wise genome comparison revealed a total of 682 orthologous protein-

coding genes that are common to all 13 genomes examined [supporting information (SI)

Table 2]. These genes constitute the core gene set and some of these define aspects of

the genotype that are uniquely cyanobacterial. This core set represents only 9.3% (in the

case of the largest genome Nostoc punctiforme) to 39.8% (in the case of the smallest

genome Prochlorococcus marinus MED4) of the total number of protein-coding genes

from each genome under study (see Table 1), but seems to account for all of the principal

functions in genome replication, expression, and repair, as well as the majority of the

reactions in several central metabolic pathways (SI Table 3). Our analysis is based on

identification of reciprocal best hits with a BLAST e-value threshold of 10^-4, which is

intermediate between the thresholds used by others (18, 19). This leads to an

estimate of the pool of orthologs similar to what has been identified from 10

cyanobacterial genomes (47), but nearly three times more than the number of

cyanobacterial signature genes bioinformatically characterized by Martin et al. (48), and

only 65% of the number of cyanobacterial clusters of orthologous groups (34). The

discrepancy mostly results, in the case of the former, from a filtering procedure to remove

homologs to chloroplasts and anoxygenic photosynthetic bacterial genomes, and in the

case of the latter, from a less stringent, unidirectional BLAST hit scheme employed. In

addition, some of the incomplete genomes used in this study are still undergoing

confirmation from the final assembly, and it is possible that equivalent genes have been

overlooked in some cases, resulting in an incomplete signature set. It is highly possible

that both reciprocal and unidirectional, circular best BLAST hit schemes result in an

elimination of a large set of homolog families that are considered orthologs because of

the overly restrictive criterion (49). However, the reciprocal best hit criterion detects

more orthologs while producing the same qualitative results as the single best hit

approach (47). Indeed, even without the use of any particular threshold (i.e., with the default

BLAST e-value threshold of 10), the set of orthologs identified via the reciprocal top

BLAST hit scheme probably underestimates the actual number of orthologs (19).

Phylogenetic incongruence among conserved protein families

Based on amino acid sequences, we built phylogenetic trees for each of the 682

orthologous protein families using both neighbor joining (NJ) and maximum likelihood

(ML) methods. To extrapolate major evolutionary trajectories, we analyzed and

compared the observed topologies from all 682 individual trees. Surprisingly, the

frequency distribution of observed topologies among the orthologous protein families

failed to reveal a predominant, unanimous topology that represents a large number of

orthologs (Fig. 1). Instead, most of the orthologs (58% and 67% for NJ and ML,

respectively) each yield a topology shared with no other ortholog. As a result, the maximum number of

orthologs that share a particular topology is 15 and 13 for NJ and ML, respectively,

which accounts for only 1.9% to 2.1% of the orthologous datasets (Fig. 1). This result

clearly demonstrates the incongruence in phylogenetic information from individual

protein families, even though they are highly conserved among cyanobacterial taxa.
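
The tally behind Fig. 1 can be illustrated with a minimal Python sketch. It assumes that each gene tree is supplied as a nested tuple of taxon labels (a hypothetical input format, not the one used in this study) and treats two trees as the same topology when they imply the same set of non-trivial bipartitions, so that rooting and child order do not matter. All function names are illustrative.

```python
from collections import Counter

def leaf_set(tree):
    """Return the set of taxon labels under a node (a label or a nested tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Non-trivial splits of the unrooted topology, stored as the side lacking a reference taxon."""
    taxa = leaf_set(tree)
    reference = min(taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        side = taxa - clade if reference in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # both sides need at least two taxa
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return frozenset(splits)

def topology_counts(trees):
    """Count how many gene trees share each distinct unrooted topology."""
    return Counter(bipartitions(tree) for tree in trees)

# Toy example with five hypothetical taxa: the first two trees differ only in rooting.
trees = [
    (("A", "B"), ("C", ("D", "E"))),
    ("A", ("B", ("C", ("D", "E")))),
    (("A", "C"), ("B", ("D", "E"))),
]
counts = topology_counts(trees)
largest_group = max(counts.values())
singletons = sum(1 for c in counts.values() if c == 1)
```

Here `largest_group` plays the role of the maximum number of orthologs sharing one topology, and `singletons` the number of orthologs whose topology is found in no other family.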

Phylogenomic reconcilement

To determine whether genes give completely incompatible phylogenetic

information or whether a common signal can be extracted from cyanobacterial

phylogenies, we employed a series of phylogenomic approaches, including the consensus

of 682 individual trees, supertree reconstruction, as well as the reconstruction of

phylogeny based on a concatenated alignment of 192,423 sites from the 682 sequences.

These approaches greatly resolved the topological incongruence, leading to five

topologies as shown in Fig. 2. Specifically, the NJ and ML trees built with different

models using the concatenated sequences gave three topologies in total (T1 and T2 for

NJ; T2 and T3 for ML), one of which is in agreement with that of the small subunit

ribosomal RNA (SSU rRNA) tree repeatedly obtained with NJ, ML or maximum

parsimony (MP) methods. The consensus- and super-tree built on the 682 individual NJ

trees showed two other topologies (T4 and T5), whereas both the consensus- and super-

tree of ML trees revealed an identical topology to one of the concatenated ML trees.

These five topologies are remarkably similar in that Synechocystis sp. PCC 6803 (SYN,

see Table 1 for abbr. for other species) and five diazotrophic species form a monophyletic

clade ((SYN,CWA),(TER,(NPU,(ANA,AVA)))), and that Synechococcus WH8102 (SYW)

and three Prochlorococcus species form another monophyletic clade in three different

formats: (SYW,(PMT,(PMS,PMM))), ((SYW,PMT),(PMS,PMM)), and (PMM,(PMS,

(PMT,SYW))). The notable conflicts concern the species Synechococcus elongatus PCC

7942 (SCO) and the thermophilic Thermosynechococcus elongatus BP-1 (TEL), which

tend to cluster at the base of the two major conserved subgroups but form aberrant

topologies.

However, analyses of the fit of a particular topology to the 682 sequence

alignments (SI Figs 6 and 7) indicated that almost all (97.5% to 99.6%) of the datasets

support topologies T1-T5 at the 95% confidence level (P=0.95). This result suggests that

artificial paralogs or genes potentially obtained by LGT may thwart the establishment of

organismal relationships based on plural consensus of individual gene phylogenies.

The stable core and the variable shell in genome space

To characterize the evolutionary trajectories with a view to removing artificial

paralogs or genes potentially obtained by LGT, we calculated the tree distances among

all possible pairs of the orthologous sets. The pair-wise distances were then used to

conduct a principal coordinates analysis (PCoA). This resulted in a core set of 323 genes

that share similar evolutionary histories (i.e., co-evolving genes) as opposed to the other

359 that exhibit divergent phylogenies (i.e., independently evolving genes) (Fig. 3). It is

very striking that informational genes, especially those encoding ribosomal structural

components, are almost all grouped in the densest region, while the tail is formed largely

by operational genes. This observation is consistent with previous studies (39, 40) that

indicate phylogenetic information is highly conserved in informational genes.

Additionally, the core comprises key components constituting the scaffolds of the

photosynthetic apparatus, and proteins that participate in chlorophyll biosynthesis and the

Calvin cycle.

Using the sum of all possible pairwise genetic distances as a measure of protein

variability (44), we compared the rates of evolution of genes in different functional

categories. A plot of frequency distribution of genes versus protein variability (Fig. 4)

reveals that ribosomal and photosynthetic genes are extremely conserved, whereas the

operational and other informational genes are strongly skewed toward higher protein

variability. This pattern of segregation is concordant with whether or not the genes are

located in the core, and whether or not they are organized in conserved gene clusters.

This result suggests that the core gene set has remained relatively stable

throughout the entire evolutionary history of cyanobacteria, while genes in the shell are

more likely to be acquired via LGT and are hence poorly conserved.
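
As an illustration of the variability measure, the following Python sketch sums all pairwise distances within one aligned protein family. The distance used here is the simple proportion of differing residues (p-distance), which is an assumption made for the example; the text does not specify the distance estimator, and a model-based distance could be substituted. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)

def protein_variability(alignment):
    """Sum of all pairwise distances among the aligned sequences of one family."""
    return sum(p_distance(a, b) for a, b in combinations(alignment.values(), 2))

# Toy alignment keyed by taxon abbreviation; the sequences are invented.
alignment = {"SYN": "MKV", "TEL": "MKV", "ANA": "MRV"}
variability = protein_variability(alignment)
```

Families with low values of this sum would fall at the conserved end of a distribution such as the one in Fig. 4.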

Finally, we chose the 323 core genes to reconstruct the organismal phylogeny.

The resulting tree, based on the concatenated core gene set, had the same topology as that

obtained from the concatenation of all the 682 genes and that for the consensus and

supertree of the trees obtained from all protein families (T3 in Fig. 2 and tree presented in

Fig. 5). It differed only slightly from other tested topologies that are not rejected by many

individual alignments (Fig. 2). Interestingly, the genomes from all the diazotrophic

cyanobacteria cluster as one main group; this clade appears to have acquired its nitrogen

fixation ability from a unicellular, thermophilic non-diazotrophic ancestor, probably

during the Archaean Eon. The early diazotrophs almost certainly were non-heterocystous

and eventually developed that differentiated cell type, probably in response to elevated

levels of atmospheric O2 (50).

Discussion

Our analyses reveal an overwhelming phylogenetic discordance among the set of

genes selected as likely orthologs (Fig. 1). Conflicting phylogenies can be a result of

either artifacts of phylogenetic reconstruction, LGT or unrecognized paralogy. In our

reciprocal best hit approach, we retained as orthologous gene families those containing

only one gene per species. Therefore, only orthologous replacement and hidden

paralogies (i.e., differential loss of the two copies in two lineages) can occur in selected

families. These two types of events are expected to be comparatively rare under strict

application of the reciprocal hit criterion (51). Thus, phylogenetic incongruence is

unlikely to be due to artifacts resulting from a biased selection of orthologous genes.

Furthermore, despite the overall disagreement in phylogenetic signal, most of the trees of

individual proteins gave unambiguously strong support for at least some particular sub-

topologies which are repeatedly obtained through consensus, supertree and even

concatenation analyses (Fig. 2). This finding argues against the possibility that phylogenetic incongruence is a

result of tree reconstruction artifacts or long-branch attraction effects.

LGT is likely one of the most important driving forces that lead to the discrete

evolutionary histories of the conserved protein families. Indeed, LGT has played an

important role in the evolution of prokaryotic genomes (52-54). A hallmark of LGT is

that the transferred genes often exhibit phylogenetic incongruity and/or aberrant organismal

distributions, which contrasts with the relationships inferred from both the SSU rRNA

tree and other well-resolved phylogenies of vertically inherited individual protein-coding

genes. But how can this superficially random gene transfer event explain the conserved

nature of many of the key genes that comprise the functional core across all

cyanobacterial taxa?

While phylogenomic approaches are capable of capturing the consensus and the

most frequently occurring bipartitions in individual phylogenies, leading to highly

conserved subgroups or clusters that outline the major trend in genome evolution, they

do not necessarily guarantee the absence of a conflicting phylogenetic signal in genome

space. The plural support for the consensus / supertree / concatenation topologies

indicates that the five top topologies are not significantly different from each other at

P=0.95; or in other words, all the five topologies would be accepted by more than 90% of

the datasets using the same criterion because the data do not discriminate among the

topologies. Do the consensus / supertree / concatenation trees accurately reflect

organismal history? Or, on the contrary, do they blur the vertical inheritance signal by

incorporating potential LGT? There is a large margin of uncertainty. Part of the

uncertainty may be due to the strength of the SH test, especially when examining the

accuracy of similar topologies. Indeed, the SH test is based on the evaluation of only 15

candidate topologies out of a total of 13,749,310,575 possible unrooted tree topologies

for 13 species. Although the majority of the 13,749,310,575 possible topologies would

not be supported by any dataset, the selection of a limited number of trees may have

biased the analyses. But part of the uncertainty can also be attributed to the data, most

notably the poorly conserved proteins that are disturbed by LGT. This is even more

pronounced in the PCoA, which demonstrates clearly that about 53% of the orthologs are

subject to frequent lateral transfers that might have complicated/diluted the vertical

inheritance signal (Fig. 3).
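
The figure of 13,749,310,575 follows from the standard count of unrooted, fully bifurcating topologies for n labeled taxa, (2n - 5)!!, that is, the product of the odd integers from 1 to 2n - 5. A short Python check, using a hypothetical function name:

```python
def unrooted_topologies(n_taxa):
    """Number of unrooted bifurcating tree topologies for n_taxa labeled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # odd integers 3, 5, ..., 2n - 5
        count *= odd
    return count

total_for_13 = unrooted_topologies(13)  # 13,749,310,575
```

Against a tree space of this size, a test restricted to a small set of candidate topologies can only rank those candidates against each other.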

Our results reveal a core set of genes in cyanobacteria that share similar

evolutionary histories. The pattern of co-evolution of these genes is concomitant with

extraordinarily conserved protein primary structure (Figs. 4 and 5). This finding of

limited lateral transfer in highly conserved genes is consistent with the complexity

hypothesis; that is, genes coding for large complex systems that have more

macromolecular interactions are less subject to horizontal transfer than genes coding for

small assemblies of a few gene products (40). In addition to the ribosomal genes, a

significant number of photosynthetic genes are also clustered in the core, suggesting that

photosynthetic genes may be as constrained as the ribosomal genes through evolution. To

function effectively, the cyanobacterial photosynthetic apparatus needs an investment of

a huge number of proteins, pigments, cofactors, and trace elements. Moreover, genes

coding for the photosynthetic apparatus are scattered along the chromosome as small

clusters of two to four genes that are highly conserved among all cyanobacteria (55).

Although large-scale gene order has been shown to be relatively unstable in the

prokaryotic genomes (56-58), the conservation of a minimum number of essential

operons encoding proteins that function together in a pathway or structural complex

appears to prevent genes from being perturbed by LGT (55, 59). The complexity of

oxygenic photosynthetic machinery makes it difficult to transfer components piecemeal

to non-photosynthetic prokaryotes. Indeed, because the genes for the photosynthetic apparatus are

split among several operons, acquiring it would require many independent transfers of noncontiguous operons. Although

large-scale horizontal gene transfer among photosynthetic prokaryotes (32) may suggest a

complex non-linear process of evolution that results in a mosaic structure of

photosynthetic pathway (60), transfers of the key photosynthetic genes are very rare (36).

Transfer of this key pathway was only achieved by wholesale incorporation of

cyanobacteria into eukaryotic host cells (44-46).

Materials and Methods

Gene Family Selection

We performed all-against-all BLAST (61) comparisons of protein sequences for

all possible pairs of the 13 genomes of cyanobacteria (Table 1) using an e value of 10^-4 as

the cutoff, and reciprocal genome-specific best hits were identified. We

considered genes to be probable orthologs when they formed a set spanning

all 13 genomes in which every gene was the best BLAST match of every other.

A total number of 682 protein families consisting of one gene per genome were retrieved

and assigned to functional categories according to those defined for the clusters of

orthologous groups (COGs) (62). Because of the lack of a particular category for

photosynthetic genes, the latter were assigned to the “energy production and conversion”

COG category. Other genes which fell into more than one of the 19 COG categories have

been assigned to a supplemental category called “miscellaneous” (19).
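
The selection rule can be sketched in Python as follows. This is an illustration only, not the pipeline used here: it assumes the BLAST output has already been reduced to a table of best hits per ordered genome pair (the hypothetical `best_hit` dictionary), with hits failing the e-value cutoff removed, and it keeps a family only if every pair of its members is a reciprocal best hit.

```python
from itertools import combinations

def ortholog_families(best_hit, genomes):
    """Return families with one gene per genome in which every pair is a reciprocal best hit.

    best_hit[(src, dst)] maps each gene of genome src to its best-scoring hit in
    genome dst; hits failing the e-value cutoff are assumed to be already removed.
    """
    def reciprocal(ga, a, gb, b):
        return best_hit[(ga, gb)].get(a) == b and best_hit[(gb, ga)].get(b) == a

    reference, others = genomes[0], genomes[1:]
    families = []
    for seed in best_hit[(reference, others[0])]:
        family = {reference: seed}
        for genome in others:
            family[genome] = best_hit[(reference, genome)].get(seed)
        if None in family.values():
            continue  # the seed has no hit in at least one genome
        if all(reciprocal(ga, family[ga], gb, family[gb])
               for ga, gb in combinations(genomes, 2)):
            families.append(family)
    return families

# Toy data for three hypothetical genomes A, B and C.
best_hit = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("A", "C"): {"a1": "c1", "a2": "c1"},
    ("C", "A"): {"c1": "a1"},
    ("B", "C"): {"b1": "c1", "b2": "c1"},
    ("C", "B"): {"c1": "b1"},
}
families = ortholog_families(best_hit, ["A", "B", "C"])
```

In the toy data, gene `a2` is dropped because its best hit in genome C points back to `a1`, which is the kind of non-reciprocal relationship the criterion is meant to exclude.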

Alignments and Tree Construction

Protein sequences were aligned with ClustalW (63) with manual corrections,

followed by selecting unambiguous parts of the alignments and concatenating sequences

excluding all gap sites with the Gblocks program (64). The maximum likelihood (ML) trees

were computed for each family with the PHYML program (65) using the Jones, Taylor,

and Thornton (JTT) model of substitution (66) and the γ-based method for correcting the

rate heterogeneity among sites. For each family, the neighbor joining (NJ) trees were also

constructed using the distance matrix provided by the TREE-PUZZLE program version 5.1

(67) under a γ-based model of substitution (alpha parameter estimated by PUZZLE, eight

γ rate categories) and bootstrapped using SEQBOOT and CONSENSE from the PHYLIP

package version 3.6 (68).
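
The gap exclusion and concatenation described at the start of this section can be pictured with a small Python sketch. It is a simplification: Gblocks also selects conserved blocks by additional criteria, whereas the code below only drops alignment columns that contain a gap and then joins the families taxon by taxon. The function names and toy sequences are hypothetical.

```python
def drop_gap_columns(alignment):
    """Remove every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [column for column in columns if "-" not in column]
    return {name: "".join(column[i] for column in kept)
            for i, name in enumerate(names)}

def concatenate(alignments):
    """Join several alignments that cover the same taxa into one long alignment."""
    taxa = sorted(alignments[0])
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}

# Two invented families for two taxa.
family_1 = drop_gap_columns({"SYN": "MK-V", "TEL": "MRAV"})
family_2 = drop_gap_columns({"SYN": "AA", "TEL": "A-"})
supermatrix = concatenate([family_1, family_2])
```

Concatenation in this form relies on every family containing exactly one sequence per taxon, which is the case for the ortholog families selected above.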

Concatenation, consensus and supertree

To generate a set of reasonable candidate topologies that could be tested against

the alignments for individual genes, we used the consensus of the 682 trees from

individual protein families (two methods, yielding topologies T3 and T4 in Fig. 2), the

concatenation of all the proteins (192,423 amino acids) (seven methods, yielding

topologies T1 to T3), the supertree (two methods, yielding topologies T3 and T5), and the

SSU rRNA (three methods, yielding topology T1). In the case of the reconstruction of the

trees based on the SSU rRNA, we used PHYLO_WIN (69) to perform maximum

parsimony (MP), NJ with bootstrap, and ML reconstruction using the γ-based method for

correcting the rate heterogeneity among sites. The consensus of the trees of the 682

protein alignments was obtained using the CONSENSE module of the PHYLIP package.

As there are no missing data, we also concatenated all the proteins and constructed the

tree either using the PROTML program in the MOLPHY package (70) or using the

quartet puzzling (QP) implemented with TREE-PUZZLE. Trees chosen for the supertree

computation were coded into a binary matrix using the coding scheme of Baum (71) and

Ragan (72). The matrices obtained were concatenated into a supermatrix. The supertree

was calculated on the supermatrix using the program DNAPARS (default options) from the

PHYLIP package. Bootstrap values on the supermatrix were obtained using SEQBOOT

and CONSENSE.
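
The binary coding step can be sketched in Python. In matrix representation of the Baum and Ragan kind, each clade of each source tree becomes one character: taxa inside the clade are scored 1, the other taxa of that tree 0, and taxa absent from the tree are scored as missing. The sketch assumes rooted source trees given as nested tuples, which is a hypothetical input format, and the function names are illustrative; it covers only the coding, not the parsimony search.

```python
def clades(tree):
    """Return (leaf set of this node, leaf sets of all internal nodes at or below it)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    members, found = frozenset(), []
    for child in tree:
        child_members, child_found = clades(child)
        members |= child_members
        found += child_found
    return members, found + [members]

def matrix_representation(trees, taxa):
    """Code each non-root clade of each tree as one column: 1 inside, 0 outside, ? if absent."""
    rows = {taxon: "" for taxon in taxa}
    for tree in trees:
        present, found = clades(tree)
        for clade in found:
            if clade == present:
                continue  # the root clade groups everything and carries no signal
            for taxon in taxa:
                if taxon in clade:
                    rows[taxon] += "1"
                elif taxon in present:
                    rows[taxon] += "0"
                else:
                    rows[taxon] += "?"
    return rows

# Two toy source trees; the second lacks taxon D.
trees = [(("A", "B"), ("C", "D")), (("A", "B"), "C")]
supermatrix = matrix_representation(trees, ["A", "B", "C", "D"])
```

Because every family here contains all 13 genomes, the missing-data symbol would not arise in this study; it is included only to show the general scheme.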

Comparisons between Trees

Trees were compared with the Treedist program using the branch score distance of

Kuhner and Felsenstein (73). On the n × n distance matrix obtained (n is the number of

trees), a principal coordinates analysis (PCoA) was computed using SAS software. PCoA,

a multivariate ordination method based on distance matrices, allowed us to embed our n

trees in a space of up to n-1 dimensions. By taking the two most significant

dimensions and plotting the objects (the trees) along these, the major trends and

groupings in the data can be determined by visual inspection.
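
Classical PCoA can be sketched in a few lines. The version below is a dependency-free Python illustration, not the SAS procedure used here: it double-centers the matrix of squared distances and extracts the leading axes by power iteration with deflation. The function names, iteration count, and tolerance are assumptions chosen for the example, and a symmetric distance matrix is assumed.

```python
import math

def _matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def pcoa(dist, n_axes=2, iterations=1000, tol=1e-12):
    """Return coordinates of each object on the first n_axes principal coordinates."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # Gower double-centering of the squared-distance matrix.
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]
    coords = [[] for _ in range(n)]
    for _axis in range(n_axes):
        v = [float(i + 1) for i in range(n)]  # fixed, non-constant starting vector
        for _step in range(iterations):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:
                break  # nothing left to extract along this axis
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        eigenvalue = sum(x * y for x, y in zip(v, _matvec(b, v)))
        scale = math.sqrt(eigenvalue) if eigenvalue > tol else 0.0
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so that the next pass finds the next axis.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Three objects lying on a line at positions 0, 1 and 3.
dist = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
coords = pcoa(dist)
```

Plotting the two returned columns against each other gives the kind of ordination in which dense regions and outlying tails of trees can be inspected by eye.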

For each of the 682 alignments, a comparison of the likelihood of the best

topology with that of the candidate topologies (SI Fig. 6) was performed with the

Shimodaira-Hasegawa (SH) test (74) implemented in TREE-PUZZLE. This test

determines whether these potential organismal phylogenies are significantly rejected by

the alignment and thus whether LGT or some other anomalous process must be invoked.

Acknowledgements

We thank William Martin and Colomban de Vargas for valuable comments and

stimulating discussions and Lin Jiang for help with the principal coordinates analysis.

This research is supported by NASA grants to P.G.F. Sequence alignments and tree files

are available upon request.

References

1. Blankenship, R. E. & Hartman, H. (1998) Trends in Biochemical Sciences 23, 94-97.

2. Falkowski, P. G., Katz, M. E., Milligan, A. J., Fennel, K., Cramer, B. S., Aubry, M.

P., Berner, R. A., Novacek, M. J. & Zapol, W. M. (2005) Science 309, 2202-2204.

3. Raymond, J. & Segre, D. (2006) Science 311, 1764-1767.

4. Schopf, J. & Packer, B. (1987) Science 237, 70-73.

5. Buick, R. (1992) Science 255, 74-77.

6. Brocks, J. J., Logan, G. A., Buick, R. & Summons, R. E. (1999) Science 285, 1033-

1036.

7. Bekker, A., Holland, H. D., Wang, P. L., Rumble, D., Stein, H. J., Hannah, J. L.,

Coetzee, L. L. & Beukes, N. J. (2004) Nature 427, 117-120.

8. Knoll, A. H., Summons, R. E., Waldbauer, J. R. & Zumberge, J. E. (2007) in

Evolution of primary producers in the sea, eds. Falkowski, P. G. & Knoll, A. H.

(Academic Press, New York), pp. 134-163.

9. Whitton, B. A. & Potts, M. (2000) in The Ecology of Cyanobacteria, eds. Whitton, B.

A. & Potts, M. (Kluwer Academic Publishers, Dordrecht, The Netherlands), pp. 1-11.

10. Waterbury, J. B., Watson, S. W., Guillard, R. R. L. & Brand, L. E. (1979) Nature

277, 293-294.

11. Chisholm, S. W., Olson, R. J., Zettler, E. R., Goericke, R., Waterbury, J. B. &

Welschmeyer, N. A. (1988) Nature 334, 340-343.

12. Capone, D. G., Zehr, J. P., Paerl, H., Bergman, B. & Carpenter, E. J. (1997) Science

276, 1221-1229.


13. Zehr, J. P., Waterbury, J. B., Turner, P. J., Montoya, J. P., Omoregie, E., Steward, G.

F., Hansen, A. & Karl, D. M. (2001) Nature 412, 635-638.

14. Karl, D., Michaels, A., Bergman, B., Capone, D., Carpenter, E., Letelier, R.,

Lipschultz, F., Paerl, H., Sigman, D. & Stal, L. (2002) Biogeochemistry 57/58, 47-98.

15. Koonin, E. V., Aravind, L. & Kondrashov, A. S. (2000) Cell 101, 573-576.

16. Palenik, B., Brahamsha, B., Larimer, F. W., Land, M., Hauser, L., Chain, P.,

Lamerdin, J., Regala, W., Allen, E. E., McCarren, J., Paulsen, I., Dufresne, A.,

Partensky, F., Webb, E. A. & Waterbury, J. (2003) Nature 424, 1037-1042.

17. Rocap, G., Larimer, F. W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N. A.,

Arellano, A., Coleman, M., Hauser, L., Hess, W. R., Johnson, Z. I., Land, M.,

Lindell, D., Post, A. F., Regala, W., Shah, M., Shaw, S. L., Steglich, C., Sullivan, M.

B., Ting, C. S., Tolonen, A., Webb, E. A., Zinser, E. R. & Chisholm, S. W. (2003)

Nature 424, 1042-1047.

18. Hess, W. R. (2004) Current Opinion in Biotechnology 15, 191-198.

19. Dufresne, A., Garczarek, L. & Partensky, F. (2005) Genome Biology 6, R14.

20. Olson, J. M. & Pierson, B. K. (1987) Orig Life 17, 419-430.

21. Blankenship, R. E. (1992) Photosynth Res 33, 91-111.

22. Vermaas, W. F. J. (1994) Photosynth Res 41, 285-294.

23. Xiong, J., Inoue, K. & Bauer, C. E. (1998) Proc Natl Acad Sci USA 95, 14851-14856.

24. Gupta, R. S., Mukhtar, T. & Singh, B. (1999) Mol Microbiol 32, 893-906.

25. Xiong, J., Fischer, W. M., Inoue, K., Nakahara, M. & Bauer, C. E. (2000) Science

289, 1724-1730.


26. Baymann, F., Brugna, M., Muhlenhoff, U. & Nitschke, W. (2001) Biochimica et

Biophysica Acta (BBA) - Bioenergetics 1507, 291-310.

27. Gupta, R. S. (2003) Photosynth Res 76, 173-183.

28. Rutherford, A. W. & Faller, P. (2003) Phil. Trans R. Soc Lond. B. 358, 245-253.

29. Olson, J. M. & Blankenship, R. E. (2004) Photosynth Res 80, 373-386.

30. Sadekar, S., Raymond, J. & Blankenship, R. E. (2006) Mol Biol Evol 23, 2001-2007.

31. Xiong, J. (2006) Genome Biology 7, 245.

32. Raymond, J., Zhaxybayeva, O., Gogarten, J. P., Gerdes, S. Y. & Blankenship, R. E.

(2002) Science 298, 1616-1620.

33. Zhaxybayeva, O., Hamel, L., Raymond, J. & Gogarten, J. (2004) Genome Biology 5,

R20.

34. Mulkidjanian, A. Y., Koonin, E. V., Makarova, K. S., Mekhedov, S. L., Sorokin, A.,

Wolf, Y. I., Dufresne, A., Partensky, F., Burd, H., Kaznadzey, D., Haselkorn, R. &

Galperin, M. Y. (2006) PNAS 103, 13126-13131.

35. Bailey, S., Clokie, M. R. J., Millard, A. & Mann, N. H. (2004) Research in

Microbiology 155, 720-725.

36. Lindell, D., Sullivan, M. B., Johnson, Z. I., Tolonen, A. C., Rohwer, F. & Chisholm,

S. W. (2004) PNAS 101, 11013-11018.

37. Millard, A., Clokie, M. R. J., Shub, D. A. & Mann, N. H. (2004) PNAS 101, 11007-

11012.

38. Sullivan, M. B., Lindell, D., Lee, J. A., Thompson, L. R., Bielawski, J. P. &

Chisholm, S. W. (2006) PLoS Biology 4, e234.

39. Rivera, M. C., Jain, R., Moore, J. E. & Lake, J. A. (1998) PNAS 95, 6239-6244.


40. Jain, R., Rivera, M. C. & Lake, J. A. (1999) PNAS 96, 3801-3806.

41. Daubin, V., Gouy, M. & Perriere, G. (2002) Genome Res. 12, 1080-1090.

42. Lerat, E., Daubin, V. & Moran, N. A. (2003) PLoS Biology 1, 101-109.

43. Eisen, J. A. & Fraser, C. M. (2003) Science 300, 1706-1707.

44. Rujan, T. & Martin, W. (2001) Trends in Genetics 17, 113-120.

45. Martin, W., Rujan, T., Richly, E., Hansen, A., Cornelsen, S., Lins, T., Leister, D.,

Stoebe, B., Hasegawa, M. & Penny, D. (2002) Proc. Natl. Acad. Sci. USA 99, 12246-

12251.

46. Falkowski, P. G. & Knoll, A. H. (2007) Evolution of primary producers in the sea

(Academic Press, New York).

47. Zhaxybayeva, O., Lapierre, P. & Gogarten, J. P. (2004) Trends in Genetics 20, 254-

260.

48. Martin, K., Siefert, J., Yerrapragada, S., Lu, Y., McNeill, T., Moreno, P., Weinstock,

G., Widger, W. & Fox, G. (2003) Photosynthesis Research 75, 211-221.

49. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. (1997) Science 278, 631-637.

50. Tomitani, A., Knoll, A. H., Cavanaugh, C. M. & Ohno, T. (2006) Proc. Natl. Acad.

Sci. USA 103, 5442-5447.

51. Zhaxybayeva, O. & Gogarten, J. P. (2003) BMC Genomics 4, 37.

52. Ochman, H., Lawrence, J. G. & Groisman, E. A. (2000) Nature 405, 299-304.

53. Gogarten, J. P., Doolittle, W. F. & Lawrence, J. G. (2002) Mol Biol Evol 19, 2226-

2238.

54. Boucher, Y., Douady, C. J., Papke, R. T., Walsh, D. A., Boudreau, M. E. R., Nesbo,

C. L., Case, R. J. & Doolittle, W. F. (2003) Annual Review of Genetics 37, 283-328.


55. Shi, T., Bibby, T. S., Jiang, L., Irwin, A. J. & Falkowski, P. G. (2005) Mol Biol Evol

22, 2179-2189.

56. Mushegian, A. R. & Koonin, E. V. (1996) Trends Genet 12, 289-290.

57. Siefert, J. L., Martin, K. A., Abdi, F., Widger, W. R. & Fox, G. E. (1997) J Mol Evol

45, 467-472.

58. Itoh, T., Takemoto, K., Mori, H. & Gojobori, T. (1999) Mol Biol Evol 16, 332-346.

59. Dandekar, T., Snel, B., Huynen, M. & Bork, P. (1998) Trends in Biochemical

Sciences 23, 324-328.

60. Blankenship, R. E. (2001) Trends in Plant Science 6, 4-6.

61. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. &

Lipman, D. J. (1997) Nucl. Acids. Res. 25, 3389-3402.

62. Tatusov, R. L., Natale, D. A., Garkavtsev, I. V., Tatusova, T. A., Shankavaram, U. T.,

Rao, B. S., Kiryutin, B., Galperin, M. Y., Fedorova, N. D. & Koonin, E. V. (2001)

Nucl. Acids Res. 29, 22-28.

63. Thompson, J., Higgins, D. & Gibson, T. (1994) Nucleic Acids Res. 22, 4673-4680.

64. Castresana, J. (2000) Mol. Biol. Evol. 17, 540-552.

65. Guindon, S. & Gascuel, O. (2003) Systematic Biology 52, 696-704.

66. Jones, D. T., Taylor, W. R. & Thornton, J. M. (1992) Computer Applications in the

Biosciences (CABIOS) 8, 275-282.

67. Strimmer, K. & von Haeseler, A. (1996) Mol Biol Evol 13, 964-969.

68. Felsenstein, J. (2002).

69. Galtier, N., Gouy, M. & Gautier, C. (1996) Comput. Applic. Biosci. 12, 543-548.

70. Adachi, J. & Hasegawa, M. (1996) Comput. Sci. Monogr. No. 28, 1-150.


71. Baum, B. R. (1992) Taxon 41, 3-10.

72. Ragan, M. A. (1992) Mol Phylogenet Evol 1, 53-58.

73. Kuhner, M. K. & Felsenstein, J. (1994) Mol Biol Evol 11, 459-468.

74. Shimodaira, H. & Hasegawa, M. (1999) Mol Biol Evol 16, 1114-1116.


Tables

Table 1 General features of the genomes of the 13 cyanobacteria used in this study

Genome                                  Abbr.  Size (Mb)  (G+C) %  Protein coding genes  Database   Reference

Anabaena (Nostoc) sp. PCC7120           ANA    6.41       42.5     5,366                 CyanoBase  Kaneko et al., 01
Anabaena variabilis ATCC29413           AVA    7.10       41.4     5,754                 JGI/DoE
Crocosphaera watsonii WH8501            CWA    6.30       37.1     6,751                 JGI/DoE
Gloeobacter violaceus PCC7421           GVI    4.66       64.0     4,430                 CyanoBase  Nakamura et al., 03
Nostoc punctiforme ATCC29133            NPU    9.50       41.4     7,281                 JGI/DoE    Meeks, 01
Prochlorococcus marinus MED4            PMM    1.66       32.0     1,713                 JGI/DoE    Rocap et al., 03
Prochlorococcus marinus MIT9313         PMT    2.41       50.8     2,267                 JGI/DoE    Rocap et al., 03
Prochlorococcus marinus SS120           PMS    1.75       36.0     1,884                 Roscoff    Dufresne et al., 03
Synechococcus elongatus PCC7942         SCO    2.70       55.0     2,876                 JGI/DoE
Synechococcus sp. WH8102                SYW    2.43       59.5     2,526                 JGI/DoE    Palenik et al., 03
Synechocystis sp. PCC6803               SYN    3.57       47.7     3,264                 CyanoBase  Kaneko et al., 96
(Thermo) Synechococcus elongatus BP-1   TEL    2.59       54.0     2,475                 CyanoBase  Nakamura et al., 02
Trichodesmium erythraeum IMS101         TER    6.50       34.2     5,266                 JGI/DoE

Database website: CyanoBase (http://www.kazusa.or.jp/cyanobase/); JGI/DOE (http://www.jgi.doe.gov/); Roscoff (http://www.sb-roscoff.fr/)


Figure legends

Fig. 1. The distribution of tree topologies among 682 sets of orthologs. Both neighbor joining (NJ, black bars) and maximum likelihood (ML, red bars) tree topologies gave similar distribution patterns. There is no unanimous support for a single topology; rather, most of the orthologs (58% and 67% for NJ and ML trees, respectively) appeared as singletons, each associated with a unique topology.
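The tally behind such a distribution can be sketched as follows, assuming each gene tree has already been reduced to a canonical topology label (for example a sorted Newick string without branch lengths). The function names and the toy mapping are hypothetical and are not taken from the study.

```python
from collections import Counter

def topology_distribution(topology_by_ortholog):
    """Return {number of orthologs sharing a topology: number of such topologies}."""
    per_topology = Counter(topology_by_ortholog.values())
    return Counter(per_topology.values())

def singleton_fraction(topology_by_ortholog):
    """Fraction of orthologs whose topology is recovered by no other ortholog."""
    per_topology = Counter(topology_by_ortholog.values())
    singles = sum(1 for count in per_topology.values() if count == 1)
    return singles / len(topology_by_ortholog)

# Hypothetical toy input: ortholog ID -> canonical topology label
toy_trees = {"ortA": "T1", "ortB": "T1", "ortC": "T2", "ortD": "T3", "ortE": "T1"}
distribution = topology_distribution(toy_trees)
fraction = singleton_fraction(toy_trees)
```

In the toy input, one topology is shared by three orthologs and two topologies are singletons, so two of the five orthologs count as singletons.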

Fig. 2. Representative backbone tree topologies. Phylogenetic trees were constructed

using both SSU rRNA gene and orthologous proteins through phylogenomic approaches

(see Materials and Methods for details). The phylogenetic tree construction methods are

highlighted with colored horizontal bars and text. The conserved monophyletic subgroups

in each topology (T1-T5) are shaded. The top panel shows the proportion of orthologs

giving a particular tree topology (NJ, black bar; ML, red bar). Also shown are the

examples of proteins corresponding to that topology. The bottom panel indicates the number

of datasets accepting (left) or rejecting (right) a particular topology in a Shimodaira-Hasegawa

(SH) (74) test of each of the backbone topologies along with other candidates

randomly obtained (SI Fig. 7). Note that the consensus topology (T3) is rejected by only

a few gene families. Species abbreviations as in Table 1.

Fig. 3. Principal coordinates analysis (PCoA) of trees compared with topological

distance. (A) Plot of the first two axes of the PCoA made from 628 ML trees. The other

54 genes are excluded as a result of axis demarcation. The same experiment with NJ trees

gave very similar results. The ellipse depicts 323 orthologs in the densest region (the


core) of the cloud that share a common phylogenetic signal, while trees present in the

marginal area (the shell) are much more likely to be perturbed by lateral transfers.

Photosynthetic genes are color coded based on their respective pathways. Also shown are

examples of conserved small clusters of ribosomal (red text) and photosynthetic (green

text) genes that are located in the core. (B) The PCoA plotted against the protein

variability. Photosynthetic genes are collectively shown as green dots. Note that both the

ribosomal and the key photosynthetic genes show extremely low protein variability.

Fig. 4. Frequency distribution of the fraction of genes belonging to designated categories

within each 1.0 interval of protein variability. Protein variability was measured by taking

the total length of the corresponding tree obtained in the phylogenetic analysis as

measured by amino acid substitutions, divided by the number of sequences in the tree.
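The variability measure defined in this legend amounts to a single division, sketched below together with the assignment of a value to a 1.0-wide interval. The function names and the example branch lengths are assumptions for illustration only.

```python
def protein_variability(branch_lengths, n_sequences):
    """Total tree length (substitutions per site) divided by the number of sequences."""
    if n_sequences <= 0:
        raise ValueError("n_sequences must be positive")
    return sum(branch_lengths) / n_sequences

def variability_bin(value, width=1.0):
    """Index of the interval [k*width, (k+1)*width) that contains the value."""
    return int(value // width)

# Hypothetical example: an unrooted binary tree of 13 sequences has 2*13 - 3 = 23 branches
example_lengths = [2.0] * 23
example_value = protein_variability(example_lengths, 13)
example_bin = variability_bin(example_value)
```

With these made-up branch lengths the tree length is 46.0, so the variability is 46/13 and falls in the interval from 3.0 to 4.0.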

Fig. 5. The ML tree reconstructed based on the concatenation of the core 323 proteins.

The topology shown agrees with the consensus topology of the 682 orthologs (T3 of Fig.

2) and is supported by almost all individual datasets (Fig. 2 and SI Fig. 7). The divergence

of diazotrophic cyanobacteria is designated by the dashed line.


Figures

[Fig. 1, Shi & Falkowski. Bar chart. x-axis: number of orthologs (1 to 15); y-axis: number of topologies; series: NJ and ML.]

[Fig. 2, Shi & Falkowski. Five backbone tree topologies (T1 to T5) for the 13 genomes, grouped by data source (SSU rRNA; concatenated proteins; consensus and supertree) and labeled with the tree construction methods (16S_NJ/ML/MP; NJ_γ; NJD; NJJ; NJK; QP; ProtML; ML_consensus; ML_supertree; NJ_consensus; NJ_supertree). Tip orders of the five trees as extracted:

ANA AVA NPU TER CWA SYN PMM PMS PMT SYW SCO TEL GVI
ANA AVA NPU TER CWA SYN TEL PMM PMS PMT SYW SCO GVI
ANA AVA NPU TER CWA SYN PMM PMS PMT SYW SCO TEL GVI
ANA AVA NPU TER CWA SYN TEL SCO SYW PMT PMS PMM GVI
ANA AVA NPU TER CWA SYN TEL SYW PMT PMS PMM SCO GVI

Top panel: % orthologs (0.0 to 2.5) per topology; series: NJ and ML. Example proteins shown: ATP synthase alpha subunit AtpA, 50S ribosomal protein L21 Rpl21, RNA polymerase alpha subunit RpoA; protoporphyrin IX magnesium chelatase subunit ChlH, cell division protein FtsW; geranylgeranyl hydrogenase ChlP, 30S ribosomal protein S4 Rps4, prokaryotic DNA topoisomerase TopA; protoporphyrin IX magnesium-chelatase subunit ChlI, glyceraldehyde 3-phosphate dehydrogenase Gap2.]

[Fig. 3, Shi & Falkowski. Panel A: scatter plot of Dimension 1 against Dimension 2 (both -1.0 to 1.0); categories: Operational, Informational, Ribosomal, ATPase, Calvin cycle, Heme & Chlorophyll, PSI, PSII, Cyt. b6f, NADH dehydrogenase, Ycf. Conserved ribosomal clusters labeled: rpl20/35; rpl27/21; rpl3/4/23; rpl2/rps19; rpl33/rps18; rpl12/10; rpl22/rps3/rpl16; rps17/rplN/24/5/rps8; rpl6/18/rpsE/rpl15; rps13/11/rpoA/rpl17; rps12/7/fusA/tufA/rps10. Conserved photosynthetic clusters labeled: psbEF; petBD; psaAB; atpIH; atpAC; atpBE. Panel B: Dimension 1 and Dimension 2 plotted against protein variability (0 to 12, low to high); categories: Operational, Informational, Ribosomal, Photosynthetic.]

[Fig. 4, Shi & Falkowski. Three stacked panels. x-axis: protein variability (0 to 12, low to high); y-axis: proportion. Panel series: clustered vs. unclustered; core vs. shell; operational, informational, ribosomal and photosynthetic.]

[Fig. 5, Shi & Falkowski. ML tree with tips Anabaena sp. PCC7120, Anabaena variabilis, Nostoc punctiforme, Trichodesmium erythraeum, Crocosphaera watsonii, Synechocystis sp. PCC6803, Prochlorococcus marinus MED4, Prochlorococcus marinus SS120, Prochlorococcus marinus MIT9313, Synechococcus sp. WH8102, Synechococcus elongatus, Thermosynechococcus elongatus and Gloeobacter violaceus. All support values shown are 100; scale bar 0.1; clades labeled Diazotrophic and Non-diazotrophic.]


Online Supporting Information

Table 2. Ortholog table including 682 conserved protein families from 13 genomes of

cyanobacteria. Column headings indicate ortholog ID, gene name, encoded protein, and

cyanobacterial species used in this study. Species abbreviation: ANA, Anabaena sp. PCC 7120;

AVA, Anabaena variabilis ATCC29413; CWA, Crocosphaera watsonii WH8501; GVI,

Gloeobacter violaceus PCC 7421; NPU, Nostoc punctiforme; PMM, Prochlorococcus marinus

MED4; PMS, Prochlorococcus marinus SS120; PMT, Prochlorococcus marinus MIT9313;

SCO, Synechococcus elongatus PCC7942; SYW, Synechococcus sp. WH8102; SYN,

Synechocystis sp. PCC 6803; TEL, Thermosynechococcus elongatus BP-1; TER, Trichodesmium

erythraeum IMS101. For species whose genomes were in draft form when this work was

undertaken, preliminary protein coding sequences were based on contig assemblies as of their

respective release dates: NPU, April 9, 2001; SCO, September 22, 2003; TER, December 16,

2003; AVA, January 19, 2005; CWA, January 30, 2004. The letters and numbers in each column

refer to the gene designators used by the individual genome projects and identify the likely

ortholog detected by the analysis reported here. In some cases, e.g., Nostoc, the name consists of

a contig followed by the gene number on that contig. In some cases the designator, e.g., sll0698,

indicates a relative direction of transcription too. Genes are tabulated for each functional

category in the physical order in which they occur in Synechococcus sp. WH8102. This allows

the identification of genome-wide conserved gene clusters (putative operons) containing two or

more signature genes by simply scanning the table. Signature genes included in these putative

operons are tabulated in italic. Informational (purple), ribosomal (pink) and photosynthetic genes

(green) are color shaded. Protein variability derived from the phylogenetic tree for each gene was


indicated. The binary number designates whether protein families were found in the core (1, also

in bold) or in the shell (0) of the genome as in Fig. 3.


ID Prot. Var. Core/shell ANA GVI PMM PMT NPU SCO PMS SYN TEL TER SYW AVA CWA Gene Product

ort343 4.8597 1 alr2010 glr3232 0001 0001 424.5 136.2572 0001 slr0965 tll2349 97.8656 0001 232.3907 358.4593 dnaN DNA polymerase III beta subunit
ort539 3.5725 1 all3652 glr2113 0003 0003 430.35 136.2574 0003 sll1056 tlr1683 122.6536 0003 232.2534 358.4475 purL phosphoribosylformylglycinamidine synthetase II
ort537 7.7917 0 all3651 glr2114 0004 0004 430.36 136.2575 0004 sll0757 tlr1684 119.4070 0004 232.2535 358.4479 purF Glutamine amidotransferase class-II:Phosphoribosyl transferase
ort186 8.7381 0 all0443 gll1546 0006 0006 483.77 136.2839 0006 sll0837 tll2091 118.3826 0006 232.1788 361.5946 conserved hypothetical protein
ort190 5.9564 0 alr4943 gll1464 0007 0007 485.23 136.2838 0007 sll1348 tlr0587 120.5163 0007 232.1191 362.6447 conserved hypothetical protein
ort210 8.0168 0 all1760 gll1336 0009 0010 506.137 129.527 0009 sll0271 tll0787 118.3710 0010 232.4168 359.4742 NusB family protein
ort364 3.5384 1 all1759 glr0251 0010 0011 506.138 129.528 0010 slr2102 tlr0928 118.3711 0011 232.4169 235.1403 ftsY signal recognition particle docking protein FtsY
ort087 5.0449 0 all1758 glr1672 0011 0012 506.142 129.529 0011 slr2031 tlr2236 118.3712 0012 232.4170 241.1494 conserved hypothetical protein
ort017 3.9641 1 alr3887 gll1051 0012 0013 432.30 129.531 0012 slr1133 tll0366 118.3713 0013 232.786 241.1495 argH Fumarate lyase:Delta crystallin
ort392 4.3684 0 alr2445 gll4194 0016 0021 506.170 135.1882 0016 sll0057 tlr1314 111.1619 0023 232.4265 278.2040 grpE putative heat shock protein GrpE
ort341 3.4126 1 alr2447 glr4267 0017 0022 506.172 135.1884 0017 sll0897 tll0790 111.1622 0024 232.4267 231.1351 dnaJ DnaJ protein
ort145 6.9095 0 alr2449 glr4268 0019 0024 506.174 135.1886 0019 sll0898 tll0987 111.1625 0026 232.4269 361.5601 conserved hypothetical protein
ort149 5.2881 1 alr5067 glr3498 0020 0025 418.36 136.2277 0020 slr1847 tlr0723 120.4678 0027 232.1283 295.2370 conserved hypothetical protein
ort452 7.0501 0 alr5066 glr2317 0021 0026 418.34 135.2117 0021 slr1424 tll0370 120.4677 0028 232.1282 361.5931 murB UDP-N-acetylenolpyruvoylglucosamine reductase
ort453 8.0446 0 alr5065 glr2316 0022 0027 418.33 135.2116 0022 slr1423 tll0371 120.4676 0029 232.1281 361.5930 murC Probable UDP-N-acetylmuramate-alanine ligase
ort369 2.7995 1 all5062 glr0530 0023 0028 418.18 135.1726 0023 sll1342 tll1466 120.4961 0030 232.1279 201.980 gap2 glyceraldehyde 3-phosphate dehydrogenase (NADP+)
ort648 6.5608 0 alr5061 glr3298 0024 0029 418.32 129.567 0024 slr1787 tll2366 120.4675 0031 232.1278 211.1112 thiL putative thiamine monophosphate kinase
ort347 2.2050 1 all5058 glr1285 0026 0031 418.21 129.576 0026 slr0434 tlr1294 112.1685 0033 232.1276 359.4728 efp Translation elongation factor EF-P
ort002 4.0014 1 all5057 glr1286 0027 0032 418.22 129.577 0027 slr0435 tlr1295 112.1688 0034 232.1275 359.4729 accB biotin carboxyl carrier protein (BCCP) subunit of acetyl-CoA carboxylase
ort484 5.6054 0 alr4755 glr0970 0028 0033 505.73 132.998 0028 sll0660 tll2045 95.8488 0035 232.893 332.3322 pdxA putative pyridoxal phosphate biosynthetic protein PdxA
ort046 6.6183 0 all2847 gll1629 0036 0044 472.81 136.2428 0037 slr1538 tlr0767 114.2291 0046 232.4920 362.6013 cbiD cobalamin biosynthesis protein D
ort394 2.0643 1 all2846 glr2615 0037 0045 472.82 136.2429 0038 slr0213 tll2418 114.2292 0047 232.4921 361.5637 guaA GMP synthase (glutamine-hydrolysing)
ort647 2.9453 1 all3519 gll1807 1664 0052 505.205 136.2864 1824 slr0633 tll0065 122.6785 0054 232.2129 362.6674 thiG thiamin biosynthesis protein
ort156 7.2478 0 alr3430 gll0286 1663 0054 498.153 128.510 1823 slr0171 tll2406 119.4203 0056 232.2353 329.3240 ycf37 conserved hypothetical protein
ort585 2.5480 1 alr3428 gll2784 1662 0055 498.151 128.507 1822 sll0767 tll2158 119.4201 0057 232.2351 361.5828 rpl20 50S ribosomal protein L20
ort596 4.1919 1 asr3427 gsl2785 1661 0056 498.150 128.506 1821 ssl1426 tsl2159 119.4200 0058 232.2350 361.5827 rpl35 50S ribosomal protein L35
ort267 8.0685 0 all4933 glr2559 1659 0058 505.147 134.1557 1819 sll1377 tlr1854 121.5205 0060 232.1179 358.4612 putative glycosyltransferase
ort344 5.2289 0 alr4932 glr1609 1658 0059 505.44 126.225 1818 sll1360 tlr1132 116.2672 0061 232.1178 321.3017 dnaX DNA polymerase, gamma and tau subunits
ort308 2.1164 1 alr3684 glr4196 1657 0061 448.70 129.548 1817 sll0535 tlr0510 118.3957 0063 232.2502 319.2952 clpX Putative ATP-dependent protease ATP-binding subunit ClpX
ort224 7.6150 0 alr3681 glr4195 1655 0063 448.68 129.546 1815 sll0533 tlr0508 118.3955 0065 232.2504 122.331 Possible FKBP-type peptidyl-prolyl cis-trans isomerase
ort026 4.3987 1 all3680 glr1017 1654 0064 448.8 135.2061 1814 slr0549 tlr0069 118.3578 0066 232.2505 336.3450 asd aspartate-semialdehyde dehydrogenase
ort331 5.6049 1 all3679 glr1018 1653 0065 448.9 135.2062 1813 slr0550 tlr1219 118.3579 0067 232.2506 336.3451 dapA dihydrodipicolinate synthase
ort170 3.5064 1 all3678 glr1019 1652 0066 448.10 135.2063 1812 slr0551 tlr1220 118.3580 0068 232.2507 336.3452 conserved hypothetical protein
ort437 5.2913 1 alr3644 gll1774 1648 0073 482.108 134.1469 1808 slr0657 tlr1833 96.8564 0070 232.2540 284.2172 lysC aspartate kinase
ort174 8.5078 0 alr3444 gll0066 1647 0074 505.39 136.2783 1807 sll0544 tlr1573 107.800 0073 232.2371 317.2914 conserved hypothetical protein
ort232 10.3689 0 all0932 gll0401 1651 0068 501.71 134.1586 1811 slr1278 tlr1616 121.5631 0075 232.4394 267.1864 possible MesJ homolog
ort177 4.3443 0 all3013 glr4006 1650 0069 439.16 134.1395 1810 slr2032 tlr1249 122.7130 0076 232.4766 361.5613 conserved hypothetical protein
ort673 2.7635 1 all1132 gll1855 1649 0071 418.3 136.2222 1809 sll0459 tlr2033 122.7522 0077 232.3538 361.5490 uvrB excinuclease ABC subunit B
ort460 4.4651 0 alr3638 gll1033 1645 0079 504.24 130.690 1805 sll1165 tlr2240 119.4444 0078 232.2097 341.3608 mutS putative DNA mismatch repair protein
ort565 2.7893 1 alr3993 glr1042 1643 0081 509.108 130.693 1803 sll1282 tlr0067 103.289 0082 232.686 346.3807 ribH Putative 6,7-dimethyl-8-ribityllumazine synthase or riboflavin synthase


ort630 2.7558 1 alr4851 gll1836 1639 0083 427.41 136.2714 1801 sll0616 tll1851 122.6969 0086 232.1095 270.1905 secA preprotein translocase SecA subunit
ort327 3.4146 1 all4037 glr2514 1638 0117 479.50 125.197 1800 slr1348 tlr0851 99.8760 0090 232.652 354.4128 cysE serine acetyltransferase
ort189 5.5755 1 all1076 glr4252 1637 0118 448.6 136.2611 1799 sll1961 tll2117 105.623 0091 232.2627 286.2198 conserved hypothetical protein
ort420 2.6576 1 all4623 gll1756 1636 0119 483.54 136.2498 1798 slr0974 tlr1620 114.2168 0093 232.1009 362.6249 infC translation initiation factor IF-3
ort445 6.7596 0 alr5266 gll3248 1635 0120 483.83 129.624 1797 sll0817 tlr0648 120.4905 0094 232.1471 361.5471 miaA tRNA delta-2-isopentenylpyrophosphate (IPP) transferase
ort397 2.1798 1 all5265 gll2506 1634 0121 483.40 129.537 1796 sll2005 tll0647 120.4740 0095 232.1470 362.6726 gyrB DNA gyrase subunit B
ort263 8.6248 0 alr0517 gll0502 1627 0136 454.52 136.2624 1788 slr1448 tll1428 121.6195 0117 232.1845 52.32 Putative fructokinase
ort009 7.0173 0 alr1414 glr3183 1625 0138 476.44 127.357 1786 sll1234 tll2390 122.7559 0119 232.2846 360.5181 ahcY putative adenosylhomocysteinase
ort641 2.6652 1 alr0088 glr2942 1623 0141 444.42 136.2723 1784 slr0925 tlr1490 101.106 0121 232.5313 351.3997 ssb single-stranded DNA-binding protein
ort438 4.3220 0 all4071 gll3356 1618 0147 507.275 125.172 1779 slr1550 tll0212 115.2623 0127 232.2062 362.6074 lysS lysyl-tRNA synthetase
ort638 3.5393 1 alr5070 gll1494 1616 0155 418.38 135.2092 1776 slr1639 tll1927 115.2649 0134 232.1286 361.5795 smpB tmRNA binding protein SmpB
ort627 4.3404 0 all2894 gll3119 1615 0156 431.9 132.1024 1775 sll0613 tll2253 112.1715 0135 232.4875 289.2256 ruvB Holliday junction DNA helicase RuvB
ort296 4.0367 1 all2102 glr2540 1613 0158 389.22 135.1997 1773 sll0100 tll0013 119.4412 0137 232.4927 361.5413 Zinc metallopeptidase M20/M25/M40 family
ort645 1.2772 1 all0982 glr1118 1611 0160 504.76 134.1422 1771 slr0118 tlr0334 105.559 0140 232.4245 361.5939 thiC thiamin biosynthesis protein
ort652 3.4278 1 alr3344 glr2297 1610 1957 373.12 136.2842 1770 sll1070 tll1870 122.6452 0141 232.2543 361.5431 cbbT, tktA transketolase
ort349 3.4950 1 alr3343 gll4014 1609 1956 373.11 136.2841 1769 sll1069 tll1871 122.6451 0142 232.2544 362.6015 fabF 3-oxoacyl-[acyl-carrier-protein] synthase II
ort005 3.9182 1 asr3342 glr2311 1608 1955 373.9 136.2840 1768 ssl2084 tsl1872 122.6450 0143 232.2545 362.6014 acpP acyl carrier protein (ACP)
ort518 0.3821 1 asr3463 gsl3287 1607 1954 507.20 136.2232 1767 ssl0563 tsl1013 122.7684 0144 232.2386 361.5974 psaC photosystem I iron-sulfur center subunit VII (PsaC)
ort380 4.1998 1 alr3464 gll2215 1606 1953 507.21 136.2233 1766 sll0220 tll1237 122.7687 0145 232.2387 362.6625 glmS Glutamine--fructose-6-phosphate transaminase (isomerizing)
ort566 6.0249 0 alr2812 gll1505 1605 1951 443.69 131.788 1764 slr0808 tlr1065 122.7168 0149 232.4959 301.2515 rimM possible 16S rRNA processing protein RimM
ort091 11.8245 0 alr2709 gll2564 1598 1942 493.69 132.1078 1756 slr1438 tll1673 120.4906 0160 232.3146 342.3654 conserved hypothetical protein
ort093 7.1695 0 alr3454 gll0917 1597 1938 504.121 129.614 1753 slr1342 tll2430 111.1637 0162 232.2378 361.5829 conserved hypothetical protein
ort411 3.8460 1 all1897 gll0410 1594 1685 400.25 135.1786 1750 sll1184 tll0365 116.2796 0171 232.2660 359.4859 ho1 Heme oxygenase
ort294 3.9272 1 all4495 glr2394 0304 1917 463.56 130.671 0335 sll1143 tlr2459 104.450 0181 232.2265 207.1070 UvrD/REP helicase
ort234 10.7885 0 all3989 gll1453 0303 1910 509.252 134.1487 0334 sll1253 tll0519 103.378 0188 232.690 358.4650 possible polyA polymerase
ort526 3.3219 1 asr3848 gsr0859 0300 1899 493.54 134.1370 0331 smr0008 tsr1544 106.653 0201 232.5532 86.164 psbJ putative photosystem II reaction center J protein
ort528 1.9807 0 asr3847 gsr0858 0299 1898 493.53 134.1369 0330 smr0007 tsr1543 106.652 0202 232.829 86.163 psbL photosystem II reaction center L protein (psII 5 Kd protein)
ort524 2.1270 1 asr3846 gsr0857 0298 1897 493.52 134.1368 0329 smr0006 tsr1542 106.651 0203 232.830 86.162 psbF Cytochrome b559 beta chain
ort523 3.1238 1 asr3845 gsr0856 0297 1896 493.51 134.1367 0328 ssr3451 tsr1541 106.650 0204 232.831 86.161 psbE cytochrome b559 alpha chain
ort288 5.6687 0 alr3844 glr0855 0296 1895 493.50 134.1366 0327 slr2034 tll1695 106.649 0205 232.832 86.160 ycf48 similar to plant photosystem II stability/assembly factor
ort626 4.4346 1 alr3843 glr0854 0295 1894 493.49 134.1365 0326 slr2033 tll1696 106.648 0206 232.833 86.159 rub probable rubredoxin
ort468 1.9352 1 all3842 glr0748 0294 1893 493.109 134.1603 0325 slr1279 tlr1429 106.714 0207 232.834 174.729 ndhC NADH dehydrogenase I chain 3 (or A)
ort474 4.2351 1 all3840 glr0750 0292 1891 493.111 134.1605 0323 slr1281 tlr1430 106.716 0209 232.836 174.731 ndhJ NADH dehydrogenase I chain J
ort124 4.5468 1 alr2298 glr4164 0291 1890 481.60 136.2771 0322 sll0154 tlr1951 109.1102 0210 232.4018 296.2411 conserved hypothetical protein
ort218 6.1874 1 alr0180 glr4114 0290 1889 495.86 136.2751 0321 sll1001 tll0328 109.1205 0211 232.1612 343.3660 possible ABC transporter, ATP-binding component
ort216 10.6923 0 alr0181 glr4115 0289 1888 495.87 136.2752 0320 sll1002 tll2407 109.1206 0212 232.1613 343.3661 possible ABC transporter
ort298 3.0127 1 all0278 glr0870 0288 1887 480.38 131.792 0319 slr1777 tll1413 102.249 0213 232.2001 238.1440 chlD Protoporphyrin IX Magnesium chelatase subunit ChlD
ort358 8.3583 0 asr2953 glr2647 0287 1886 482.47 131.923 0318 slr1093 tll0178 118.3672 0216 232.4816 274.1977 folK possible 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine…
ort335 6.2350 1 alr3012 glr1861 0284 1884 439.69 136.2493 0316 slr1666 tlr0710 93.8356 0220 232.4767 216.1168 degT putative pleiotropic regulatory protein
ort135 5.5160 0 all0513 glr2108 0283 1883 397.10 135.1719 0315 sll1289 tlr1788 122.7432 0221 232.1841 361.5903 conserved hypothetical protein
ort351 2.9287 1 all4391 glr4188 0282 1881 493.129 136.2459 0314 slr1051 tll1693 87.8161 0225 232.2190 186.837 fabI enoyl-[acyl-carrier-protein] reductase
ort406 2.6944 1 all4390 gll0769 0281 1880 493.131 136.2460 0313 slr0500 tll0133 87.8165 0226 232.2191 282.2133 hisB imidazoleglycerol-phosphate dehydratase

Online Supporting Information


Table 2, Page 5

ort139 7.0841 0 all5088 glr0532 0279 1872 475.25 128.515 0311 slr0402 tll1797 122.7783 0234 232.1328 349.3910 conserved hypothetical protein
ort211 6.2244 0 all5089 glr0388 0278 1871 475.24 128.513 0310 slr1334 tlr1520 119.4338 0235 232.1327 322.3044 Phosphoglucomutase/phosphomannomutase family protein
ort547 6.8194 0 alr5099 gll4276 0275 1868 475.89 136.2543 0307 slr0185 tlr1828 105.603 0238 232.1318 357.4370 pyrE Orotate phosphoribosyltransferase
ort247 3.9718 1 all5094 glr1860 0273 1865 475.17 136.2563 0305 slr0338 tll1731 110.1335 0240 232.1323 289.2251 probable oxidoreductase
ort527 2.5918 0 asl0885 gsr2807 0272 1863 509.247 136.2807 0304 sml0005 tsl0176 120.4838 0243 232.3352 361.5328 psbK photosystem II 4 Kda protein psbK precursor
ort644 3.8372 1 alr1885 glr0802 0271 1862 509.246 136.2806 0303 slr0713 tll0080 105.566 0244 232.2648 362.6159 tgt tRNA-guanine transglycosylase
ort320 8.6386 0 alr0379 gll3727 0270 1861 509.116 136.2284 0302 slr0636 tll1337 105.600 0245 232.1757 256.1697 cobS Cobalamin-5-phosphate synthase CobS
ort142 2.6218 0 asl3981 gll0064 0268 1859 507.159 136.2285 0300 sml0011 tsr1779 120.4861 0247 232.701 321.2998 conserved hypothetical protein
ort243 10.1699 0 all3113 gll3761 0267 1858 489.110 136.2804 0299 sll1284 tlr2038 122.6578 0248 232.2703 304.2597 probable esterase
ort439 3.4945 1 all0985 glr3299 0264 1854 490.102 136.2491 0296 slr0348 tlr1041 121.5850 0252 232.1875 360.4968 lytB LytB protein homolog
ort290 5.6424 0 alr3085 gll1734 0262 1852 397.16 135.1953 0294 sll2014 tlr1697 122.6657 0254 232.3711 359.4836 sugar fermentation stimulation protein
ort143 7.2182 0 all0185 gll1471 0261 1851 492.138 135.1872 0293 slr0488 tll2350 122.7476 0255 232.1616 345.3723 conserved hypothetical protein
ort388 2.8950 1 alr4806 glr4369 0258 1847 356.2 136.2375 0290 sll1931 tlr2127 122.6602 0259 232.1047 360.5077 glyA serine hydroxymethyltransferase (SHMT)
ort063 5.8566 0 alr4808 glr2441 0257 1845 356.3 136.2377 0289 slr0427 tlr2129 122.6605 0261 232.1049 362.6743 CinA-like protein
ort427 9.6463 0 all1416 glr3418 0255 1843 459.4 129.560 0287 sll1444 tlr1234 108.1038 0263 232.2848 216.1169 leuD 3-isopropylmalate dehydratase small subunit
ort144 10.3707 0 alr1278 gll3632 0254 1841 492.17 135.1701 0286 sll0350 tlr1075 121.5706 0265 232.1923 362.6423 conserved hypothetical protein
ort529 4.9395 1 asr0847 gsl3001 0252 1838 501.138 136.2405 0284 smr0009 tsr1387 104.494 0268 232.3317 319.2958 psbN Photosystem II reaction centre N protein (psbN)
ort525 3.1472 1 asl0846 gsr3002 0251 1837 501.108 136.2679 0283 ssl2598 tsl1386 104.396 0269 232.3316 305.2618 psbH Photosystem II 10 kDa phosphoprotein (PsbH)
ort531 6.4828 0 all0844 gll0538 0250 1835 501.110 136.2681 0281 slr0922 tll1384 104.398 0271 232.3314 214.1136 pth putative peptidyl-tRNA hydrolase (PTH)
ort146 5.8805 0 asl4395 gsr0107 0249 1834 493.128 136.2682 0280 sll1340 tsl2214 104.402 0272 232.2186 332.3321 conserved hypothetical protein
ort147 8.2031 0 alr4394 gll2308 0248 1833 493.20 136.2629 0279 sll0832 tlr0651 87.8177 0273 232.2187 186.836 conserved hypothetical protein
ort148 5.1328 0 alr4393 gll2309 0247 1832 493.19 136.2628 0278 sll1424 tll1784 87.8176 0274 232.2188 186.835 conserved hypothetical protein
ort475 2.6719 1 alr4392 gll3285 0246 1831 493.18 136.2627 0277 sll1423 tll1650 87.8174 0275 232.2189 186.834 ntcA Global nitrogen regulatory protein, CRP family of transcriptional regulators
ort666 8.6804 0 alr0570 glr1718 0242 1826 509.177 130.771 0272 sll1980 tlr0355 122.6363 0280 232.4228 361.5842 txlA thioredoxin-like protein TxlA
ort414 4.1862 1 alr1073 gll3651 0238 1817 406.27 125.157 0268 sll1362 tlr2330 86.8139 0289 232.2666 360.4887 ileS isoleucyl-tRNA synthetase
ort372 6.4063 1 asl3192 gsl3282 0235 1815 387.11 131.821 0265 slr0033 tsl1524 122.7025 0292 232.2776 253.1647 gatC glutamyl-tRNA (Gln) amidotransferase subunit C
ort545 4.2890 0 all1681 gll2875 0233 1807 378.7 127.391 0262 slr1476 tll1558 117.3410 0296 232.5037 362.6596 pyrB aspartate carbamoyltransferase
ort336 7.3614 0 all3111 glr1548 0229 1802 489.112 136.2717 0258 sll0250 tll2467 122.6581 0301 232.2701 192.895 dfp putative p-pantothenate cysteine ligase…
ort530 6.1353 0 all3854 glr3691 0228 1800 502.159 136.2719 0257 sll0427 tll0444 120.4767 0303 232.820 359.4858 psbO photosystem II manganese-stabilizing polypeptide
ort326 13.9143 0 all5138 glr1084 0227 1799 458.5 136.2720 0256 slr1165 tll1044 122.7556 0304 232.1341 259.1746 cysD ATP-sulfurylase
ort022 5.1566 0 all0797 glr3393 0224 1795 466.46 136.2674 0253 sll1747 tll0463 104.495 0308 232.4469 280.2085 aroC Chorismate synthase
ort150 6.6164 0 all2705 glr4150 0075 1608 388.13 126.256 0088 sll0185 tll2067 122.7320 0313 232.3142 275.2006 conserved hypothetical protein
ort054 1.9188 1 alr2492 glr1370 0073 1606 502.169 135.1721 0086 slr0074 tll0490 117.3395 0319 232.4311 264.1818 ABC transporter, membrane component
ort053 3.4352 1 alr2493 glr1371 0072 1605 502.171 135.1722 0085 slr0075 tlr1904 117.3396 0320 232.4312 264.1819 ABC transporter, ATP-binding component
ort055 9.8770 0 alr2494 glr1372 0071 1604 502.172 135.1723 0084 slr0076 tlr1905 117.3397 0321 232.4313 362.6578 ABC transporter, membrane component
ort201 4.1417 1 all4087 gll0929 0067 1596 494.47 133.1220 0080 slr1234 tlr0997 107.845 0328 232.4692 361.5735 HIT (Histidine triad) family protein
ort295 4.9136 0 asl0940 gsr2010 0061 1589 501.60 135.1861 0075 ssl0353 tlr1577 121.5436 0335 232.4400 333.3352 YGGT family, conserved hypothetical integral membrane protein
ort003 2.7860 1 alr0939 gll2012 0060 1588 501.176 133.1255 0074 sll0053 tlr1808 121.5934 0336 232.4399 328.3194 accC acetyl-CoA carboxylase, biotin carboxylase subunit
ort151 6.7278 0 alr4005 glr2712 0059 1587 504.190 133.1225 0073 slr1677 tll1550 121.5738 0337 232.682 362.6426 conserved hypothetical protein
ort253 8.3556 0 alr1128 glr3322 0057 1584 479.56 135.1948 0071 sll1120 tlr1925 122.6323 0341 232.3545 282.2136 putative chromosome segregation protein, SMC ATPase superfamily
ort533 3.5702 1 alr3395 glr0043 1465 1484 397.35 136.2360 1619 sll0421 tll1829 120.4569 0405 232.2313 355.4168 purB Fumarate lyase:Adenylosuccinate lyase
ort383 1.9370 1 all2319 gll0256 1463 1481 475.84 136.2358 1616 ssl0707 tll0591 121.6103 0462 232.4038 361.5924 glnB nitrogen regulatory protein P-II
ort152 5.6983 0 all1162 gll3593 1461 1478 501.196 135.2064 1614 slr0711 tll0218 121.5909 0463 232.3283 321.3011 conserved hypothetical protein


Table 2, Page 6

ort679 7.5593 0 alr3123 glr2021 1460 1477 469.13 136.2357 1613 slr2087 tll0683 118.3934 0471 232.2714 355.4178 ycf44 putative c-type cytochrome biogenesis protein Ccs1
ort048 6.8278 0 alr3122 glr1719 1459 1476 469.12 136.2356 1612 sll0621 tlr0052 118.3933 0472 232.2713 355.4177 ccdA putative c-type cytochrome biogenesis protein CcdA
ort363 7.0142 0 all0154 gll3581 1458 1475 502.44 136.2355 1611 slr1267 tlr0440 122.6816 0475 232.5371 360.5218 ftsW Cell division protein FtsW
ort200 5.2920 0 all0906 gll3181 1464 1482 372.18 136.2732 1617 slr0950 tll1457 122.6696 0482 232.3375 114.290 hemolysin-like protein
ort029 8.5178 0 all0011 gll2911 1457 1473 502.192 136.2735 1610 sll1321 tlr0429 122.6792 0488 232.1561 362.6283 atp1 possible ATP synthase protein 1
ort038 2.8367 1 all0010 gll2910 1456 1472 502.193 136.2736 1609 sll1322 tlr0430 122.6793 0489 232.1560 362.6284 atpI ATP synthase subunit A
ort037 1.1663 1 asl0009 gsl2909 1455 1471 502.194 136.2737 1608 ssl2615 tlr0431 122.6794 0490 232.1559 362.6285 atpH ATP synthase subunit c
ort036 6.3709 0 all0008 gll2908 1454 1470 502.195 136.2738 1607 sll1323 tlr0432 122.6795 0491 232.1558 362.6286 atpG putative ATP synthase subunit B'
ort035 8.4030 0 all0007 gll2907 1453 1469 502.196 136.2739 1606 sll1324 tlr0433 122.6796 0492 232.1557 362.6287 atpF putative ATP synthase B chain
ort033 7.3539 0 all0006 gll2906 1452 1468 502.197 136.2740 1605 sll1325 tlr0434 122.6797 0493 232.1556 362.6288 atpD putative ATP synthase delta chain
ort030 2.2227 1 all0005 gll2905 1451 1467 502.198 136.2741 1604 sll1326 tlr0435 122.6798 0494 232.1555 362.6289 atpA ATP synthase subunit alpha
ort032 3.6133 1 all0004 glr4315 1450 1466 502.199 136.2742 1603 sll1327 tll0385 122.6801 0495 232.1554 362.6290 atpC ATP synthase subunit gamma
ort153 4.6848 1 alr0786 gll0557 1447 1463 478.98 136.2744 1601 slr1287 tlr0672 100.51 0499 232.4447 211.1101 conserved hypothetical protein
ort464 6.1496 0 alr2485 glr2587 1446 1461 407.12 136.2470 1599 slr1691 tlr1180 119.4282 0502 232.4304 362.6471 nadE putative glutamine dependent NAD(+) synthetase
ort034 4.6828 1 all5038 gll2568 1439 1452 431.32 131.816 1592 slr1330 tlr0526 113.2071 0511 232.1258 280.2097 atpE putative ATP synthase epsilon subunit
ort031 1.4946 1 all5039 gll2570 1438 1451 431.30 131.815 1591 slr1329 tlr0525 113.2070 0512 232.1259 280.2096 atpB ATP synthase beta subunit
ort154 3.7880 1 asl4181 glr0831 1435 1447 366.9 136.2274 1588 ssr3307 tsl1100 121.5186 0518 232.4601 277.2029 conserved hypothetical protein
ort495 5.2701 1 all4182 gll0756 1434 1446 366.8 136.2273 1587 slr1945 tlr0151 121.5185 0519 232.4600 277.2028 pgmI Phosphoglycerate mutase, co-factor-independent (iPGM)
ort550 5.3987 1 alr2083 glr0827 1433 1445 388.8 128.450 1586 sll0368 tll2320 119.4313 0520 232.2042 362.6742 pyrR possible pyrimidine operon regulatory protein PyrR
ort261 7.8110 0 alr1264 gll1676 1432 1444 474.52 136.2620 1585 slr0086 tll2215 114.2287 0523 232.4453 360.5052 putative DnaK-type molecular chaperone (HSP70 family)
ort605 5.1082 1 asr4648 gsl1620 1431 1442 489.4 135.1706 1584 ssl2982 tsr2257 121.6012 0525 232.986 188.857 rpoZ putative DNA-directed RNA polymerase (omega chain)
ort155 9.8678 0 alr4650 glr1399 1430 1441 489.6 135.1708 1583 slr0249 tlr1446 121.6014 0526 232.984 188.859 conserved hypothetical protein
ort454 7.4248 0 alr4112 gll3732 1364 1434 507.171 135.1675 1439 sll2010 tll0073 121.5853 0530 232.4668 312.2770 murD UDP-N-acetylmuramoylalanine--D-glutamate ligase
ort157 7.7086 0 alr2890 gll0391 1356 1433 420.37 133.1324 1438 sll1252 tlr2161 120.5114 0531 232.4879 309.2703 conserved hypothetical protein
ort634 2.9645 1 alr1890 glr2139 1354 1431 507.265 133.1160 1436 sll1908 tlr0325 120.4551 0533 232.2652 362.6395 serA putative D-3-phosphoglycerate dehydrogenase (PGDH)
ort512 9.2621 0 alr1891 glr2136 1353 1430 507.266 133.1161 1435 sll1909 tlr0326 120.4552 0534 232.2653 362.6396 prmA putative methyltransferase for Ribosomal protein L11
ort158 4.0746 1 alr5152 glr4119 1349 1426 507.244 135.1971 1430 slr0050 tll0915 118.3927 0538 232.1359 276.2011 conserved hypothetical protein
ort662 7.2425 0 alr1543 gll3690 1347 1422 477.121 124.129 1428 slr0457 tlr1606 112.1867 0544 232.3824 362.6585 truB putative tRNA pseudouridine 55 synthase
ort590 2.5600 1 asl0146 gsl0824 1345 1421 457.24 124.128 1426 ssr2799 tsl0166 121.5643 0546 232.5365 345.3730 rpl27 50S ribosomal protein L27
ort586 2.5480 1 all0147 gll0825 1344 1420 457.23 124.127 1425 slr1678 tll0167 121.5642 0547 232.5366 345.3729 rpl21 50S ribosomal protein L21
ort465 4.3429 1 alr3511 gll0055 1341 1417 504.61 126.282 1422 sll0698 tlr0437 103.322 0551 232.2136 197.946 nblS two-component sensor histidine kinase
ort535 6.2194 0 alr3510 glr2779 1340 1416 494.77 126.281 1421 slr1159 tlr1839 103.320 0552 232.2137 360.4899 purD phosphoribosylglycinamide synthetase
ort534 5.9641 0 alr2268 glr1916 1339 1414 440.13 126.231 1420 slr1226 tll1496 122.6948 0554 232.3994 323.3060 purC SAICAR synthetase
ort434 6.1640 1 alr2270 glr1865 1337 1412 440.16 126.233 1418 sll1508 tlr1790 113.2015 0556 232.3996 323.3062 lpxC UDP-3-0-acyl N-acetylglucosamine deacetylase
ort352 3.3057 1 alr2271 glr1866 1336 1411 440.17 126.234 1417 sll1605 tlr1791 113.2016 0557 232.3997 310.2718 fabZ (3R)-hydroxymyristoyl-[acyl carrier protein] dehydratase
ort432 4.9804 1 alr2272 glr1867 1335 1410 440.18 126.235 1416 sll0379 tlr1792 113.2017 0558 232.3998 350.3971 lpxA UDP-N-acetylglucosamine acyltransferase
ort433 6.6264 0 alr2274 glr1868 1334 1409 440.19 126.236 1415 slr0015 tll0321 113.2020 0559 232.3999 350.3972 lpxB Lipid-A-disaccharide synthetase
ort204 5.2804 0 alr0237 glr2611 1332 1407 506.152 124.111 1413 sll2001 tlr1745 96.8550 0562 232.1665 315.2839 Leucine aminopeptidase
ort159 2.3518 1 alr2980 glr0782 1330 1404 423.49 129.572 1411 slr1915 tll2044 122.7672 0567 232.4794 305.2610 conserved hypothetical protein
ort669 3.8947 1 alr2982 gll4415 1329 1403 423.51 129.573 1410 slr1031 tlr0050 122.7673 0568 232.4792 362.6746 tyrS t-RNA synthetase, class Ib:Tyrosyl-tRNA synthetase
ort548 7.6277 0 alr2983 gll1658 1328 1402 423.52 129.574 1409 sll0838 tll0395 122.7675 0571 232.4791 344.3688 pyrF orotidine 5'-phosphate decarboxylase
ort065 5.8252 1 all0492 gll0663 1327 1401 412.49 135.1879 1408 sll1973 tll1107 119.4099 0577 232.1834 316.2864 conserved hypothetical membrane protein
ort160 8.7134 0 alr4016 gll0664 1325 0478 412.47 136.2769 1406 slr0730 tll1109 119.4097 0581 232.670 316.2862 conserved hypothetical protein


Table 2, Page 7

ort268 7.6197 0 alr3705 gll4250 1323 1398 473.95 136.2217 1404 sll1374 tlr1521 122.6726 0583 232.2482 320.2981 putative GPH family sugar transporter
ort419 3.1253 1 alr3832 glr2652 1494 1528 479.34 135.1965 1649 slr0744 tlr1066 120.4723 0597 232.843 309.2700 infB translation initiation factor IF-2
ort161 7.4376 0 asr3830 glr2651 1493 1527 479.33 135.1964 1648 ssr1238 tsl1936 120.4722 0599 232.844 309.2699 conserved hypothetical protein
ort476 2.4787 1 alr3829 glr3489 1492 1526 479.32 135.1963 1647 slr0743 tll1937 120.4721 0600 232.845 309.2698 nusA N utilization substance protein A
ort162 8.1709 0 alr3828 glr3488 1491 1525 479.31 135.1962 1646 slr0742 tll1938 120.4719 0601 232.846 309.2697 conserved hypothetical protein
ort574 5.3387 0 all0888 gll0030 1489 1511 477.48 136.2210 1644 slr0194 tll1273 93.8368 0609 232.3355 265.1827 cbbI, rpiA, ppi putative ribose 5-phosphate isomerase A

ort623 5.5292 1 asr1592 gll0160 1487 1509 498.42 133.1331 1642 ssl2233 tsl1188 100.36 0611 232.3075 350.3956 rpsT 30S ribosomal protein S20
ort221 5.3360 0 alr1593 gll2367 1486 1508 498.43 133.1332 1641 sll1786 tll1187 100.37 0612 232.3076 350.3957 possible deoxyribonuclease similar to TatD
ort603 1.5346 1 alr1594 glr2283 1485 1507 498.45 133.1333 1640 sll1787 tll0642 100.38 0613 232.3077 350.3958 rpoB DNA-directed RNA polymerase beta chain
ort604 4.5380 1 alr1596 glr4278 1483 1505 498.47 133.1335 1638 sll1789 tll0640 100.40 0615 232.3079 350.3959 rpoC2 RNA polymerase beta prime subunit
ort163 4.5235 0 all0355 gll2079 1481 1503 452.61 135.2105 1636 sll0098 tll2425 100.17 0617 232.1739 361.5814 conserved hypothetical protein
ort164 6.7162 0 alr4537 gll3007 1480 1501 475.111 135.2108 1634 slr1098 tll0329 121.5533 0621 232.5257 357.4428 conserved hypothetical protein
ort165 8.9004 0 asr0043 gsl2882 1479 1500 506.99 135.2109 1633 ssl1263 tsr2138 122.6509 0622 232.1586 291.2286 conserved hypothetical protein
ort166 5.6156 0 alr0044 gll2881 1478 1499 506.100 135.2110 1632 sll0661 tlr2139 122.6510 0623 232.1587 291.2287 ycf35 conserved hypothetical protein
ort215 5.4518 1 alr0045 gll2880 1477 1498 506.101 135.2111 1631 sll0662 tlr2140 122.6511 0624 232.1588 291.2288 possible 3Fe-4S ferredoxin
ort378 8.6143 0 alr3182 glr0708 1475 1496 458.52 136.2702 1629 slr0072 tll1704 117.3500 0626 232.2767 330.3242 gidB putative glucose inhibited division protein B
ort259 5.8338 0 alr0489 gll2595 1467 1487 468.88 128.481 1621 slr0451 tlr0350 100.53 0636 232.1829 361.5896 putative DNA helicase
ort366 8.4453 0 alr3724 glr1457 1466 1485 508.101 134.1466 1620 slr0018 tll1534 112.1819 0637 232.2455 206.1050 fumC Fumarate hydratase
ort315 10.6937 0 all0455 gll0377 0389 0198 461.19 135.1782 0386 slr1879 tlr1449 118.3831 0653 232.1797 362.6590 cobI PRECORRIN-2 C20-METHYLTRANSFERASE
ort167 5.6269 1 all0876 gll4186 0390 0202 463.41 136.2712 0387 slr1896 tlr1934 104.409 0656 232.3344 82.141 conserved hypothetical protein
ort269 3.7840 1 alr0483 gll3210 0391 0203 468.80 131.831 0388 slr1974 tlr0878 116.2932 0657 232.1824 356.4254 putative GTP-binding protein
ort168 5.6993 0 asr0485 gsr0474 0393 0205 468.83 135.1936 0390 ssl0105 tsl1609 122.6868 0659 232.5570 361.5817 conserved hypothetical protein
ort169 8.1606 0 alr0486 glr0702 0394 0206 468.84 135.1937 0391 slr0556 tll1037 122.6870 0660 232.1826 361.5818 conserved hypothetical protein
ort171 1.9577 1 alr0487 glr0703 0395 0207 468.86 135.1938 0392 slr2073 tlr1216 122.6874 0661 232.1827 359.4834 conserved hypothetical protein
ort514 7.5842 0 alr0488 glr0704 0396 0208 468.87 135.1939 0393 slr0661 tlr1217 122.6875 0662 232.1828 359.4835 proC putative pyrroline-5-carboxylate reductase
ort244 4.0773 1 alr4178 gll0082 0397 0209 366.14 134.1494 0394 slr1508 tll0327 122.7159 0663 232.4604 338.3506 probable glycosyltransferase
ort235 7.3444 0 alr4175 gll0084 0398 0211 366.12 134.1492 0395 slr0181 tlr2234 122.7155 0665 232.4606 202.1004 possible Recombination protein O (RecO)
ort256 9.2918 0 alr4174 gll3538 0399 0212 353.8 134.1491 0396 sll1776 tlr1079 122.7154 0666 232.4607 282.2125 Putative deoxyribose-phosphate aldolase
ort435 3.4978 1 all3184 glr0786 0400 0213 420.44 131.882 0397 sll0947 tlr0833 95.8470 0667 232.2769 362.6160 lrtA light repressed protein A homolog
ort430 6.5219 0 alr3185 gll4414 0401 0214 420.14 131.840 0398 slr0994 tll0832 95.8463 0668 232.2770 362.6612 lipB putative lipoate-protein ligase B
ort172 6.2089 1 alr3603 glr1484 0403 0216 344.2 126.228 0400 slr1600 tll1300 115.2573 0670 232.2085 361.5438 conserved hypothetical protein
ort257 3.0754 1 alr3606 gll2569 0405 0220 344.3 134.1548 0401 sll1841 tll1299 115.2574 0671 232.2087 335.3423 Putative dihydrolipoamide acetyltransferase component (E2) of pyruvate…
ort551 5.9182 0 alr1798 glr3587 0406 0223 489.32 131.837 0402 sll0467 tll0922 118.3847 0672 232.3656 300.2486 queA Queuosine biosynthesis protein
ort522 1.9365 1 alr4291 glr2324 1158 1180 496.2 127.382 1255 sll0851 tlr1631 122.7730 0676 232.5100 83.143 psbC photosystem II chlorophyll-binding protein CP43
ort678 5.2305 0 all4289 glr0079 1156 1178 504.113 127.321 1253 sll0226 tll1388 122.6413 0678 232.5098 362.6489 ycf4 photosystem I assembly protein (Ycf4 family)
ort417 2.8518 1 alr4627 gll1136 1154 1176 478.2 125.201 1251 sll0065 tll0880 115.2593 0680 232.1006 359.4703 ilvN Acetolactate synthase small subunit
ort677 5.6368 0 all4752 glr3257 1152 1173 505.118 125.164 1248 slr0399 tll0859 95.8445 0683 232.894 69.88 ycf39 putative chaperon-like protein for quinone binding in photosystem II
ort418 4.7731 0 asl4195 gsr0413 1151 1169 374.33 129.543 1246 ssl3441 tsr0101 99.8787 0686 232.4587 296.2398 infA translation initiation factor IF-1
ort345 3.2475 1 alr4351 gll2252 1142 1161 501.168 133.1327 1236 sll0019 tlr1040 116.3018 0698 232.5157 358.4480 dxr 1-deoxy-D-xylulose 5-phosphate reductoisomerase
ort329 4.7620 1 all1092 glr0851 1141 1159 452.54 135.2148 1235 slr0958 tlr1827 120.4696 0701 232.2612 356.4250 cysRS cysteinyl-tRNA synthetase
ort504 5.6864 1 alr1254 gll0636 1140 1157 429.46 136.2425 1234 slr0707 tll1339 122.7460 0702 232.4462 332.3308 polA DNA polymerase I


Table 2, Page 8

ort245 3.2275 1 all0723 gll3233 1138 1155 483.89 136.2854 1232 sll0245 tll1587 121.5444 0704 232.3483 336.3426 probable GTP-binding protein
ort649 5.4471 0 all4120 gll4295 1051 1143 474.32 135.1892 1150 sll0455 tll0277 122.7038 0711 232.4658 362.6617 thrA homoserine dehydrogenase
ort173 8.3490 0 all4825 glr0759 1053 1138 484.89 128.440 1148 sll1643 tll0980 119.4298 0714 232.1066 103.241 conserved hypothetical protein
ort628 9.8279 1 all2297 gll3300 1054 1136 481.38 132.1098 1147 sll0896 tlr1137 119.4463 0715 232.4017 361.5674 ruvC Holliday junction resolvase RuvC
ort301 2.8155 1 all0152 glr1714 1055 1135 506.268 136.2868 1146 slr1030 tll1511 119.4459 0716 232.5369 362.6292 chlI Protoporphyrin IX Magnesium-chelatase subunit ChlI
ort249 6.2356 0 alr4542 glr2140 1056 1132 348.20 135.1633 1144 slr0120 tll2116 120.4823 0719 232.5260 310.2722 probable tRNA/rRNA methyltransferase
ort330 6.5083 0 all1365 gll2782 1057 1131 498.50 133.1173 1143 sll1245 tll2429 109.1123 0720 232.2899 361.5608 cytM cytochrome cM
ort491 2.2649 1 asr1366 gsl0511 1058 1130 498.112 133.1310 1142 smr0010 tsr0877 109.1189 0721 232.5612 361.5636 petG Cytochrome b6f complex subunit 5
ort175 8.0009 0 all1367 gll2930 1059 1129 498.49 133.1172 1141 slr0383 tlr1053 109.1121 0722 232.2898 339.3528 conserved hypothetical protein
ort409 7.5469 0 all1368 glr0407 1060 1128 498.48 133.1171 1140 slr0084 tlr1536 109.1120 0723 232.2897 362.6518 hisH Glutamine amidotransferase class-I
ort663 2.0190 1 alr0052 gll0880 1061 1127 493.78 135.2074 1139 slr0623 tll0812 113.2011 0724 232.1595 207.1057 trxA Thioredoxin
ort395 1.7450 1 alr0051 glr1603 1062 1126 493.77 135.2073 1138 slr1722 tll2190 113.2003 0725 232.1594 359.4808 guaB putative IMP dehydrogenase
ort396 2.9578 1 all0860 gll3082 1063 1124 424.29 136.2694 1137 slr0417 tlr1369 120.5100 0727 232.3329 321.3012 gyrA DNA gyrase A subunit
ort176 6.3910 0 alr2450 glr4151 1065 1122 506.176 135.2041 1135 sll0735 tll1974 122.7759 0729 232.4270 340.3569 conserved hypothetical protein
ort425 2.5893 1 alr4840 gll2290 1066 1121 505.188 133.1210 1134 slr0186 tll1397 122.6760 0730 232.1082 276.2012 leuA 2-isopropylmalate synthase
ort357 4.6210 0 alr0212 gll1884 1069 1110 504.198 132.1115 1130 sll0753 tll0021 120.5087 0740 232.1641 361.5626 folD methylenetetrahydrofolate dehydrogenase
ort324 4.4641 1 alr0213 gll0416 1070 1109 504.199 132.1116 1129 slr0739 tll0020 120.5088 0742 232.1642 268.1877 crtE geranylgeranyl pyrophosphate synthase
ort045 6.8123 0 alr3934 glr3695 1072 1104 505.210 135.1784 1126 sll1501 tlr1848 122.7733 0748 232.748 76.122 cbiA putative Cobyrinic acid a,c-diamide synthase
ort682 4.2712 1 all4019 glr3178 1074 1102 507.156 131.893 1124 slr1843 tll0540 118.3670 0750 232.666 324.3080 zwf glucose-6-phosphate dehydrogenase
ort492 3.5522 1 all4121 gll2295 1075 1101 474.31 134.1488 1123 slr1643 tlr1211 119.4027 0751 232.4657 343.3658 petH ferredoxin--NADP reductase (FNR)
ort496 8.8654 0 all4063 glr0301 0798 0518 508.85 135.1770 0874 sll1522 tll1275 116.2896 0771 232.628 357.4422 pgsA CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase
ort405 5.7866 0 all4506 glr0706 0796 0520 455.50 135.1773 0872 slr0652 tll0216 121.5431 0773 232.2276 362.6327 hisA phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
ort178 8.2150 0 all2716 glr0793 0794 0523 466.40 129.593 0870 slr1796 tlr1421 121.5725 0776 232.3153 212.1113 conserved hypothetical protein
ort179 7.9997 0 alr3100 gll1132 0791 0526 509.282 125.183 0867 sll1547 tll2329 117.3230 0779 232.2694 292.2320 conserved hypothetical protein
ort180 8.5324 1 alr3101 gll1131 0790 0527 509.283 125.184 0866 sll1109 tll1867 117.3232 0780 232.2695 300.2498 conserved hypothetical protein
ort181 10.3178 0 alr3102 gll1129 0789 0528 509.285 125.185 0865 slr0362 tll0495 117.3234 0781 232.2696 71.95 conserved hypothetical protein
ort513 6.4096 0 alr3103 gll0069 0788 0529 509.286 130.682 0864 slr2035 tll2118 110.1319 0782 232.2697 324.3087 proB putative glutamate 5-kinase
ort426 3.7672 1 alr1313 gll3551 0786 0531 497.43 133.1159 0862 slr1517 tlr1600 117.3435 0784 232.1911 361.5260 leuB 3-isopropylmalate dehydrogenase
ort004 3.6739 1 all2364 glr1605 0784 0534 497.48 135.1833 0859 sll0336 tlr1643 122.6621 0788 232.4084 362.6124 accD acetyl-CoA carboxylase, beta subunit

ort044 2.1308 1 all4563 gll0752 0781 0537 493.70 133.1293 0855 sll0018 tlr0376 120.4621 0791 232.1445 332.3318 cbbA, cfxA, fbaA, fda Fructose-bisphosphate/sedoheptulose-1,7-bisphosphate aldolase

ort542 4.5633 0 alr2475 glr1325 0780 0539 465.67 132.1010 0854 slr0520 tlr2187 110.1374 0793 232.4294 362.6487 purQ phosphoribosylformylglycinamidine synthase
ort182 5.4837 0 asr2474 gsl0442 0779 0540 465.66 132.1009 0853 slr0519 tsr2185 110.1373 0794 232.4293 362.6486 conserved hypothetical protein
ort332 3.6056 1 alr2542 gll1206 0832 0814 498.2 130.766 0904 sll1058 tll1987 119.3993 0819 232.4360 360.4914 dapB dihydrodipicolinate reductase
ort300 2.4003 1 all4365 gll2622 0831 0815 440.25 130.649 0903 slr1055 tlr0271 121.5660 0820 232.2217 357.4373 chlH Protoporphyrin IX Magnesium chelatase subunit chlH
ort359 8.2432 0 alr4386 glr0540 0830 0816 501.135 131.808 0902 slr2026 tlr1522 112.1727 0822 232.2195 205.1038 folP putative dihydropteroate synthase
ort654 4.8826 1 alr4385 gll1040 0829 0817 501.134 128.519 0901 slr0783 tlr0966 112.1726 0823 232.2196 205.1037 cbbJ, tpiA triosephosphate isomerase
ort057 6.1935 0 all0640 gll2329 0827 0819 479.58 136.2606 0899 sll1276 tlr0962 120.5139 0825 232.3432 321.3002 ABC transporter, possibly multidrug efflux
ort183 8.0607 0 alr4979 gll3597 1170 0821 399.18 136.2554 1181 slr1900 tll0916 113.2078 0829 232.1217 128.365 conserved hypothetical protein
ort043 3.3436 1 alr3809 gll1769 0825 0822 479.11 121.7 0897 sll0370 tll2332 104.406 0830 232.868 361.5742 carB carbamoyl phosphate synthetase, large subunit
ort184 6.4854 1 alr4169 gsl0787 0823 0824 353.3 136.2648 0895 slr1886 tlr1222 122.7175 0832 232.4612 359.4813 conserved hypothetical protein


Table 2, Page 9

ort185 4.9317 1 alr4171 gll2939 0821 0825 353.5 136.2650 0893 sll0585 tll1062 122.7177 0834 232.4610 359.4815 conserved hypothetical protein
ort367 4.8449 0 all1691 glr3304 0637 0858 463.18 134.1479 0794 sll0567 tll0048 110.1378 0865 232.5028 360.4935 fur Ferric uptake regulator family
ort197 3.0645 1 alr0799 glr3340 1111 1095 443.28 134.1390 1085 slr1846 tll0874 118.3891 0906 232.4471 247.1570 glutaredoxin-like protein
ort061 5.3822 0 asr0798 gsr4417 1110 1094 443.27 134.1389 1086 ssr3122 tsl0875 118.3890 0907 232.4470 247.1569 BolA family protein
ort187 7.2858 0 all4869 gll3437 1109 1093 443.26 124.110 1087 slr1160 tll0135 118.3889 0908 232.1108 224.1263 conserved hypothetical protein
ort485 4.6257 0 all0218 gll0103 1107 1091 504.10 124.118 1089 slr1779 tlr0838 120.4563 0910 232.1647 324.3095 pdxJ Pyridoxal phosphate biosynthetic protein PdxJ
ort049 4.8613 1 all2146 gll4135 1099 1080 474.91 135.1906 1096 slr2019 tlr0577 116.3092 0924 232.1937 281.2107 ABC transporter, multidrug efflux family
ort560 2.5127 1 alr4977 glr3499 1097 1077 505.41 133.1251 1098 slr1426 tll2204 103.384 0928 232.1215 359.4760 recR RecR protein
ort039 8.9380 0 alr1921 glr4328 1093 1056 479.62 136.2305 1102 slr1364 tll0661 106.684 0933 232.3256 360.4910 bioB biotin synthase
ort671 5.3414 0 all2995 gll0108 1092 1057 489.65 136.2700 1103 sll0506 tlr1763 98.8680 0934 232.4781 356.4269 uppS Undecaprenyl pyrophosphate synthetase (UPPS)
ort188 5.7140 1 all2996 gll0109 1091 1058 489.64 136.2699 1104 sll0505 tlr1762 98.8679 0935 232.4780 356.4268 conserved hypothetical protein
ort436 4.9091 1 all2997 gll0110 1090 1059 489.62 136.2698 1105 sll0504 tlr1761 98.8675 0936 232.4779 356.4267 lysA diaminopimelate decarboxylase
ort282 8.8799 0 alr2998 glr2063 1089 1060 489.55 136.2385 1106 slr0853 tll0471 98.8733 0937 232.4778 268.1879 putative ribosomal-protein-alanine acetyltransferase
ort305 0.9607 1 alr2999 glr2064 1088 1061 489.56 136.2386 1107 sll0020 tll0307 98.8735 0938 232.4777 361.5366 clpC endopeptidase Clp ATP-binding chain C
ort265 6.6372 0 alr2764 gll1456 1087 1062 492.34 131.915 1108 slr1167 tll2051 121.6207 0939 232.3920 356.4294 putative glycerol dehydrogenase
ort280 5.4243 0 all3875 gll1396 1085 1065 432.43 136.2320 1111 slr1369 tll2108 112.1671 0942 232.796 312.2764 putative phosphatidate cytidylyltransferase
ort041 9.1914 0 all4887 gll3487 1084 1066 506.205 127.299 1112 sll1683 tlr1866 119.4089 0944 232.1126 268.1880 cad Orn/Lys/Arg decarboxylases family 1
ort625 7.0924 0 all3873 glr4397 0658 1069 432.44 134.1463 1114 slr0361 tlr0656 112.1672 0949 232.798 362.6561 rsuA putative pseudouridylate synthase specific to ribosomal small subunit
ort440 7.6179 0 alr3871 gll3316 1081 1071 432.17 134.1505 1116 sll1676 tlr0708 112.1842 0962 232.800 362.6305 malQ Putative 4-alpha-glucanotransferase
ort423 2.0589 1 alr4670 gll0901 1080 1074 422.30 130.641 1118 sll0469 tlr1546 118.3586 0966 232.965 337.3476 kprS Ribose-phosphate pyrophosphokinase
ort266 3.9430 0 alr1879 glr0932 0609 0403 402.16 129.604 1052 sll0945 tll0763 109.1118 1000 232.3630 227.1297 Putative glycogen synthase (glgA)
ort456 7.9238 0 all0036 glr2398 0610 0401 505.89 135.2066 1051 slr1351 tlr1553 122.6421 1002 232.1578 353.4119 murF Cytoplasmic peptidoglycan synthetases, N-terminal:Cytoplasmic...
ort381 6.0105 0 alr3921 glr0443 0611 0400 460.13 136.2373 1050 sll0899 tlr0393 122.6680 1003 232.756 362.6644 glmU UDP-N-acetylglucosamine pyrophosphorylase
ort066 7.6070 0 alr3919 gll1313 0612 0399 460.10 135.1823 1049 sll0983 tll1622 122.6679 1004 232.758 360.4985 conserved hypothetical protein
ort020 11.5504 0 all5019 gll1038 0613 0398 480.22 136.2402 1048 slr0444 tlr0343 122.7271 1005 232.1221 129.373 aroA EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase)
ort242 5.6520 0 all2568 gll2974 0614 0396 441.45 136.2557 1046 slr1718 tll0403 80.8040 1007 232.4381 360.5038 probable 2-phosphosulfolactate phosphatase
ort233 3.6327 1 alr2001 glr2943 0615 0395 493.79 131.841 1045 sll0601 tll0920 116.2875 1008 232.3898 353.4111 Possible nitrilase
ort458 7.6720 0 alr0094 gll1496 0617 0393 444.48 131.844 1043 slr1746 tlr0461 121.6057 1010 232.5318 360.5050 murI Putative Aspartate and glutamate racemases:Glutamate racemase
ort629 4.3042 1 alr0096 gll0753 0618 0392 436.4 131.845 1042 slr0611 tlr1757 121.5331 1011 232.5319 360.5051 sds polyprenyl synthetase; solanesyl diphosphate synthase (sds)
ort006 3.0815 1 all4257 gll0159 0619 0389 462.4 128.472 1041 sll0542 tll0887 115.2420 1013 232.5064 361.5663 acs acetyl-coenzyme A synthetase
ort056 6.0807 0 alr4715 gll3672 0954 0655 505.57 127.373 0742 slr0615 tlr0655 117.3434 1023 232.927 329.3227 ABC transporter, multidrug efflux family
ort067 7.5864 0 asl1962 gsl3751 0953 1037 466.22 131.819 0743 ssr2551 tsr1432 121.5982 1024 232.3218 340.3568 conserved hypothetical protein
ort042 4.3204 1 alr1155 gll3543 0951 0653 477.2 130.776 0745 sll1498 tll0760 112.1859 1026 232.3277 360.5089 carA carbamoyl-phosphate synthase small chain
ort068 8.0286 0 alr1158 glr4180 0949 0651 477.5 130.778 0747 slr0954 tlr0428 112.1862 1028 232.3279 301.2517 conserved hypothetical protein
ort239 5.1483 0 alr1159 glr4181 0948 0650 477.6 130.779 0748 slr0955 tll1042 112.1863 1029 232.3280 301.2518 possible tRNA/rRNA methyltransferase
ort370 4.1255 1 all1053 gll1528 0946 0648 480.85 130.781 0750 slr0877 tll1003 114.2352 1031 232.2640 330.3245 gatA glutamyl-tRNA (Gln) amidotransferase subunit A
ort339 3.3715 1 all3578 glr3934 0945 0647 502.210 130.734 0751 slr0603 tll2056 119.4212 1032 232.5452 358.4486 dnaE DNA polymerase III, alpha subunit
ort069 8.5950 0 all0748 glr1740 0944 0646 472.34 131.805 0752 sll0933 tlr1208 121.5340 1033 232.3508 270.1913 conserved hypothetical protein
ort611 4.3506 1 asl0749 gsl4026 0943 0645 472.33 131.804 0753 ssl1784 tsr1207 121.5339 1034 232.3509 362.6736 rps15 30S ribosomal protein S15
ort340 5.6077 0 all1323 gll4083 0939 0641 466.16 128.497 0757 sll1868 tlr1317 118.3845 1038 232.2681 260.1755 dnaG putative DNA primase
ort283 6.2624 0 alr3229 gll1991 0933 0621 502.1 136.2363 0763 sll0708 tlr0659 118.3872 1052 232.3733 320.2971 putative rRNA (adenine-N6,N6)-dimethyltransferase
ort422 7.2306 0 alr3230 gll0102 0932 0620 502.2 136.2364 0764 sll0711 tll0500 122.6913 1053 232.3734 295.2379 ispE Putative 4-diphosphocytidyl-2C-methyl-D-erythritol kinase (CMK)
ort631 4.1149 1 all0121 glr4166 0929 0617 370.26 136.2451 0767 slr0774 tll0203 97.8587 1056 232.5340 352.4038 secD preprotein translocase subunit

Online Supporting Information


Table 2, Page 10

ort632 6.9207 0 all0120 glr4167 0928 0616 370.27 136.2452 0768 slr0775 tll0202 97.8588 1057 232.5339 352.4039 secF preprotein translocase subunitort070 7.1654 0 alr3655 glr2407 0923 0607 430.13 133.1196 0774 sll1968 tlr1461 122.7363 1068 232.2531 309.2711 conserved hypothetical proteinort382 2.7930 1 alr2328 glr1052 0920 0601 481.19 130.757 1038 slr1756 tll1588 122.7439 1073 232.4048 361.5271 glnA Glutamine synthetase, glutamate--ammonia ligaseort192 6.0516 0 all4872 gll1053 0918 0599 443.24 135.1785 1036 sll1631 tlr0177 120.5015 1076 232.1111 342.3652 cytidine/deoxycytidylate deaminase family proteinort273 8.8208 0 alr4577 glr0072 0915 0595 430.2 135.2122 1032 slr1366 tlr2420 106.687 1079 232.1430 360.4912 putative lipoprotein signal peptidaseort071 9.2089 0 alr4576 glr4030 0914 0594 430.1 135.2121 1031 slr1365 tll0901 106.686 1080 232.1431 360.4911 conserved hypothetical proteinort072 5.9052 0 asl2557 gsr1914 0744 0593 494.69 124.120 1030 ssl3451 tsl2428 104.417 1081 232.4371 231.1346 conserved hypothetical proteinort482 7.5546 0 alr3707 glr2589 0747 0590 493.86 124.87 0819 slr0116 tll2308 120.5073 1084 232.2484 311.2748 pcyA ferredoxin-dependent biliverdin reductaseort225 4.3031 0 all4793 gll2797 0752 0585 493.62 135.1925 0824 sll1664 tll0992 114.2275 1089 232.1035 231.1344   possible glycosyltransferaseort616 2.2408 1 all4792 gll1830 0753 0584 493.94 129.552 0825 sll1260 tll1688 101.96 1090 232.1032 355.4199 rps2 30S ribosomal protein S2ort664 3.2377 1 all4791 gll1829 0754 0583 493.95 129.553 0826 sll1261 tll1687 101.97 1091 232.1031 355.4200 tsf putative elongation factor EF-Tsort558 5.5049 0 all4789 gll1278 0756 0581 493.98 134.1552 0828 slr0020 tll0735 101.99 1093 232.1029 257.1720 recG ATP-dependent DNA helicaseort255 7.1035 0 all4116 gll1805 0757 0580 506.66 136.2795 0829 slr1679 tll2463 122.6782 1094 232.4662 234.1382 putative d-alanyl-d-alanine dipeptidaseort637 4.7762 1 alr1348 glr1848 0758 0579 448.44 136.2582 0830 slr0963 tlr0339 88.8199 1095 232.3880 
329.3229 sir Ferredoxin-sulfite reductaseort390 7.9863 0 alr4111 glr1646 0759 0578 507.170 136.2522 0831 slr0220 tll2192 121.5852 1096 232.4669 349.3913 glyS putative Glycyl-tRNA synthetase, beta subunitort304 2.1261 1 alr0128 glr4377 0760 0577 504.170 136.2326 0832 sll1091 tll0150 115.2471 1097 232.5347 225.1268 chlP geranylgeranyl hydrogenaseort667 1.9233 1 all4140 gll1520 0762 0575 389.23 133.1336 0834 slr1105 tlr2283 122.7918 1109 232.4639 361.5888 typA tyrosine binding proteinort217 4.7233 0 alr4068 glr3264 0764 0572 477.41 136.2245 0836 slr0251 tll1571 122.6747 1112 232.623 324.3103   possible ABC transporter, ATP-binding componentort226 4.3761 1 all0936 glr2022 0765 0570 501.67 136.2247 0838 sll1513 tlr1615 118.3622 1114 232.4396 289.2253 possible heme transporter

ort573 5.0177 1 alr0782 gll3548 0766 0569 464.41 127.350 0839 sll0807 tll2369 122.6539 1115 232.3533 359.4803 cbbE, rpe, ppE ribulose-5-phosphate 3-epimerase

ort384 2.9450 1 alr1041 glr3342 0767 0568 443.51 136.2249 0840 slr2094 tll1276 119.4322 1116 232.2593 345.3766 cbbF, glpX, fbp Fructose-1,6-bisphosphatase/sedoheptulose-1,7-bisphosphatase

ort398 4.9077 1 alr1042 glr1218 0768 0567 443.53 136.2250 0841 slr1808 tll1738 119.4324 1117 232.2595 352.4040 hemA Possible glutamyl-tRNA reductaseort008 3.3229 1 all4645 gll4260 0769 0566 457.46 127.345 0842 slr1176 tlr1287 117.3427 1118 232.991 221.1219 agp ADP-glucose pyrophosphorylaseort391 4.5135 1 alr5275 gll1117 0770 0565 506.7 136.2594 0843 sll0329 tlr0576 115.2577 1119 232.1480 339.3533 gnd 6-phosphogluconate dehydrogenaseort251 9.5890 0 alr1602 glr3180 0771 0564 397.26 136.2837 0844 sll1479 tlr1076 122.6404 1120 232.3085 314.2823 Putative 6-phosphogluconolactonase, Pgl, DevBort287 7.1471 0 all2063 gll0063 0776 0558 454.26 135.2130 0849 sll1035 tlr1764 109.1072 1125 232.2026 326.3147 putative uracil phosphoribosyltransferaseort322 5.6617 0 all3392 gll0123 0778 0556 400.30 132.1090 0851 slr0502 tll1624 108.883 1127 232.2310 357.4366 cobW putative cobalamin synthesis proteinort510 3.2001 1 all4379 glr3159 0663 0791 370.7 131.846 1029 slr1228 tll2205 104.464 1149 232.2204 362.6535 prfC peptide-chain-release factor RF-3ort073 10.1062 0 all3259 glr1039 0664 0789 505.36 136.2857 1028 slr1918 tlr2348 113.2047 1152 232.3759 305.2620 conserved hypothetical proteinort074 4.6868 0 alr2014 glr1395 0667 0787 458.41 136.2859 1025 slr1195 tlr0208 121.5979 1155 232.3911 211.1104 conserved hypothetical proteinort075 11.1679 0 alr1349 gll4381 0668 0786 448.46 132.1000 1024 slr0722 tlr1746 88.8200 1156 232.3879 223.1242 conserved hypothetical proteinort076 6.2868 1 alr1612 gll0900 0669 0785 430.23 136.2387 1023 sll0875 tlr2384 94.8421 1158 232.3093 255.1691 conserved hypothetical proteinort027 5.8281 0 alr4853 glr2600 0674 0779 450.3 129.557 1018 sll0402 tll2357 116.2825 1171 232.1098 286.2196 aspC Aminotransferases class-Iort373 2.2365 1 all2501 gll3622 0676 0777 492.100 121.2 1015 slr2136 tlr0996 121.5491 1174 232.4319 197.949 gcpE GcpE proteinort062 4.7740 1 all2500 gll3102 0677 0776 492.102 121.3 1014 slr1751 tll1406 121.5498 1175 232.4318 359.4789 carboxyl-terminal 
processing proteaseort292 3.9710 1 all5215 gll3837 0839 0775 493.138 128.454 1000 sll0377 tll0952 111.1522 1177 232.1703 360.4970 Transcriptional-repair coupling factorort354 6.5069 0 alr3460 glr1831 0841 0772 504.126 135.1910 0997 slr0070 tlr1874 111.1641 1195 232.2384 205.1044 fmt putative methionyl-tRNA formyltransferaseort681 9.0545 0 alr1240 gll1448 0847 0766 462.61 135.2024 0988 sll1910 tlr1873 122.6355 1202 232.4430 265.1835 zam putative acetazolamide conferring resistance protein Zamort077 7.3219 0 alr5129 glr0972 0850 0763 435.38 136.2872 0985 slr0022 tlr0653 112.1656 1205 232.1333 345.3746 conserved hypothetical proteinort594 3.7495 1 asl3674 gsl0792 0853 0757 448.15 134.1472 0981 ssr1736 tsr1915 118.3703 1210 232.2512 233.1376 rpl32 50S ribosomal protein L32


Table 2, Page 11

ort655 2.5827 0 alr4641 gll3158 0856 0754 346.9 131.905 0978 sll0755 tll1454 113.1974 1213 232.995 327.3165 tpx thioredoxin peroxidaseort286 5.8235 0 alr3352 gll0699 0862 0750 346.5 135.2052 0975 slr0992 tlr1793 113.1971 1215 232.2105 315.2859   putative tRNA/rRNA methyltransferase (SpoU family)ort321 8.1273 0 all3174 glr0747 0863 0749 405.44 134.1476 0974 slr0216 tll1104 121.5389 1216 232.2759 313.2790 cobU putative cobinamide kinaseort078 7.7510 0 all3821 gll4313 0865 0745 479.120 135.2145 0972 slr0948 tll0685 119.4163 1220 232.854 86.156 conserved hypothetical proteinort443 5.2283 0 all0233 gll3023 0867 0743 506.108 134.1587 0970 slr0649 tlr0750 115.2487 1222 232.1661 361.5541 metG putative methionyl-tRNA synthetaseort248 8.5050 0 all4450 gll1616 0868 0741 498.149 134.1578 0969 sll1290 tlr0652 121.5958 1224 232.5178 284.2160   probable ribonuclease IIort613 1.7582 1 asl4451 gsl3059 0869 0740 498.147 134.1577 0968 ssr1399 tsr2061 121.5957 1225 232.5177 284.2159 rps18 30S ribosomal protein S18ort595 2.0761 1 asl4452 gsl3060 0870 0739 498.146 134.1576 0967 ssr1398 tsr2060 121.5956 1226 232.5176 284.2158 rpl33 50S ribosomal protein L33ort499 6.5051 0 alr4958 gll2926 0871 0738 476.101 128.496 0966 sll1553 tll0662 114.2139 1227 232.1183 350.3932 pheT Phenylalanyl-tRNA synthetase beta subunitort318 5.6170 0 alr1689 glr3076 0879 0727 463.61 136.2474 0957 slr1211 tlr0900 121.6082 1240 232.5030 362.6634 cobN cobalamin biosynthetic protein CobNort674 4.3774 1 alr2728 gll3592 0882 0724 452.22 135.2008 0954 sll0865 tlr1753 113.1949 1246 232.3164 334.3374 uvrC Excinuclease ABC subunit C (UvrC)ort309 5.0622 1 alr4700 glr0847 0884 0722 452.25 135.1973 0952 slr0847 tlr1165 122.7854 1248 232.938 345.3716 coaD putative pantetheine-phosphate adenylyltransferaseort333 7.0569 0 alr2048 gll0815 0888 0716 507.259 135.1813 0948 slr1665 tll2294 101.164 1254 232.2005 362.6328 dapF diaminopimelate epimeraseort428 4.3385 1 alr3283 gll4081 0889 0715 505.68 135.2019 0947 sll1074 tll2098 
111.1482 1255 232.3779 342.3636 leuS leucyl-tRNA synthetaseort493 9.9771 0 alr1050 gll1692 0890 0714 443.58 135.1867 0946 slr1349 tll0717 120.4957 1256 232.2602 354.4146 pgi glucose-6-phosphate isomeraseort541 6.2372 0 all0788 glr1401 0891 0710 490.7 126.262 0945 slr0477 tlr2326 122.6973 1261 232.4445 360.4940 purN phosphoribosylglycinamide formyltransferaseort013 4.8173 0 alr3537 glr4010 0892 0709 484.106 133.1289 0944 sll0080 tll2219 121.5234 1262 232.2416 361.5589 argC N-acetyl-gamma-glutamyl-phosphate reductaseort561 3.9281 1 all3536 glr0988 0893 0708 484.30 133.1201 0943 sll1894 tlr1727 122.6257 1263 232.2415 355.4172 ribA possible GTP cyclohydrolase IIort246 4.6797 1 alr4054 gll2424 0301 1900 393.18 126.283 0332 sll0135 tlr1281 119.4415 1265 232.638 361.5857 probable methylthioadenosine phosphorylaseort079 5.8593 0 alr2431 glr1664 0895 0705 489.37 136.2549 0940 sll0860 tll0315 116.2789 1267 232.4135 311.2740 conserved hypothetical proteinort490 4.6015 1 all2919 gll3182 0898 2195 477.92 136.2552 0993 sll1382 tlr1656 118.3729 1274 232.4848 359.4700 petF2 Ferredoxinort203 5.9978 0 all2917 glr1322 0899 0699 477.94 136.2553 0937 sll1383 tlr1657 118.3731 1275 232.4850 359.4701 Inositol monophosphatase family proteinort413 4.3030 1 alr2323 glr1814 0901 0696 484.114 135.2084 0934 sll0430 tll1191 120.4595 1278 232.4042 344.3711 htpG heat shock protein HtpGort591 3.5379 1 asl2630 gsr3838 0902 0695 484.115 136.2514 0933 ssr1604 tlr2462 120.4598 1279 232.4043 353.4096 rpl28 50S ribosomal protein L28ort058 5.9866 0 all2556 glr3389 0903 1012 494.70 127.327 0932 sll0221 tlr1194 122.7491 1280 232.4370 361.5651 Alkyl hydroperoxide reductaseort080 7.6393 0 all4508 glr1876 0904 0689 455.48 133.1148 0931 sll0608 tlr1437 111.1627 1288 232.2278 357.4426 conserved hypothetical proteinort081 6.8549 0 asl4507 gsl4025 0905 0688 455.49 126.285 0930 sll0609 tsr2284 111.1628 1289 232.2277 235.1397 conserved hypothetical proteinort346 2.5721 1 alr0599 gll0194 0907 0685 458.24 136.2793 
0928 sll1945 tll0623 119.4368 1292 232.3394 362.6606 dxs 1-deoxy-D-xylulose 5-phosphate synthaseort082 5.5161 0 all5166 glr1858 0909 0683 498.70 136.2300 0926 slr1577 tlr0846 119.4297 1294 232.1368 360.5148   conserved hypothetical proteinort064 4.7155 1 asl2061 gsr3540 0910 0682 454.28 136.2269 0925 ssr2142 tsr0534 109.1074 1295 232.2024 300.2505 conserved hypothetical membrane proteinort544 3.7078 1 all4008 gll3495 0912 0679 412.28 136.2473 0923 sll1275 tll2275 100.14 1298 232.677 362.6560 pykF pyruvate kinaseort083 8.4322 0 all0646 glr1400 0741 0675 378.2 133.1358 0919 sll1414 tlr1134 101.162 1302 232.3437 360.4897 conserved hypothetical proteinort538 7.3409 0 alr2742 gll1252 0683 0669 505.195 133.1286 1008 sll0578 tlr1258 103.311 1308 232.3177 342.3651 purK phosphoribosylaminoimidazole carboxylaseort021 11.5504 0 alr1924 glr2513 0682 0667 479.63 136.2238 1009 slr2130 tlr0784 118.3785 1310 232.3254 319.2950 aroB 3-dehydroquinate synthaseort084 9.1356 0 alr3357 gll2384 0681 0666 498.37 134.1564 1010 slr0351 tlr0417 120.4498 1311 232.2116 360.5174 conserved hypothetical proteinort085 4.6858 0 all2545 gll4395 0680 0665 498.172 134.1427 1011 sll1071 tlr0249 119.4464 1312 232.4363 202.1000 conserved hypothetical proteinort461 4.0298 1 all4673 gll0192 0678 0661 382.10 133.1295 1013 sll0622 tll0231 105.572 1316 232.962 361.5797 nadA Quinolinate synthetase A proteinort086 7.1266 0 alr2751 gll3484 0801 0506 506.145 136.2253 0879 sll1218 tll1029 122.7780 1336 232.3186 307.2653 conserved hypothetical proteinort262 8.2632 0 all3970 gll0656 0802 0504 357.12 136.2290 0880 slr1822 tll1641 121.5778 1342 232.712 329.3216   putative endonucleaseort660 3.5967 1 all1269 glr3011 0598 0422 492.55 128.444 1063 slr1884 tlr1996 122.6917 1475 232.1917 361.5475 trpS t-RNA synthetase, class Ib:Tryptophanyl-tRNA synthetaseort264 7.6843 0 alr2973 gll1169 0596 0425 492.50 136.2677 1065 sll0593 tll1495 116.3032 1480 232.4800 324.3086 Putative glucokinaseort650 7.2256 1 alr0346 gll1127 0595 
0426 348.5 133.1194 1066 sll1760 tlr1765 121.6205 1481 232.1729 359.4740 thrB putative homoserine kinaseort355 9.4604 0 alr3512 glr1487 0591 0435 452.71 136.2595 1070 slr1626 tll0747 118.3884 1485 232.2135 353.4121 folB possible dihydroneopterin aldolase


Table 2, Page 12

ort088 8.8310 0 alr2762 gll4416 0588 0439 492.31 135.2095 1074 sll1702 tll1092 120.4700 1489 232.3919 258.1727 conserved hypothetical proteinort379 3.4350 1 all0713 glr1306 0584 0444 459.48 134.1559 1078 sll0158 tll0578 115.2456 1494 232.3475 180.783 glgB 1,4-alpha-glucan branching enzymeort400 3.1599 1 all3909 gll3877 0583 0445 508.90 134.1560 1079 slr0536 tlr0741 111.1595 1495 232.768 358.4517 hemE Uroporphyrinogen decarboxylase (URO-D)ort089 7.2057 0 all3908 glr2049 0582 0446 508.91 134.1561 1080 sll0456 tlr0742 111.1597 1496 232.769 261.1772 conserved hypothetical proteinort410 3.5568 1 all3263 glr1211 0578 0453 505.33 135.1979 0582 slr0608 tll0233 121.6192 1505 232.3763 266.1852 hisIE histidine biosynthesis bifunctional protein HisIEort090 9.0366 0 alr4810 gsr3242 0571 0466 415.4 136.2308 0573 slr0815 tlr0707 121.5526 1520 232.1052 147.506 conserved hypothetical proteinort338 3.6201 1 alr2009 gll1473 0565 1187 415.9 134.1419 0567 sll0848 tlr0603 97.8646 1536 232.3906 245.1540 dnaA chromosomal replication initiator protein DnaAort276 5.1709 1 alr1978 gll2868 0561 1192 421.26 132.1111 0563 slr1149 tll0559 112.1856 1541 232.3211 359.4743 putative multidrug efflux ABC transporterort220 5.6858 0 alr1965 glr3383 0560 1193 483.32 132.1110 0562 sll0900 tll0180 118.3572 1542 232.3215 361.5365 possible ATP phosphoribosyltransferaseort270 7.6650 0 all0580 gll3339 0559 1194 497.164 133.1213 0561 sll1019 tll1639 117.3507 1543 232.4218 357.4363 Putative hydroxyacylglutathione hydrolaseort015 4.8353 0 alr4907 gll3101 1263 0379 475.90 129.545 1337 sll0902 tll1106 112.1696 1586 232.1162 357.4406 argF ornithine carbamoyltransferaseort361 3.8031 1 all4936 gll3141 1264 0378 391.13 128.486 1338 sll1463 tlr0528 106.637 1587 232.1196 251.1628 ftsH3 cell division protein FtsH3ort562 6.4254 0 all0082 gll1250 1265 0375 415.14 136.2416 1339 slr0066 tlr2152 111.1559 1590 232.5308 233.1372 ribD riboflavin biosynthesis proteinort316 11.7503 0 all3722 gll0378 1268 0370 490.103 135.2060 1342 
sll0099 tll0172 107.823 1596 232.2457 258.1729 cobL putative precorrin-6y methylaseort240 4.1980 1 all4751 gll3525 1269 0368 505.119 131.809 1343 slr0400 tll0858 95.8446 1599 232.895 69.89 predicted sugar kinaseort498 4.1614 1 all4845 glr3128 1270 0367 427.15 135.1880 1344 sll0454 tlr0690 106.635 1600 232.1086 305.2609 pheS Phenylalanyl-tRNA synthetase:Aminoacyl tRNA synthetase class ...ort642 5.1686 1 alr4846 gll0444 1271 0366 427.33 135.1934 1345 sll1108 tll1786 106.737 1601 232.1087 300.2497 surE Survival protein SurEort564 7.5409 0 alr4848 gll0217 1273 0364 427.36 136.2261 1347 slr1882 tll1556 119.4305 1603 232.1089 362.6378 ribF putative riboflavin kinase/FAD synthaseort646 7.1418 0 alr1343 gll0403 1274 0363 507.192 134.1536 1348 sll0635 tlr1947 122.6515 1604 232.3884 346.3783 thiE Thiamine monophosphate synthase (TMP)ort193 7.0170 0 asr1344 gsl0402 1275 0362 507.193 134.1537 1349 ssr0102 tsl0811 122.6516 1605 232.3883 346.3784 DUF170ort254 7.8605 0 alr3885 gll1462 1278 0358 432.28 134.1375 1352 sll1489 tll1189 122.7243 1608 232.787 284.2156 putative circadian phase modifier CpmA homologort199 5.0733 0 alr0912 glr4422 1281 0355 506.199 136.2644 1355 slr0321 tlr0056 97.8579 1613 232.3381 321.3013   GTP-binding protein ERA homologort500 4.8821 1 alr1956 gll3611 1284 0351 420.3 135.1742 1358 slr2047 tll0037 93.8377 1616 232.3223 357.4455 phoH PhoH-like phosphate starvation-inducible proteinort353 3.0155 1 alr1952 gll2639 1286 0349 497.173 135.1738 1360 slr1531 tlr1973 93.8373 1618 232.3226 261.1775 ffh signal recognition particle protein (SRP54)ort092 10.8625 0 all2707 glr3481 1287 0348 493.84 135.2009 1361 sll0169 tlr0758 97.8639 1619 232.3144 226.1278 conserved hypothetical proteinort483 2.5462 1 alr2708 glr1529 1288 0347 493.68 135.1825 1362 slr1934 tlr1169 97.8597 1620 232.3145 226.1283 pdhA Pyruvate dehydrogenase E1 alpha subunitort051 4.4652 0 all1359 gll0551 1291 0344 458.2 133.1326 1365 sll0844 tlr2256 118.3701 1622 232.2942 351.4024   tRNA 
(5-methylaminomethyl-2-thiouridylate)-methyltransferaseort219 11.7647 0 all0859 glr1731 1292 0343 424.30 134.1371 1366 slr0819 tll1667 120.5101 1623 232.3328 354.4153 possible apolipoprotein n-acyltransferaseort196 3.6021 1 alr0577 glr0841 1293 0341 497.24 131.881 1367 slr1761 tlr1102 103.326 1625 232.4221 282.2138 FKBP-type peptidyl-prolyl cis-trans isomerase (PPIase)ort657 5.0849 1 alr4746 gll2556 1297 0336 505.106 124.107 1371 slr0546 tll0698 95.8485 1629 232.900 362.6707 trpC Indole-3-glycerol phosphate synthase:Proteins binding FMN and...ort431 3.4373 1 alr4745 glr3029 1298 0335 505.105 124.106 1372 slr1096 tll0868 95.8483 1630 232.901 300.2494 lpdA putative dihydrolipoamide dehydrogenaseort640 8.7114 0 alr0175 gll0554 1299 0334 407.13 124.105 1373 slr1673 tll1968 95.8481 1631 232.5289 185.829 spoU tRNA/rRNA methyltransferase (SpoU)ort543 4.5350 1 all0174 glr3125 1300 0332 407.15 121.9 1374 slr0017 tll0780 95.8458 1633 232.5288 185.821 murA UDP-N-glucosamine 1-carboxyvinyltransferaseort014 4.1781 1 alr1080 glr0547 1301 0331 498.138 126.240 1375 slr1022 tlr1328 122.6586 1634 232.2623 360.5161 argD putative N-acetylornithine aminotransferaseort356 8.7576 0 alr1026 glr1074 1302 0330 498.56 126.241 1376 sll1612 tlr0190 122.7876 1635 232.2579 355.4176 folC putative bifunctional Dihydrofolate/Folylpolyglutamate synthaseort446 3.0337 1 alr1312 glr4015 1305 0323 497.129 131.850 1379 sll0996 tlr1315 116.2972 1640 232.1955 351.3979 miaB 2-methylthioadenine synthetaseort334 5.8052 0 all0980 gll0297 1306 0322 415.32 131.851 1380 slr1874 tll0523 120.5122 1641 232.4248 357.4357 ddlB D-alanine--D-alanine ligaseort365 2.8153 1 alr3858 gll0298 1309 0319 502.61 131.854 1383 sll1633 tll2382 117.3167 1644 232.815 356.4290 ftsZ cell division protein FtsZort479 5.8949 0 all1522 gll0762 1310 0318 505.93 131.856 1384 slr0526 tlr0279 122.6229 1645 232.2794 318.2924 panB putative Ketopantoate hydroxymethyltransferaseort404 7.3519 0 all4956 gll3085 1311 0317 477.85 131.868 1385 sll1917 
tll1192 114.2234 1646 232.1185 361.5610 hemN possible oxygen independent coproporphyrinogen III oxidaseort094 5.3471 1 alr4957 glr3086 1312 0316 477.33 131.857 1386 slr1972 tll0008 114.2240 1647 232.1184 361.5635 conserved hypothetical proteinort307 2.7912 0 all4358 gll3767 1313 0315 463.67 129.594 1387 slr0164 tll1759 121.5914 1648 232.5162 360.4873 clpP4 ATP-dependent Clp protease proteolytic subunit 4ort306 2.3021 1 all4357 gll3766 1314 0314 463.68 129.595 1388 slr0165 tlr1071 121.5915 1649 232.5161 360.4874 clpP3 ATP-dependent Clp protease proteolytic subunit 3


Table 2, Page 13

ort416 2.6303 1 all2315 gll2657 1315 0313 375.2 133.1141 1389 sll1363 tll2254 117.3269 1650 232.4034 359.4730 ilvC ketol-acid reductoisomeraseort314 9.2178 0 alr2082 glr2448 1316 0312 388.7 133.1193 1390 slr1925 tlr2377 119.4179 1651 232.2041 279.2057 cobD putative cobalamin biosynthetic proteinort389 3.1462 1 all1985 glr1446 1165 0283 507.46 125.210 1268 slr0638 tll1034 120.4809 1671 232.3205 262.1794 glyQ glycyl-tRNA synthetase alpha subunitort231 7.5093 0 alr5252 gll0127 0431 0276 475.52 134.1420 0427 sll1653 tll2373 120.4913 1674 232.5275 359.4679 possible menaquinone biosynthesis methyltransferaseort408 3.5785 1 alr2895 gll4314 0430 0274 431.41 135.1890 0426 sll1893 tlr1185 111.1638 1675 232.4874 253.1658 hisF Imidazole glycerol phosphate synthase subunit HisF (cyclase)ort299 2.8794 1 all4480 glr1809 0428 0272 441.50 135.1922 0424 slr0056 tll1539 114.2264 1677 232.2245 362.6019 chlG chlorophyll synthase 33 kD subunitort424 2.1290 1 all2508 glr2763 0420 0257 464.47 136.2267 0419 slr0604 tlr0304 98.8681 1684 232.4326 157.582 lepA GTP-binding protein LepAort208 1.8307 1 asr1309 gsl2877 0418 0254 464.31 136.2287 0417 ssl2667 tsl1293 113.2079 1687 232.1952 360.5037 NifU-like proteinort050 7.7359 0 alr3210 gll3950 0415 0233 495.2 128.520 0413 slr2143 tll2327 121.6038 1689 232.3718 316.2875 putative L-cysteine/cystine lyaseort455 5.8213 0 all1663 gll3492 0413 0230 416.11 133.1169 0412 slr0528 tlr1239 120.4586 1690 232.5010 346.3769 murE UDP-N-acetylmuramyl-tripeptide synthetaseort095 4.0832 1 asr1611 gsl1913 0411 0228 483.85 133.1167 0410 ssl0331 tsr0564 122.6916 1693 232.3092 357.4321 conserved hypothetical proteinort621 2.7493 0 alr2737 gll3572 0410 0227 468.69 133.1312 0409 slr0469 tlr0147 116.2921 1694 232.3172 203.1009 rpsD 30S ribosomal protein S4ort228 8.3158 0 all3075 glr2715 1159 1181 493.24 136.2413 1257 sll0905 tlr1302 121.6055 1702 232.4708 289.2247 possible Maf-like protein implicated in septum formationort312 5.5048 0 alr2377 glr1772 1160 1183 506.211 
136.2673 1259 slr0618 tll1716 120.4741 1704 232.4094 284.2173 cobB Cobyric acid synthase CobBort096 8.7226 0 asr2378 gsr1647 1161 1184 506.212 135.1904 1260 ssr1041 tsl1557 120.4743 1705 232.4095 284.2174 conserved hypothetical protein

ort555 7.9203 0 alr1526 glr2158 0551 1204 454.3 133.1285 0552 slr0012 tll1504 117.3212 1717 232.2790 310.2716 cbbS, rbcS Ribulose bisphosphate carboxylase, small chain

ort553 1.8375 1 alr1524 glr2156 0550 1205 454.1 133.1284 0551 slr0009 tll1506 117.3210 1718 232.2792 310.2714 cbbL, rbcL Ribulose bisphosphate carboxylase, large chain

ort097 4.1880 1 all3257 glr1519 0547 1208 464.26 136.2832 0548 slr0169 tlr0311 113.1896 1721 232.3757 195.926 conserved hypothetical proteinort303 8.4178 0 all5076 gll2369 0545 1215 431.26 133.1278 0546 slr0750 tll2345 122.6336 1723 232.1291 351.4018 chlN possible light-independent protochlorophyllide reductase subunitort297 8.7604 0 alr3441 glr0215 0544 1216 498.157 135.2069 0545 slr0772 tll2392 117.3130 1724 232.2368 199.967 chlB light-independent protochlorophyllide reductase ChlB subunitort302 4.5780 0 all5078 gll2370 0543 1217 431.23 133.1277 0544 slr0749 tll2347 122.6334 1725 232.1293 351.4016 chlL Protochlorophyllide reductase iron-sulfur

ort481 4.3930 1 all1743 glr2486 0542 1218 474.17 129.613 0543 slr0506 tlr0575 122.7943 1726 232.4185 318.2919 pcr, por Light dependent protochlorophyllide oxido-reductase

ort060 7.1721 0 alr1961 glr3295 0539 1224 483.31 133.1209 0540 sll0809 tlr0228 122.7442 1731 232.3219 358.4539 Biotin/lipoate A/B protein ligase familyort659 7.9147 0 all5288 gll0754 0537 1226 468.14 136.2785 0538 sll0356 tll2272 120.4765 1733 232.1491 359.4710 trpF phosphoribosylanthranilate isomeraseort001 3.4536 1 alr5285 gll2864 0534 1229 468.73 135.1627 0534 sll0728 tll1311 119.4335 1736 232.1488 235.1404 accA acetyl-CoA carboxylase, alpha subunitort098 3.9756 1 alr5284 gll3145 0533 1230 468.72 135.1626 0533 sll0209 tll1312 122.6742 1737 232.1487 334.3385 conserved hypothetical proteinort099 3.6335 1 alr5283 gll3146 0532 1231 468.71 135.1625 0532 sll0208 tll1313 122.6741 1738 232.1486 361.5731 conserved hypothetical proteinort100 9.4567 0 all5255 gll3765 0528 1235 455.47 136.2749 0528 slr1573 tll2307 113.2045 1742 232.5272 184.818 conserved hypothetical proteinort415 3.4278 1 all4613 glr3279 0526 1239 468.52 136.2635 0526 slr2088 tlr1296 121.5292 1746 232.1693 361.5428 ilvB acetolactate synthaseort401 3.7613 1 alr3751 gll0839 0525 1240 397.2 136.2632 0525 slr0839 tlr2216 117.3282 1747 232.5423 291.2281 hemH Ferrochelataseort319 4.3761 1 all3802 glr0385 0523 1243 505.29 126.245 0523 slr0260 tlr0265 116.2714 1750 232.873 289.2248 cobO possible cob(I)alamin adenosyltransferaseort624 4.2273 1 alr1208 glr0288 0521 1245 450.14 136.2827 0521 sll0145 tll0259 117.3155 1753 232.4506 113.282 rrf ribosome recycling factor protein Rrfort643 3.8232 1 all2563 gll1597 0519 1248 424.11 131.907 0519 slr1793 tll0467 121.5841 1759 232.4377 346.3782 tal transaldolaseort362 7.5192 0 alr0718 glr2929 0518 1249 459.5 136.2266 0518 sll1833 tlr2074 115.2584 1760 232.3478 203.1021 ftsI putative peptidoglycan synthetase (pbp transpeptidase domain)ort230 11.9224 0 all3785 gll3175 0516 1251 506.47 136.2264 0516 slr0959 tlr1890 109.1219 1762 232.5395 335.3409 Possible membrane associated proteaseort313 5.8107 0 alr3338 gll0771 0515 1252 489.101 136.2818 0515 slr1124 tlr1532 109.1081 1763 
232.2549 335.3401 cobC putative alpha-ribazole-5'-P phosphataseort258 9.0805 0 alr3339 gll3822 0514 1253 489.102 136.2819 0514 sll1018 tlr1533 109.1083 1764 232.2548 326.3134 putative dihydroorotaseort505 3.6335 1 all3570 glr4227 0511 1257 508.104 133.1224 0510 slr1622 tll1883 122.7810 1769 232.2448 288.2235 ppa inorganic pyrophosphataseort515 5.5852 0 alr5053 gll2525 0508 1259 418.28 135.1834 0508 sll1425 tlr0777 112.1826 1771 232.1272 357.4342 proS Prolyl-tRNA synthetaseort532 2.8405 1 alr4784 glr3280 0506 1261 487.84 125.177 0506 sll1823 tll0531 121.5788 1773 232.1024 357.4351 purA Adenylosuccinate synthetaseort012 3.6693 1 alr1245 gll3931 0499 1268 506.18 133.1164 0499 slr1898 tll1408 101.151 1780 232.4435 362.6710 argB acetylglutamate kinase


Table 2, Page 14

ort511 6.2271 0 all4248 gll0025 0497 1271 509.85 127.380 0497 sll0270 tll0415 104.436 1782 232.5055 321.3016 priA primosomal protein N' (replication factor Y)ort636 2.8669 1 all5263 glr2572 0496 1272 483.42 127.325 0495 slr0653 tll0617 97.8595 1783 232.5263 361.5683 sigA Putative principal RNA polymerase sigma factorort399 3.9627 1 alr1878 glr3212 0495 1273 402.15 126.250 0494 slr1887 tll1646 122.6982 1785 232.3629 295.2369 hemC Porphobilinogen deaminaseort052 9.8892 0 asr4549 gsl1645 0491 1283 373.14 135.2158 0490 ssl2296 tsl1825 91.8284 1794 232.1464 362.6672 4a-hydroxytetrahydrobiopterin dehydratase (PCD)ort101 3.3443 1 alr1093 glr3817 0490 1284 452.27 126.244 0489 slr0609 tlr2276 122.7669 1795 232.2611 361.5580 conserved hypothetical proteinort102 7.6973 0 all0200 gll0542 0487 1291 472.89 131.839 0486 slr1411 tll2296 118.3643 1802 232.1631 362.6650 conserved hypothetical proteinort194 6.4252 0 all5306 glr0140 0484 1294 474.61 136.2536 0483 sll1854 tlr1785 94.8387 1807 232.1511 262.1783   exodeoxyribonuclease IIIort403 1.9856 1 alr3265 glr0071 0483 1296 505.173 127.376 0482 sll0017 tlr0479 121.5195 1809 232.3765 307.2647 hemL glutamate-1-semialdehyde 2,1-aminomutaseort103 3.3323 1 all4664 gll2901 0480 1301 464.2 132.1041 0478 slr1926 tlr0863 121.6188 1818 232.970 162.628 conserved hypothetical proteinort104 5.8508 0 all4665 glr0255 0479 1302 435.48 132.1040 0477 slr0039 tlr2026 121.6187 1819 232.969 361.5799 conserved hypothetical proteinort441 3.0569 1 all1019 glr0412 0477 1305 498.108 125.149 0475 sll0555 tll0424 121.5861 1822 232.2573 358.4501 map putative methionine aminopeptidaseort583 2.4088 1 alr5297 glr0828 0475 1306 354.22 129.556 0473 sll1740 tlr1303 121.5555 1824 232.1502 289.2249 rpl19 possible 50S ribosomal protein L19ort387 5.2403 0 all3205 gll2378 0473 1308 345.1 131.864 0471 sll0179 tll0506 121.5357 1826 232.3713 361.5986 gltX glutamyl-tRNA synthetaseort289 8.4611 0 all4374 glr1622 0466 1319 482.129 134.1450 0464 slr0880 tll2058 121.5283 1838 
232.2209 360.4952   similar to secreted protein MPB70 precursorort250 4.4184 1 all2456 gll1912 0464 1320 506.80 135.2188 0462 sll0194 tlr0953 115.2553 1839 232.4275 357.4361 protein secretion component, Tat familyort488 3.0326 0 all2453 glr3038 0462 1322 506.86 124.82 0460 sll1316 tlr0959 115.2492 1841 232.4273 201.991 petC Cytochrome b6f complex subunit (Rieske iron-sulfur protein)ort486 3.9123 0 all2452 glr3039 0461 1323 506.87 124.83 0459 sll1317 tlr0960 115.2493 1842 232.4272 319.2936 petA apocytochrome fort680 5.2465 0 all4699 gll4393 0460 1324 452.58 124.84 0458 sll1187 tll2157 122.6320 1843 232.939 360.5189 yvoC putative prolipoprotein diacylglyceryl transferaseort317 5.2407 0 all4698 gll1377 0459 1325 452.59 124.85 0457 slr0239 tlr0563 122.6321 1844 232.940 327.3183 cobM PUTATIVE PRECORRIN-4 C11-METHYLTRANSFERASEort508 5.6955 0 all3552 gll0510 0457 1327 482.64 135.1837 0456 sll1546 tll1680 113.1977 1846 232.2430 272.1948 ppx putative exopolyphosphataseort670 8.3222 0 alr3553 glr3428 0456 1328 482.59 135.1999 0455 slr0926 tlr0402 113.1993 1847 232.2431 361.5393 ubiA probable 4-hydroxybenzoate-octaprenyltransferaseort421 7.0685 0 all5167 glr2791 0454 1330 498.69 127.311 0453 slr0951 tlr0605 118.3942 1849 232.1369 289.2244 ispD putative 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferaseort350 3.6287 1 alr1894 glr3506 0453 1333 362.19 127.402 0452 slr0886 tlr1502 121.6008 1852 232.2657 292.2303 fabG 3-oxoacyl-[acyl-carrier protein] reductaseort277 4.6512 1 alr2373 glr2168 0450 1337 506.208 130.661 0446 sll0760 tll1324 108.944 1857 232.4090 362.6704 putative multidrug efflux ABC transporterort275 4.8385 0 alr2372 glr2167 0449 1338 506.206 130.660 0445 sll0759 tlr0010 120.5080 1858 232.4089 362.6673 putative multidrug efflux ABC transporterort325 8.3212 0 all0948 glr2166 0448 1339 493.120 136.2540 0444 sll1899 tll1893 115.2523 1859 232.4409 362.6661 ctaB putative protoheme IX farnesyltransferaseort105 9.5613 0 all0949 glr2165 0447 1340 493.119 136.2539 0443 
sll1898 tll1894 115.2522 1860 232.4410 362.6660 conserved hypothetical protein
ort563 6.6652 1 all5258 glr3604 0442 1346 455.45 135.2166 0438 sll0300 tlr1686 122.6503 1865 232.5269 306.2643 ribE Putative Riboflavin synthase alpha chain
ort279 7.2745 0 alr5260 gll3752 0441 1348 455.15 134.1436 0437 slr1520 tlr2042 122.7629 1867 232.5267 306.2627 putative oxidoreductase, aldo/keto reductase family
ort106 8.5202 0 alr0297 glr4139 0439 1350 484.95 131.927 0435 sll1500 tll2306 122.6319 1869 232.1981 360.5199 conserved hypothetical protein
ort653 4.6648 0 alr2780 glr3198 0436 1353 482.133 133.1274 0432 slr2058 tll1881 122.7660 1872 232.3932 358.4636 topA Prokaryotic DNA topoisomerase
ort467 3.4549 1 all4883 glr3120 0435 1354 456.29 133.1206 0431 sll0223 tll0045 93.8370 1873 232.1122 362.6309 ndhB NADH dehydrogenase I chain 2 (or N)
ort040 10.9163 0 alr5256 gll1338 0433 1359 455.12 136.2348 0429 sll1655 tll1111 113.1937 1876 232.5271 361.5316 birA putative Biotin--acetyl-CoA-carboxylase ligase
ort668 6.4108 0 all4012 glr2705 1181 1366 412.23 135.2022 1297 sll0109 tll1803 105.523 1885 232.673 268.1885 tyrA Chorismate mutase
ort107 4.4849 0 alr3414 glr1476 1185 1370 507.95 135.1642 1301 slr1470 tll0601 122.7227 1889 232.2337 359.4795 conserved hypothetical protein
ort271 4.0526 0 alr3415 glr2231 1186 1371 507.96 135.1643 1302 slr1471 tll0600 122.7228 1890 232.2338 359.4796 Putative inner membrane protein
ort108 4.9635 0 all4296 glr1649 1187 1373 507.100 135.1646 1303 slr0480 tll0076 122.7231 1892 232.5105 252.1632 conserved hypothetical protein
ort635 4.2058 1 all3976 glr0716 1188 1374 357.10 136.2288 1304 slr1703 tlr1552 106.673 1894 232.705 257.1711 serRS putative seryl-tRNA synthetase
ort610 2.4249 1 all3969 gsl3914 1190 1376 357.13 136.2291 1306 slr0628 tll1826 121.5779 1896 232.713 263.1798 rps14 30S ribosomal protein S14
ort503 3.1232 1 all4396 gll3149 1191 1377 375.11 125.155 1307 sll1043 tll2193 122.6329 1897 232.2185 358.4563 pnp polyribonucleotide nucleotidyltransferase
ort284 5.9013 0 all4680 gll3021 1193 1379 422.12 136.2657 1309 sll0818 tlr1514 122.7874 1899 232.959 204.1029 putative tetrapyrrole methylase family protein
ort198 4.5751 1 alr3739 glr4375 0218 1544 398.35 126.261 0246 slr1090 tll2353 120.5042 1928 232.5435 171.702 GTP1/Obg family GTP-binding protein
ort459 6.6462 0 all0049 glr3637 0217 1546 506.160 127.354 0245 sll1772 tlr2472 121.5587 1931 232.1592 331.3287 mutS putative DNA mismatch repair protein MutS family


Table 2 (continued)

ort023 5.8973 0 alr2782 glr2442 0387 0195 373.1 126.288 0384 sll1112 tlr0494 122.6607 1938 232.3934 361.5527 aroD Dehydroquinase class II
ort520 3.2338 1 asr4319 gsl3408 0329 0187 452.35 128.451 0371 ssr2831 tsl1567 121.6202 1960 232.5127 118.312 psaE photosystem I subunit IV (PsaE)
ort360 5.8556 0 alr4320 gll3407 0328 0186 452.37 128.452 0370 slr1689 tll1566 121.6203 1961 232.5128 118.313 fpg Formamidopyrimidine-DNA glycosylase (FAPY-DNA glycosylase)
ort489 1.7277 1 alr3422 gll1918 0326 1648 509.270 131.827 0368 slr0343 tlr0797 101.117 1966 232.2345 356.4240 petD cytochrome b6f complex subunit 4 (17 kd polypeptide)
ort487 1.1236 1 alr3421 gll1919 0325 1649 509.269 131.826 0367 slr0342 tlr0796 101.116 1967 232.2344 356.4239 petB apocytochrome b6
ort447 ###### 0 alr3455 glr1992 0322 1652 504.122 135.1975 0364 sll0288 tlr2016 113.2066 1970 232.2379 155.566 minC possible septum site-determining protein
ort448 3.9291 1 alr3456 glr1993 0321 1653 504.123 126.213 0363 sll0289 tlr2017 113.2067 1971 232.2380 155.567 minD putative septum site-determining protein MinD
ort449 4.1129 1 asr3457 gsr1994 0320 1654 504.124 126.214 0362 ssl0546 tlr2018 113.2068 1972 232.2381 155.568 minE possible septum site-determining protein MinE
ort109 9.1011 0 alr0116 glr3274 0319 1659 370.3 135.1790 0360 sll1866 tll2123 116.2865 1975 232.5335 247.1572 conserved hypothetical protein
ort402 9.2934 0 alr0115 glr1755 0318 1660 370.2 135.1789 0359 sll1237 tlr1836 116.2862 1976 232.5334 358.4534 hemK possible protoporphyrinogen oxidase
ort521 2.1642 1 all0138 glr2999 0315 1665 494.8 127.302 0354 slr0906 tlr1530 122.6941 1982 232.5357 77.123 psbB photosystem II chlorophyll-binding protein CP47
ort615 1.6101 1 all0136 glr0987 0312 1668 494.9 127.306 0351 slr1356 tll2046 122.6943 1985 232.5355 335.3414 rps1a 30S ribosomal protein S1, homolog A
ort444 2.6998 1 alr4124 gll2577 0311 1670 474.106 129.637 0349 sll0927 tll0978 122.6945 1987 232.4655 361.5852 metK S-adenosylmethionine synthetase
ort252 7.5240 0 all4656 gll0781 0310 1671 464.9 129.638 0348 slr1420 tll1291 119.4142 1988 232.977 255.1690 Putative carbohydrate kinase, FGGY family
ort274 6.7345 0 alr5068 gll3174 1591 1688 418.37 136.2278 1747 slr0328 tlr1810 120.4679 2026 232.1284 231.1345 putative low molecular weight protein-tyrosine-phosphatase
ort480 6.4817 0 alr2936 gll1806 1590 1689 461.69 134.1385 1746 sll1249 tll2450 121.6145 2027 232.4831 362.6150 panC putative bifunctional enzyme; pantothenate synthetase/cytidylate kinase
ort540 11.4171 0 alr3525 gll0720 1589 1691 461.72 132.1028 1745 slr0838 tlr1811 121.5252 2029 232.2123 312.2767 purM phosphoribosyl formylglycinamidine cyclo-ligase
ort568 5.2948 0 all3791 gll3123 1584 1703 505.114 127.397 1740 sll0320 tlr1083 116.2883 2036 232.889 360.5036 rnd putative ribonuclease D
ort110 9.6354 0 alr3980 glr3763 1583 1704 357.7 127.313 1739 slr0589 tll0625 120.4770 2038 232.702 321.3009 conserved hypothetical protein
ort206 4.7836 1 alr0652 glr1480 1581 1707 302.2 134.1416 1736 slr0067 tlr2419 101.90 2041 232.3443 205.1043 MRP protein homolog
ort111 8.9527 0 alr2572 glr1586 1579 1709 421.30 133.1150 1734 slr1285 tlr1565 119.4342 2043 232.4385 358.4523 conserved hypothetical protein
ort519 4.1518 1 all0329 glr3701 1578 1710 485.8 134.1500 1733 slr0737 tll1724 117.3514 2044 232.3578 344.3693 psaD photosystem I reaction center subunit II (PsaD)
ort658 4.6495 0 all0328 glr1717 1577 1711 485.9 134.1501 1732 slr0738 tll1672 117.3515 2045 232.3577 344.3694 trpE Anthranilate synthase component I and chorismate binding enzyme
ort112 6.3363 0 alr3351 glr2275 1576 1712 346.3 131.939 1731 slr0990 tll0322 121.5582 2046 232.2104 315.2858 conserved hypothetical protein
ort506 4.1691 1 all4861 gll0414 1575 1713 458.16 131.940 1730 sll0920 tll1912 116.3099 2047 232.1104 297.2431 ppc phosphoenolpyruvate carboxylase
ort557 6.4191 0 all3374 gll2405 1573 1715 450.37 131.942 1728 sll1277 tll2444 111.1601 2049 232.2294 269.1897 recF putative DNA repair and genetic recombination protein RecF
ort113 5.9256 0 alr1286 gll2047 1571 1717 456.38 133.1260 1726 slr1717 tll1241 122.7215 2051 232.1930 326.3155 conserved hypothetical protein
ort450 2.7706 1 all2906 gll3412 1569 1719 391.20 136.2440 1724 sll1536 tlr2403 108.980 2053 232.4861 151.536 moeB molybdopterin biosynthesis protein
ort114 7.4787 0 alr5126 gll3173 1567 1722 386.17 135.1986 1721 slr1563 tlr1821 99.8746 2056 232.1299 332.3320 conserved hypothetical protein
ort212 6.2936 0 all5123 gll2874 1566 1723 406.9 126.252 1720 slr1293 tll0232 117.3387 2057 232.1302 362.6111 phytoene dehydrogenase related enzyme
ort241 9.1785 0 all1141 glr0406 1565 1724 389.1 127.383 1719 slr2081 tll2435 122.6634 2058 232.3264 357.4419 Prephenate dehydrogenase
ort115 9.9210 0 alr4075 glr1449 1564 1725 507.14 127.317 1718 slr1495 tlr1240 122.7496 2059 232.2057 225.1269 conserved hypothetical protein
ort116 7.5851 0 asr4076 gsl3135 1563 1726 507.16 127.318 1717 ssr0332 tsl2097 122.7498 2060 232.2056 361.5834 conserved hypothetical protein
ort556 1.8121 1 all3272 gll3416 1562 1727 324.3 136.2347 1716 sll0569 tlr2089 122.6631 2062 232.3769 76.118 recA RecA bacterial DNA recombination protein
ort117 11.0019 0 alr3863 glr2043 1561 1728 372.20 127.367 1715 sll0295 tlr1691 122.6543 2063 232.811 360.5108 conserved hypothetical protein
ort272 7.5610 0 alr2308 gll3769 1560 1730 481.68 127.363 1714 sll0031 tlr0739 99.8758 2065 232.4026 339.3546 putative ldpA protein
ort118 5.7911 0 alr4216 gll3770 1559 1731 374.3 130.685 1713 sll1262 tlr1130 99.8757 2066 232.4566 362.6376 conserved hypothetical protein
ort593 3.4338 1 all4215 glr0085 1558 1732 374.12 130.702 1712 sll1799 tlr0081 99.8766 2067 232.4567 298.2439 rpl3 50S ribosomal protein L3
ort597 4.2138 1 all4214 glr0086 1557 1733 374.13 130.703 1711 sll1800 tlr0082 99.8767 2068 232.4568 298.2440 rpl4 50S ribosomal protein L4
ort588 4.8482 1 all4213 glr0087 1556 1734 374.14 130.704 1710 sll1801 tlr0083 99.8768 2069 232.4569 298.2441 rpl23 50S ribosomal protein L23
ort584 2.8292 1 all4212 glr0903 1555 1735 374.15 130.705 1709 sll1802 tlr0084 99.8769 2070 232.4570 298.2442 rpl2 50S ribosomal protein L2
ort614 1.6236 1 asl4211 gsr0904 1554 1736 374.16 130.706 1708 ssl3432 tsr0085 99.8770 2071 232.4571 298.2443 rps19 30S ribosomal protein S19
ort587 2.7470 1 all4210 gll3922 1553 1737 374.17 130.707 1707 sll1803 tlr0086 99.8771 2072 232.4572 298.2444 rpl22 50S ribosomal protein L22


Table 2 (continued)

ort617 2.3864 1 all4209 gll3921 1552 1738 374.18 130.708 1706 sll1804 tlr0087 99.8772 2073 232.4573 298.2445 rps3 30S ribosomal protein S3
ort580 1.7309 1 all4208 gll3920 1551 1739 374.19 130.709 1705 sll1805 tlr0088 99.8773 2074 232.4574 298.2446 rpl16 50S ribosomal protein L16
ort592 5.4025 0 asl4207 gsl3919 1550 1740 374.20 130.710 1704 ssl3436 tsr0089 99.8774 2075 232.4575 298.2447 rpl29 50S ribosomal protein L29
ort612 3.2755 1 asl4206 gsl3918 1549 1741 374.21 130.711 1703 ssl3437 tsr0090 99.8775 2076 232.4576 298.2448 rps17 30S ribosomal protein S17
ort601 2.2454 1 all4205 gll3917 1548 1742 374.22 130.712 1702 sll1806 tlr0091 99.8776 2077 232.4577 298.2449 rplN 50S ribosomal protein L14
ort589 3.6335 1 asl4204 gll3916 1547 1743 374.23 130.713 1701 sll1807 tlr0092 99.8777 2078 232.4578 296.2389 rpl24 50S ribosomal protein L24
ort598 2.0908 1 all4203 gll3915 1546 1744 374.24 130.714 1700 sll1808 tlr0093 99.8778 2079 232.4579 296.2390 rpl5 50S ribosomal protein L5
ort620 3.2022 1 all4202 gll3913 1545 1745 374.25 130.715 1699 sll1809 tlr0094 99.8779 2080 232.4580 296.2391 rps8 30S ribosomal protein S8
ort599 4.2082 1 all4201 gll3912 1544 1746 374.27 130.716 1698 sll1810 tlr0095 99.8780 2081 232.4581 296.2392 rpl6 50S ribosomal protein L6
ort582 4.0830 1 all4200 gll3911 1543 1747 374.28 130.717 1697 sll1811 tlr0096 99.8781 2082 232.4582 296.2393 rpl18 50S ribosomal protein L18
ort622 2.3137 1 all4199 gll3910 1542 1748 374.29 130.718 1696 sll1812 tlr0097 99.8782 2083 232.4583 296.2394 rpsE 30S ribosomal protein S5
ort579 4.0651 1 all4198 gll3909 1541 1749 374.30 130.719 1695 sll1813 tlr0098 99.8783 2084 232.4584 296.2395 rpl15 50S ribosomal protein L15
ort633 3.3749 1 all4197 gll1393 1540 1750 374.31 130.720 1694 sll1814 tlr0099 99.8784 2085 232.4585 296.2396 secY preprotein translocase SecY subunit
ort007 6.9091 0 all4196 gll1392 1539 1751 374.32 130.721 1693 sll1815 tlr0100 99.8785 2086 232.4586 296.2397 adk Adenylate kinase
ort609 2.3989 1 all4193 gll3574 1537 1753 374.35 130.723 1691 sll1816 tlr0103 99.8789 2088 232.4589 296.2399 rps13 30S ribosomal protein S13
ort607 1.2622 1 all4192 gll3573 1536 1754 374.36 130.724 1690 sll1817 tlr0104 99.8790 2089 232.4590 296.2400 rps11 30S ribosomal protein S11
ort602 3.7192 1 all4191 gll3571 1535 1755 374.37 130.725 1689 sll1818 tlr0105 99.8791 2090 232.4591 296.2401 rpoA RNA polymerase alpha subunit
ort581 2.1533 1 all4190 gll3570 1534 1756 374.38 130.726 1688 sll1819 tlr0106 99.8792 2091 232.4592 296.2402 rpl17 50S ribosomal protein L17
ort661 6.1436 0 all4189 gll3569 1533 1757 374.39 130.727 1687 sll1820 tlr0107 99.8793 2092 232.4593 296.2403 truA tRNA pseudouridine synthase A
ort578 2.9475 1 all4188 glr4418 1532 1758 374.40 130.728 1686 sll1821 tlr0108 99.8794 2093 232.4594 296.2404 rpl13 50S ribosomal protein L13
ort202 2.8099 1 all2457 glr3730 1528 1762 401.28 130.684 1682 sll1193 tlr0496 122.6287 2097 232.4276 360.5171 HNH endonuclease family protein
ort011 6.0994 0 alr2458 glr3818 1527 1763 401.6 130.732 1681 slr0827 tll0460 122.7861 2098 232.4277 357.4397 alr putative alanine racemase
ort517 1.3910 1 alr5155 glr3439 1523 1769 456.8 135.1944 1673 slr1835 tlr0732 122.6939 2123 232.1362 221.1223 psaB photosystem I P700 chlorophyll a apoprotein subunit Ib (PsaB)
ort516 2.2075 1 alr5154 glr3438 1524 1770 456.7 135.1943 1672 slr1834 tlr0731 122.6938 2124 232.1361 221.1222 psaA photosystem I P700 chlorophyll a apoprotein subunit Ia (PsaA)
ort047 9.9441 0 all0453 gll2546 1525 1772 502.74 135.1783 1671 slr0969 tll1307 117.3403 2126 232.1795 307.2665 cbiG bifunctional cbiH protein and precorrin-3B C17-methyltransferase
ort429 3.3616 1 all1694 gll1029 1514 1775 463.15 136.2846 1670 slr1598 tlr0613 110.1280 2130 232.5025 216.1162 lipA lipoic acid synthetase
ort385 3.8422 1 alr4344 glr1508 1512 1777 423.41 132.1054 1668 sll1499 tll1368 122.6439 2132 232.5151 361.5417 glsF Ferredoxin-dependent glutamate synthase, Fd-GOGAT
ort608 0.9678 1 all4340 glr3925 1511 1779 423.16 132.1058 1667 sll1096 tlr1747 122.7700 2135 232.5147 359.4692 rps12 30S ribosomal protein S12
ort619 2.8512 1 all4339 glr3926 1510 1780 423.17 132.1060 1666 sll1097 tlr1748 122.7701 2136 232.5146 359.4693 rps7 30S ribosomal protein S7
ort368 1.9792 1 all4338 glr3927 1509 1781 423.18 132.1061 1665 sll1098 tlr1749 122.7702 2137 232.5145 359.4694 fusA elongation factor EF-G
ort665 1.7116 1 all4337 glr3928 1508 1782 423.19 132.1062 1664 sll1099 tlr1750 122.7703 2138 232.5144 359.4696 tufA elongation factor EF-Tu
ort606 0.8543 1 all4336 glr3929 1507 1783 423.20 132.1063 1663 sll1101 tlr1751 122.7704 2139 232.5143 359.4697 rps10 30S ribosomal protein S10
ort059 4.7356 1 all4335 glr3968 1506 1784 423.21 132.1064 1662 sll0195 tlr2461 122.7705 2140 232.5142 311.2738 ATP-dependent protease La (LON) domain
ort497 6.1800 1 alr4334 gll0524 1504 1786 423.36 132.1053 1660 sll1662 tlr2106 122.7706 2142 232.5141 311.2758 pheA Chorismate mutase-Prephenate dehydratase
ort570 7.7365 0 alr4332 glr1507 1502 1788 423.35 132.1052 1658 slr1130 tll0663 122.6427 2144 232.5139 362.6178 rnhB ribonuclease HII
ort569 3.1798 1 alr4331 glr1506 1501 1789 423.34 132.1051 1657 slr1129 tlr0597 122.6426 2145 232.5138 355.4186 rne ribonuclease E
ort119 3.9344 0 asl4325 gsl0746 1499 1792 443.38 127.309 1654 ssr2723 tll0211 98.8736 2148 232.5132 299.2462 conserved hypothetical protein
ort328 9.5744 0 all4464 glr1656 0081 1624 493.74 129.611 0095 slr1791 tll1035 121.5284 2164 232.2234 348.3852 cysH Phosphoadenosine phosphosulfate reductase
ort278 7.9381 0 alr4094 glr3503 0082 1625 507.59 129.541 0096 slr1743 tlr1136 119.4435 2165 232.4685 362.6161 putative NADH dehydrogenase, transport associated
ort237 7.8222 0 all1802 glr0515 0084 1628 489.87 134.1554 0098 slr1509 tll0303 120.4792 2168 232.3659 353.4116 possible sodium transporter, Trk family
ort281 7.6511 1 all1800 glr0516 0085 1629 489.88 134.1555 0099 sll0493 tlr0412 120.4794 2169 232.3658 353.4117 putative potassium channel, VIC family
ort222 5.1723 0 all1211 glr2775 0098 0181 475.2 135.1840 0117 slr2030 tll1647 121.5510 2188 232.4509 349.3902 possible Fe-S oxidoreductase
ort120 9.7074 0 alr1215 glr3755 0099 0179 441.1 133.1240 0118 slr1990 tll0200 120.4512 2190 232.4511 310.2720 conserved hypothetical protein


Table 2 (continued)

ort462 6.8213 0 alr1217 glr0846 0100 0177 441.4 135.2046 0119 sll0631 tll2408 105.589 2192 232.4513 304.2580 nadB putative L-aspartate oxidase
ort121 8.4493 0 all1220 glr2112 0101 0176 441.58 135.1918 0120 slr0565 tlr0588 75.7991 2193 232.4516 344.3709 conserved hypothetical protein
ort223 3.5794 1 alr1222 gll4317 0102 0175 441.8 129.608 0121 slr0082 tlr1564 121.5424 2194 232.4518 362.6256 possible Fe-S oxidoreductase
ort122 6.3493 0 all4101 gll0853 0103 0172 486.93 124.116 0123 slr0651 tlr0351 122.7777 2197 232.4680 362.6430 conserved hypothetical protein
ort123 5.8153 0 all4102 gll4038 0104 0170 486.92 124.115 0125 slr0656 tll1862 122.7036 2198 232.4679 360.5112 conserved hypothetical protein
ort025 6.5786 0 alr1244 glr2710 0107 0163 506.16 126.297 0128 sll1669 tlr0882 122.6361 2202 232.4434 356.4296 aroK Shikimate kinase
ort236 5.3009 0 alr0063 glr4032 0112 1963 472.17 136.2342 0132 sll0754 tll1629 104.471 2207 232.1604 346.3787 possible Ribosome-binding factor A
ort125 5.2226 0 all2381 gll0842 0114 1967 506.63 133.1153 0135 slr0941 tll0336 120.4901 2212 232.4097 361.5405 conserved hypothetical protein
ort126 4.0985 1 alr2385 gll4382 0116 1969 506.215 126.222 0137 slr1417 tll0867 120.4747 2214 232.4100 359.4851 conserved hypothetical protein
ort127 7.1284 0 alr2386 glr3390 0117 1970 506.216 126.223 0138 sll1979 tsl0866 120.4748 2215 232.4101 359.4852 conserved hypothetical protein
ort128 4.8971 1 asr4321 gsl3544 0121 1975 452.38 129.534 0141 ssl1690 tsl0017 121.6204 2219 232.5129 361.5884 conserved hypothetical protein
ort571 2.0237 1 all0129 gll4162 0128 1988 504.130 136.2478 0150 slr0115 tlr2423 122.6827 2236 232.5348 362.6453 rpaA two-component response regulator
ort412 6.2940 0 all4705 gll1432 0129 1989 428.12 136.2613 0151 slr0446 tll0518 122.7849 2237 232.935 342.3628 holB DNA polymerase III, delta prime subunit
ort291 9.3496 0 all4708 glr0678 0130 1990 399.9 136.2612 0152 slr0379 tlr2055 122.7848 2238 232.934 231.1340 Thymidylate kinase
ort191 4.3036 1 all3193 glr0715 0132 1992 387.10 131.820 0154 slr0823 tll1525 122.7024 2240 232.2777 253.1646 ycf3 cyanobacterial conserved hypothetical
ort552 3.9926 0 alr3823 glr2406 0133 1993 479.20 133.1188 0155 slr0448 tlr1103 119.4331 2245 232.852 361.5486 radA putative DNA repair protein RadA
ort572 1.1578 1 all3822 glr2669 0134 1994 479.118 133.1298 0156 slr0947 tll2364 119.4162 2246 232.853 361.5773 rpaB two-component response regulator
ort502 4.3056 1 alr0238 gll0800 0135 1995 506.153 133.1299 0157 slr1510 tlr0844 115.2555 2247 232.1666 358.4622 plsX fatty acid/phospholipid synthesis protein PlsX
ort501 5.9015 0 alr0241 gll3009 0138 1998 506.157 133.1302 0160 sll1848 tlr0042 115.2561 2250 232.1669 360.5035 plsC putative 1-acyl-sn-glycerol-3-phosphate acyltransferase
ort676 6.5207 0 asr3137 gsl4394 0140 2000 475.30 133.1185 0163 ssr1425 tsl2457 108.1011 2253 232.2723 238.1449 ycf34 conserved hypothetical protein
ort213 9.7292 0 all3136 gll2924 0141 2001 475.75 132.1104 0164 sll0825 tlr0715 108.928 2254 232.2722 238.1438 Poly A polymerase family
ort323 4.9792 0 alr1833 glr1744 0143 2003 474.11 135.1845 0166 slr1255 tll1560 120.4690 2256 232.3649 197.948 crtB, pys phytoene synthases
ort129 4.1746 1 all1732 glr1522 0145 2005 434.47 135.1987 0168 slr1623 tll0447 104.483 2258 232.4194 288.2236 conserved hypothetical protein
ort554 2.8047 1 all3953 glr1340 0147 2007 429.39 135.1843 0170 sll0998 tlr1206 119.4263 2260 232.729 294.2363 rbcR putative Rubisco transcriptional regulator
ort471 4.0972 1 alr3956 glr0218 0149 2009 429.15 135.1990 0172 slr0844 tll0720 118.3907 2262 232.726 354.4160 ndhF NADH dehydrogenase I chain 5 (or L)
ort469 2.5841 1 alr3957 glr0219 0150 2010 429.16 135.1991 0173 slr0331 tll0719 118.3909 2263 232.725 349.3920 ndhD NADH dehydrogenase I chain 4 (or M)
ort130 6.5891 0 alr3399 glr1857 0151 2011 506.25 135.1994 0174 slr1530 tll1658 101.110 2265 232.2324 272.1946 conserved hypothetical protein
ort131 4.9200 0 all0783 glr0789 0153 2013 492.56 128.445 0176 slr2141 tll1771 122.6918 2267 232.1916 361.5476 conserved hypothetical protein
ort238 3.5926 1 asl2551 gsl4037 0154 2014 454.36 131.930 0177 ssl0564 tsl1865 121.5189 2268 232.4368 195.935 possible transcriptional regulator
ort470 1.9638 1 alr0226 gll0652 0157 2017 401.40 128.471 0181 sll0522 tlr0670 122.6296 2271 232.1655 362.6240 ndhE NADH dehydrogenase I chain 4L or K
ort472 4.3225 1 alr0225 gll0653 0158 2018 401.39 128.470 0182 sll0521 tlr0669 122.6295 2272 232.1654 362.6239 ndhG NADH dehydrogenase I chain 6 (or J)
ort473 2.6506 1 alr0224 gll0654 0159 2019 401.38 128.469 0183 sll0520 tlr0668 122.6294 2273 232.1653 362.6238 ndhI NADH dehydrogenase I chain I (or NdhI)
ort466 2.5004 1 alr0223 gll1584 0160 2020 401.36 128.468 0184 sll0519 tlr0667 122.6291 2274 232.1652 362.6237 ndhA NADH dehydrogenase I chain 1 (or H)
ort386 4.9342 0 alr0222 glr3012 0161 2021 401.35 127.342 0185 sll0401 tlr2393 122.6290 2275 232.1651 249.1600 gltA citrate synthase
ort656 2.3165 1 all3794 glr2758 0164 2027 505.109 130.657 0188 slr0543 tll2475 118.3575 2280 232.886 314.2817 trpB Tryptophan synthase, beta chain:Pyridoxal-5'-phosphate-depend...
ort536 5.4007 1 alr0989 gll1251 0167 2038 372.14 136.2802 0191 sll0901 tll0586 121.5847 2285 232.1879 287.2211 purE Phosphoribosylaminoimidazole carboxylase
ort229 5.5545 1 alr3201 glr4402 0168 2040 507.107 136.2296 0193 slr0525 tll0451 122.7900 2287 232.2784 318.2923 Possible magnesium protoporphyrin IX methyltransferase
ort293 4.0812 1 all1736 glr1588 0169 2043 434.46 135.2057 0194 slr1783 tlr1263 121.5862 2289 232.4192 283.2148 two-component response regulator
ort207 6.2389 0 alr2505 gll4384 0170 2045 461.80 129.562 0195 slr0387 tlr0115 120.4669 2291 232.4323 253.1650 NifS-like aminotransferase class-V
ort132 7.1534 0 all3354 glr2927 0171 2048 498.130 134.1484 0196 sll1144 tlr2072 120.5159 2298 232.2107 187.843 conserved hypothetical protein
ort442 7.5926 0 alr0033 gll1578 0176 2059 382.6 133.1232 0201 slr1518 tlr2196 120.4641 2307 232.1574 360.4882 menA 1,4-dihydroxy-2-naphthoate (DHNA) octaprenyltransferase
ort393 7.2132 1 all3859 gll3682 0178 2061 502.154 131.823 0203 slr1238 tlr0908 120.4545 2309 232.814 360.5226 gshB putative glutathione synthetase


Table 2 (continued)

ort509 2.6452 1 all0273 glr1233 0180 2063 480.45 136.2313 0205 sll1865 tll0527 109.1139 2311 232.1720 362.6601 prfB peptide chain release factor RF-2
ort133 7.6884 0 all0271 gll1243 0182 2065 480.47 136.2315 0207 slr0053 tlr1585 118.3814 2313 232.1718 362.6503 conserved hypothetical protein
ort337 6.7481 0 all0270 gll1242 0183 2066 480.48 136.2316 0208 slr0054 tlr1586 118.3815 2314 232.1717 362.6504 dgkA possible diacylglycerol kinase
ort478 4.3700 1 all0269 glr0883 0184 2067 480.49 136.2317 0209 slr0055 tlr1471 118.3817 2315 232.1716 362.6505 pabA para-aminobenzoate synthase component II
ort134 6.0225 0 all0268 glr0884 0185 2068 480.50 136.2318 0210 slr2005 tlr1472 113.2053 2316 232.1715 362.6506 conserved hypothetical protein
ort019 9.8344 0 all3717 glr4279 0187 2070 507.173 135.2189 0212 sll0502 tll0826 120.4920 2319 232.2460 360.5034 argS arginyl-tRNA synthetase
ort463 8.5661 0 all2091 gll0888 0188 2073 372.1 126.243 0213 slr0936 tll1713 117.3252 2320 232.2050 267.1863 nadC Nicotinate-nucleotide pyrophosphorylase:Quinolinate phosphoribosyltransferase
ort285 5.9917 0 all4677 glr2452 0189 2074 501.94 135.2184 0214 sll1615 tlr0961 107.835 2322 232.960 351.4010 putative thiophen / furan oxidation protein
ort639 3.7358 1 all1549 glr2748 0191 2111 405.29 133.1254 0217 slr1325 tll0584 122.7534 2324 232.3830 360.4975 spoT guanosine-3',5'-bis(diphosphate) 3'-diphosphatase, (ppGpp)ase
ort567 5.3363 1 alr0522 glr0073 0193 2109 456.45 134.1412 0219 slr1629 tll1538 115.2450 2326 232.1850 362.6597 rluD putative pseudouridylate synthase specific to ribosomal large subunit
ort136 5.7803 0 all0745 glr3866 0194 2108 482.87 134.1410 0220 slr0267 tlr1507 113.1916 2327 232.3505 361.5504 conserved hypothetical protein
ort494 2.7961 0 all4131 glr2313 0195 2106 501.49 134.1572 0221 slr0394 tll2268 113.2055 2329 232.4647 361.5524 cbbK, pgk phosphoglycerate kinase
ort457 6.9532 0 alr0477 gll1671 0197 2101 494.62 131.814 0223 slr1656 tlr1809 107.798 2333 232.1819 361.5832 murG Undecaprenyl-PP-MurNAc-pentapeptide-UDPGlcNAc GlcNAc transferase
ort407 8.1217 0 alr3936 gll3423 0198 2100 498.63 134.1570 0224 sll1713 tll2266 116.2830 2334 232.746 361.5256 hisC Aminotransferases class-I
ort546 5.7483 0 all4272 gll3080 0199 2094 460.17 136.2715 0225 slr1418 tll0579 119.4239 2337 232.5080 356.4298 pyrD Dihydroorotate dehydrogenase
ort577 2.5367 1 alr5303 glr1602 0201 2091 411.3 127.338 0227 sll1746 tlr0298 121.5561 2340 232.1507 241.1484 rpl12 50S ribosomal protein L7/L12
ort575 4.3721 1 alr5302 glr1601 0202 2090 411.2 127.337 0228 sll1745 tlr0297 121.5560 2341 232.1506 241.1483 rpl10 50S ribosomal protein L10
ort576 0.9899 1 alr5300 glr1599 0204 2088 354.24 127.334 0230 sll1743 tlr0295 121.5558 2343 232.1504 188.861 rpl11 50S ribosomal protein L11
ort477 2.8070 1 alr5299 glr0830 0205 2087 354.23 127.333 0231 sll1742 tlr0294 121.5557 2344 232.1503 188.860 nusG transcription antitermination protein, NusG
ort348 3.8010 1 all3538 gll2121 0208 2083 484.29 127.371 0235 slr0752 tlr0658 121.6150 2348 232.2417 243.1513 eno Enolase
ort227 6.1653 0 alr4515 glr4178 0209 2082 509.255 132.1045 0236 sll1770 tlr2242 121.5999 2349 232.2284 318.2921 possible kinase
ort018 5.7029 0 alr2073 glr4034 0050 2135 363.8 135.2038 0054 sll1883 tll1911 122.7762 2353 232.2033 225.1272 argJ glutamate N-acetyltransferase / ornithine N-acetyltransferase
ort310 8.2410 0 all1754 glr2134 0049 2136 444.19 136.2706 0053 slr0553 tlr0898 116.2769 2354 232.4174 357.4421 coaE putative dephospho-CoA kinase
ort371 3.6018 1 alr1396 gll2342 0048 2139 363.7 136.2626 0052 sll1435 tlr2394 122.6707 2356 232.2836 317.2913 gatB glutamyl-tRNA (Gln) amidotransferase subunit B
ort209 3.7407 1 alr3402 glr0710 0046 2148 506.28 129.618 0050 sll1852 tlr0267 101.113 2358 232.2327 362.6188 Nucleoside diphosphate kinase
ort010 3.8672 1 alr2418 gll2348 0044 2157 421.16 132.1049 0048 sll0362 tll2103 120.4504 2360 232.4120 131.388 alaS Alanyl-tRNA synthetase
ort375 4.4045 0 all4607 glr0246 1668 2169 504.31 135.1874 1829 slr0293 tll1603 118.3757 2374 232.1698 314.2812 gcvP Glycine cleavage system P-protein
ort374 5.6959 0 all4608 glr3531 1669 2170 504.30 135.1873 1830 slr0879 tlr1678 118.3755 2375 232.1697 358.4606 gcsH putative Glycine cleavage H-protein
ort137 4.3567 0 alr1600 glr3015 1670 2171 462.66 129.564 1831 slr0580 tlr2020 103.292 2376 232.3083 233.1366 conserved hypothetical protein
ort600 5.3708 0 all0579 gll2632 1673 2175 497.165 129.579 1834 sll1244 tlr2413 103.305 2378 232.4219 330.3277 rpl9 50S ribosomal protein L9
ort377 3.1755 1 alr3561 gll0881 1675 2177 428.31 125.166 1836 sll0202 tlr0535 121.6104 2382 232.2439 349.3904 gidA glucose inhibited division protein A
ort260 9.2830 0 all1717 glr0727 0659 2184 459.8 136.2378 1841 sll1209 tll0857 112.1647 2389 232.4204 361.5402 Putative DNA ligase
ort675 3.9285 1 all1318 glr2919 1682 2191 453.50 136.2871 1844 slr0557 tlr1503 121.5381 2396 232.1906 339.3526 valS valyl-tRNA synthetase
ort195 7.1840 0 all2473 glr0783 1030 2208 465.20 132.1092 1502 sll1937 tlr0192 110.1285 2401 232.4292 361.5768 Ferric uptake regulator family
ort138 7.8596 0 all5037 glr2334 1001 1386 457.38 132.999 1187 sll0157 tlr2433 122.6983 2402 232.1257 328.3193 conserved hypothetical protein
ort205 6.1241 0 alr0512 gll2594 1684 2211 474.55 133.1320 1847 sll1005 tlr0809 118.3754 2418 232.1842 354.4155 MazG family protein
ort376 5.7757 0 all4609 glr3530 1687 2215 504.29 131.813 1851 sll0171 tll0745 114.2154 2425 232.1696 254.1664 gcvT putative Glycine cleavage T-protein (aminomethyl transferase)
ort028 3.5136 1 all2436 gll1928 1688 2216 484.71 128.447 1852 slr1720 tll1144 115.2408 2426 232.4140 353.4089 aspS Aspartyl-tRNA synthetase:GAD domain:tRNA synthetases, class I.
ort549 2.7609 1 alr5000 glr1524 1689 2219 489.91 135.2004 1853 sll1443 tll0768 104.403 2428 232.1228 88.174 pyrG Glutamine amidotransferase class-I:CTP synthase
ort214 7.6399 0 alr5095 gll3037 1691 2222 475.87 128.461 1855 slr0527 tlr1732 119.4300 2431 232.1322 362.6090 possible ATPase
ort311 5.6999 0 alr0520 glr2580 1694 2237 439.48 136.2535 1858 sll0378 tlr0144 101.169 2475 232.1848 361.5933 cobA putative uroporphyrin-III c-methyltransferase
ort507 4.8985 0 alr3593 gll1501 1696 2245 479.95 135.1613 1862 sll0290 tlr0213 108.973 2495 232.2076 362.6491 ppk putative polyphosphate kinase


Table 2 (continued)

ort140 8.4568 0 all1610 gll3147 1698 2247 483.38 134.1443 1864 slr1652 tll2039 118.3658 2497 232.3091 342.3653 conserved hypothetical protein
ort342 1.7110 1 alr1742 glr4264 1704 2255 421.19 129.523 1871 sll0170 tlr1733 112.1833 2508 232.4186 362.6498 dnaK2 Molecular chaperone DnaK2, heat shock protein hsp70-2
ort024 6.9624 0 alr2057 gll3278 1705 2257 454.58 129.634 1872 slr1559 tll0590 109.1223 2509 232.2021 336.3438 aroE Shikimate / quinate 5-dehydrogenase
ort141 8.7828 0 all4804 glr3297 0654 2258 493.57 136.2427 1873 sll1737 tlr1073 120.4830 2510 232.1045 361.5451 ycf60 conserved hypothetical protein
ort618 4.8520 1 all4802 gsl0434 1706 2259 493.58 136.2526 1874 sll1767 tlr1279 120.4831 2511 232.1043 361.5452 rps6 30S ribosomal protein S6
ort016 3.1965 1 alr4798 glr2933 1707 2261 493.101 136.2576 1875 slr0585 tlr0712 114.2209 2513 232.1039 141.464 argG Argininosuccinate synthase
ort451 6.6821 0 all4316 gll3738 1709 2264 452.50 136.2444 1878 sll0657 tll0971 122.7016 2516 232.5124 360.5013 mraY Putative phospho-N-acetylmuramoyl-pentapeptide-transferase
ort672 2.5809 1 alr3716 glr4236 1712 2270 457.64 132.1076 1881 slr1844 tll0184 121.5477 2522 232.2490 314.2814 uvrA excinuclease ABC subunit A
ort559 13.9041 0 alr4961 gll0271 1713 2271 299.2 135.1745 1882 sll1520 tll1899 118.3637 2523 232.1198 295.2380 recN putative DNA repair protein
ort651 3.0887 1 all2072 gll1463 1716 2274 472.49 136.2571 1885 sll1172 tll1227 121.6077 2526 232.2032 362.6189 thrC threonine synthase


Figure legends

Fig. 6. Representative backbone tree topologies and other hypotheses used in the Shimodaira-Hasegawa test. To determine which of these topologies best represents the evolutionary history of the conserved protein families, we performed a Shimodaira-Hasegawa (SH) test (74) in which the fit of each topology to the 682 sequence alignments was analyzed. To aid the comparison, we also generated topologies that display alternative placements of SCO and TEL with minor or no changes in the topologies of the conserved clades (T6-T10), as well as topologies that break the conserved clades at random (T11-T15). Note that only 2-3% topological divergence is displayed. Species abbreviations as in Table 1.

Fig. 7. Compatibility of tree topologies with the orthologous datasets. The SH tests show the number of datasets accepting (black bars) or rejecting (grey bars) each of the 15 topologies at the 5% significance level. Almost all (97.5% to 99.6%) of the datasets support topologies T1-T5 at the 95% confidence level (P=0.95), whereas far fewer datasets support topologies T6-T10, and T11-T15 were strongly rejected by most of the datasets. While this analysis corroborated the mainstream topologies discerned through genomic approaches, it could not single out one topology as preferable to other similar ones. This result suggests that artificial paralogs or genes potentially obtained by LGT may thwart the establishment of organismal relationships based on plural consensus of individual gene phylogenies.
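The accept/reject counts summarized in Fig. 7 can be illustrated with a minimal sketch. It assumes that one SH-test P value is available per dataset and per topology, and that a dataset rejects a topology when its P value falls below the 5% significance level. The function names and the P values below are hypothetical and serve only as an illustration; they are not the values or the software used in this study.

```python
from typing import Dict, List, Tuple


def tally_sh_results(
    p_values: Dict[str, List[float]], alpha: float = 0.05
) -> Dict[str, Tuple[int, int]]:
    """Count, for each topology, the datasets that accept or reject it.

    A dataset rejects a topology when its SH-test P value is below alpha.
    Returns {topology: (n_accepting, n_rejecting)}.
    """
    tally = {}
    for topology, ps in p_values.items():
        rejecting = sum(1 for p in ps if p < alpha)
        tally[topology] = (len(ps) - rejecting, rejecting)
    return tally


def percent_accepting(tally: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Convert accept/reject counts into the percentage of accepting datasets."""
    return {t: 100.0 * acc / (acc + rej) for t, (acc, rej) in tally.items()}


# Hypothetical P values for three topologies over four datasets.
example = {
    "T1": [0.80, 0.45, 0.06, 0.30],
    "T6": [0.20, 0.04, 0.01, 0.50],
    "T11": [0.001, 0.02, 0.03, 0.07],
}
tally = tally_sh_results(example)
percentages = percent_accepting(tally)
for name in example:
    accepted, rejected = tally[name]
    print(f"{name}: accept={accepted} reject={rejected} ({percentages[name]:.1f}% accept)")
```

Applied to the 682 alignments and 15 topologies, a tally of this kind would yield the black (accepting) and grey (rejecting) bar heights for each topology.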


[Figure 6: backbone tree topologies T1-T15. Panel titles: SSU rRNA; Concatenated proteins; Consensus and supertree. Method labels: 16S_NJ/ML/MP; NJ_γ; NJD; NJJ; NJK; QP ProtML; ML_consensus; ML_supertree; NJ_consensus; NJ_supertree. Taxon labels: ANA, AVA, NPU, TER, CWA, SYN, PMM, PMS, PMT, SYW, SCO, TEL, GVI.]

Shi & Falkowski, SI Fig. 6


[Figure 7: bar chart. x-axis: Tree topologies (1-15); y-axis: Number of orthologs (axis ticks 0-150 and 500-700).]

Shi & Falkowski, SI Fig. 7
