11
Evolutionary Bioinformatics 2008:4 109–119 109 ORIGINAL RESEARCH Correspondence: Rob DeSalle, Email: FCA: [email protected], RDS: [email protected], ML: magdalena. [email protected], PBS: [email protected] Copyright in this article, its metadata, and any supplementary data is held by its author or authors. It is published under the Creative Commons Attribution By licence. For further information go to: http://creativecommons.org/licenses/by/3.0/. Examining Ancient Inter-domain Horizontal Gene Transfer Francisca C. Almeida 1,2 , Magdalena Leszczyniecka 3 , Paul B. Fisher 3 and Rob DeSalle 1,2 1 Department of Biology, New York University, New York, NY. 2 Sackler Institute for Comparative Genomics, American Museum of Natural History, 79th Street @ Central Park West, New York 10024, U.S.A. 3 Departments of Pathology, Urology and Neurosurgery, Herbert Irving Comprehensive Caner Center, Columbia University Medical Center, College of Physicians and Surgeons, New York, U.S.A. Abstract: Details of the genomic changes that occurred in the ancestors of Eukarya, Archaea and Bacteria are elusive. Ancient interdomain horizontal gene transfer (IDHGT) amongst the ancestors of these three domains has been difcult to detect and analyze because of the extreme degree of divergence of genes in these three domains and because most evidence for such events are poorly supported. In addition, many researchers have suggested that the prevalence of IDHGT events early in the evolution of life would most likely obscure the patterns of divergence of major groups of organisms let alone allow the tracking of horizontal transfer at this level. In order to approach this problem, we mined the E. coli genome for genes with distinct paralogs. Using the 1,268 E. coli K-12 genes with 40% or higher similarity level to a paralog elsewhere in the E. coli genome we detected 95 genes found exclusively in Bacteria and Archaea and 86 genes found in Bacteria and Eukarya. These genes form the basis for our analysis of IDHGT. We also applied a newly developed statistical test (the node height test), to examine the robustness of these inferences and to corroborate the phylogenetically identied cases of ancient IDHGT. Our results suggest that ancient inter domain HGT is restricted to special cases, mostly involving symbiosis in eukaryotes and specific adaptations in prokaryotes. Only three genes in the Bacteria + Eukarya class (Deoxy- xylulose-5-phosphate synthase (DXPS), fructose 1,6-phosphate aldolase class II protein and glucosamine-6-phosphate deaminase) and three genes–in the Bacteria + Archaea class (ABC-type FE3+ -siderophore transport system, ferrous iron transport protein B, and dipeptide transport protein) showed evidence of ancient IDHGT. However, we conclude that robust estimates of IDHGT will be very difcult to obtain due to the methodological limitations and the extreme sequence satura- tion of the genes suspected of being involved in IDHGT. Keywords: horizontal gene transfer, node height test, eschericia coli, blast Introduction While horizontal gene transfer (HGT) has been widely accepted as an important evolutionary force among prokaryotes (Lawrence and Ochman, 1997; Jain et al. 1999; Ochman et al. 2000), the role of HGT in the early evolution of life has been controversial (Woese, 2002). HGT has been suggested to occur between organisms belonging to the different domains of life: Bacteria, Archaea, and Eukarya (Hilario and Gogarten, 1993; Kandler, 1994, 1998; Gogarten, 1995; Katz, 1996; Aravind et al. 1998; Nelson et al. 1999; Woese, 2002; Klotz and Loewen, 2003). This kind of transfer is quite patent in the numerous cases of mitochondrial and chloroplast genes found in the nuclear genomes of some eukaryotes (Martin et al. 1998; Berg and Kurland, 2000; Rujan and Martin, 2001). Besides the organelle case, however, the impor- tance of ancient inter domain HGT (IDHGT) is still under debate (Teichmann and Mitchison, 1999; Kyripides and Olsen, 1999; Logsdon and Faguy, 1999; Stanhope et al. 2001; Snel et al. 2002; Eisen and Fraser, 2003). It has been suggested that IDHGT was so prevalent in the beginning of life that it would prevent a good assessment of the early branching of the tree of life (Doolittle, 1999a, 1999b). Most evidence for ancient ID HGT, however, is weak and/or based on non-phylogenetic methods that do not support IDHGT versus alternative hypotheses (Kurland, 2000; Koski and Golding, 2001; Koski et al. 2001; Ragan, 2001; Kurland et al. 2003; Brown, 2003). For instance, CG content or distinctive genomic characteristics have been used to suggest HGT in prokaryotes, but these genomic differences could also be due to distinctive evolutionary trends in some lineages related to natural selection (Hayes and Boro- dovsky, 1998). Another class of tests of HGT, the phyletic distributional proles based on BLAST searches could also be interpreted as gene loss and are largely affected by the database (Nelson et al. 1999; Sal- zberg et al. 2001; Roelofs and Van Haastert, 2001; Genereux and Logsdon, 2003).

Examining Ancient Inter-domain Horizontal Gene Transferdesalle.amnh.org/pdf/Almeida.2008.EvolBioinf.pdf · 111 Examining ancient inter-domain horizontal gene transfer Evolutionary

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

  • Evolutionary Bioinformatics 2008:4 109–119 109

    ORIGINAL RESEARCH

    Correspondence: Rob DeSalle, Email: FCA: [email protected], RDS: [email protected], ML: [email protected], PBS: [email protected]

    Copyright in this article, its metadata, and any supplementary data is held by its author or authors. It is published under the Creative Commons Attribution By licence. For further information go to: http://creativecommons.org/licenses/by/3.0/.

    Examining Ancient Inter-domain Horizontal Gene TransferFrancisca C. Almeida1,2, Magdalena Leszczyniecka3, Paul B. Fisher3 and Rob DeSalle1,21Department of Biology, New York University, New York, NY. 2Sackler Institute for Comparative Genomics, American Museum of Natural History, 79th Street @ Central Park West, New York 10024, U.S.A. 3Departments of Pathology, Urology and Neurosurgery, Herbert Irving Comprehensive Caner Center, Columbia University Medical Center, College of Physicians and Surgeons, New York, U.S.A.

    Abstract: Details of the genomic changes that occurred in the ancestors of Eukarya, Archaea and Bacteria are elusive. Ancient interdomain horizontal gene transfer (IDHGT) amongst the ancestors of these three domains has been diffi cult to detect and analyze because of the extreme degree of divergence of genes in these three domains and because most evidence for such events are poorly supported. In addition, many researchers have suggested that the prevalence of IDHGT events early in the evolution of life would most likely obscure the patterns of divergence of major groups of organisms let alone allow the tracking of horizontal transfer at this level. In order to approach this problem, we mined the E. coli genome for genes with distinct paralogs. Using the 1,268 E. coli K-12 genes with 40% or higher similarity level to a paralog elsewhere in the E. coli genome we detected 95 genes found exclusively in Bacteria and Archaea and 86 genes found in Bacteria and Eukarya. These genes form the basis for our analysis of IDHGT. We also applied a newly developed statistical test (the node height test), to examine the robustness of these inferences and to corroborate the phylogenetically identifi ed cases of ancient IDHGT. Our results suggest that ancient inter domain HGT is restricted to special cases, mostly involving symbiosis in eukaryotes and specific adaptations in prokaryotes. Only three genes in the Bacteria + Eukarya class (Deoxy-xylulose-5-phosphate synthase (DXPS), fructose 1,6-phosphate aldolase class II protein and glucosamine-6-phosphate deaminase) and three genes–in the Bacteria + Archaea class (ABC-type FE3+ -siderophore transport system, ferrous iron transport protein B, and dipeptide transport protein) showed evidence of ancient IDHGT. However, we conclude that robust estimates of IDHGT will be very diffi cult to obtain due to the methodological limitations and the extreme sequence satura-tion of the genes suspected of being involved in IDHGT.

    Keywords: horizontal gene transfer, node height test, eschericia coli, blast

    IntroductionWhile horizontal gene transfer (HGT) has been widely accepted as an important evolutionary force among prokaryotes (Lawrence and Ochman, 1997; Jain et al. 1999; Ochman et al. 2000), the role of HGT in the early evolution of life has been controversial (Woese, 2002). HGT has been suggested to occur between organisms belonging to the different domains of life: Bacteria, Archaea, and Eukarya (Hilario and Gogarten, 1993; Kandler, 1994, 1998; Gogarten, 1995; Katz, 1996; Aravind et al. 1998; Nelson et al. 1999; Woese, 2002; Klotz and Loewen, 2003). This kind of transfer is quite patent in the numerous cases of mitochondrial and chloroplast genes found in the nuclear genomes of some eukaryotes (Martin et al. 1998; Berg and Kurland, 2000; Rujan and Martin, 2001). Besides the organelle case, however, the impor-tance of ancient inter domain HGT (IDHGT) is still under debate (Teichmann and Mitchison, 1999; Kyripides and Olsen, 1999; Logsdon and Faguy, 1999; Stanhope et al. 2001; Snel et al. 2002; Eisen and Fraser, 2003). It has been suggested that IDHGT was so prevalent in the beginning of life that it would prevent a good assessment of the early branching of the tree of life (Doolittle, 1999a, 1999b). Most evidence for ancient ID HGT, however, is weak and/or based on non-phylogenetic methods that do not support IDHGT versus alternative hypotheses (Kurland, 2000; Koski and Golding, 2001; Koski et al. 2001; Ragan, 2001; Kurland et al. 2003; Brown, 2003). For instance, CG content or distinctive genomic characteristics have been used to suggest HGT in prokaryotes, but these genomic differences could also be due to distinctive evolutionary trends in some lineages related to natural selection (Hayes and Boro-dovsky, 1998). Another class of tests of HGT, the phyletic distributional profi les based on BLAST searches could also be interpreted as gene loss and are largely affected by the database (Nelson et al. 1999; Sal-zberg et al. 2001; Roelofs and Van Haastert, 2001; Genereux and Logsdon, 2003).

    http://creativecommons.org/licenses/by/3.0/.http://creativecommons.org/licenses/by/3.0/.

  • 110

    Almeida

    Evolutionary Bioinformatics 2008:4

    Here we propose an approach for the detection of HGT and use it to examine ancient classes of IDHGT using phylogenetic analysis—the only available method capable to distinguish HGT from other hypothesis (Logsdon and Faguy, 1999; Ragan, 2001). We also introduce and test a new method based on node height differences in phy-logenetic comparisons that is faster than phyloge-netic tree searching. Our approach was to make a general assessment of ancient IDHGT using these methods and the available GenBank database. In this way we assess the possibility of detecting reli-able evidences of HGT (that can discriminate among alternative explanations for an observed pattern) and possible problems that are often times not taken into account in HGT analyses. In addi-tion, our approach is a fi rst step in examining the ability to detect ancestral IDHGT using robust cladistic methods. By using phylogenetic methods and taking advantage of the most commonly accepted topology of the tree of life, we examine IDHGT between Bacteria, Archaea and Eukarya using the E. coli genome as a reference genome.

    Since the importance of HGT following endosimbiosis events is well recognized, we focus on HGT that does not involve endosimbiotic asso-ciations. Our approach is very conservative and we intentionally do not offer this method as a method to understanding endosymbiotic aspects of inter-domain gene transfer. Several excellent studies have examined the wholesale transfer of genes via endosymbiotic relationships (Karlberg et al. 2000; Palenik, 2002; Martin et al. 2002). Our concern in this paper is to examine those extremely diffi cult episodes of HGT that did not occur as a result of endosymbiotic relationships. We also do not attempt here to make a thorough search for horizontally transferred genes, since our approach also has several limitations, but the same method-ology we propose here could be modifi ed and used in alternative, more thorough analyses. Neverthe-less, the approach presented here allows for an estimation of the frequency of HGT events that can be detected within the limitations imposed by the data and the methods available. Our results, using a gram negative bacteria centric analyses indicate that only a few instances of statistically supported evidence of HGT exist. We suggest that this obser-vation is due to substitution saturation and lack of resolution of the phylogenetic trees and that these problems may preclude any good estimation of ancient interdomain transfers.

    Materials and Methods

    Screening for genes with potential for IDHGTIn order to apply phylogenetic methods for detecting HGT, one should be able to produce rooted trees. When the ingroup of the phylogenetic analysis includes all forms of life, the outgroup is usually a paralogous gene and hence our fi rst screening consisted of fi nding genes with a suitable paralog to be used as outgroup. A list of E.coli K-12 paralog genes with 40% or higher similarity level was downloaded from http://www.tigr.org/tigr-scripts/CMR2/LevelsOfParalogy1.spl?db_data_id = 99. This list consisted of 1,268 genes from a total of approximately 4,200 genes found in the E. coli genome. E. coli K-12 was chosen as a guide because it has a well-annotated and fairly large genome among bacteria (Wernegreen et al. 2000).

    Our second step in the screening was to look for genes that have a taxonomic distribution in the three domains that deviates from the expected. This screen was based on the most accepted hypothesis for the tree of life that suggests a closer relationship of Archaea and Eukarya to the exclusion of Bacteria (Searcy et al. 1978; Zillig et al. 1989, 1992; Iwabe et al. 1989; Gogarten et al. 1989; Brown et al. 2001). Using a phyletic distributional profi le with the specifi c inter-domain distributions boxed in Figure 1 as a guide, we focused on orthologs that

    Bacteria Archaea Eukarya+ +

    + +

    Figure 1. Distributional profi le method. A fi rst screening for genes involved in ancient IHGT was done using the distributional profi le method. Based on the most accepted hypothesis of phylogenetic relationships between the three domains of life, genes that occur in Eukarya and Bacteria, but not in Archaea, or genes that occur in Archaea and Bacteria, but not in Eukarya were potential candidates to have been horizontally transferred between domains. Genes found in one of those two phyletic distributional categories were further tested for HGT (see text).

  • 111

    Examining ancient inter-domain horizontal gene transfer

    Evolutionary Bioinformatics 2008:4

    exist in Bacteria AND either Archaea OR Eukarya. Therefore among the 1,268 genes we examined, we looked for those that were present in Archaea but absent in Eukarya, and those that were present in Eukarya but absent in Archaea. Based on the (Bacteria(Archaea, Eukarya)) hypothesis, there are two alternative explanations for these distributional profi les: the gene was either present in the univer-sal common ancestor and posteriorly lost in the domain that lacks it, or the gene was horizontally transferred after the split of the domains.

    To obtain the distributional profi le of the genes in the fi rst list, we used BLAST (blastp) searches against the all the available data in the GenBank at the time of the searches. To make the taxonomic screen of the paralogs easier, we conducted these searches using the “Blink” option, which shows the results of the search color-coded by taxonomic group: Archaea, Bacteria, and Eukaryotes subdivided into Metazoa, Plants, Fungi, and other Eukaryotes. The “Blink” is a link available for each sequence on the NCBI website. Because the “Blink” gives only the 200 best hits, when this number was reached with hits for the same gene, the distribution was double-checked using Blastp to confi rm absence of the gene in Archaea and Eukarya. In this case, we used e–10as a cut off value for the presence of a gene in a given domain. To be useful as an outgroup, a paralog should have appeared in a duplication event that occurred before the split of the three domains, instead of being exclusive to Bacteria. Hence, we used the “Blink” as described above to check the distribution of the paralogs. Only genes with a paralog that seem to have appeared before the split of the three domains were retained for further analyses. We also discarded from the analyses all the genes for which orthology and paralogy could not be promptly and confi dently determined, and the genes that are known to be involved in mito-chondrial or chloroplast metabolism. Amino acid sequences of the genes included in the analyses and their paralogs were downloaded and aligned with ClustalX using standard parameters.

    Phylogenetic analysisMaximum parsimony trees were obtained with PAUP 4.01 (Swofford, 2003) using heuristic search with 10 random stepwise additions. Analyses were done using at least fi ve paralog sequences. Full bootstrap tests with 500 replicates were performed

    to test consistency of the branches of strict consensus trees. Evidence of HGT from bacteria to eukaryotes (or archaeans) would be illustrated by a paraphyly of bacteria, with some of them more closely related to eukaryotes or archaeans than to other bacteria. Our focus on parsimony approaches is reasonable and conservative. In fact, we avoid making an inference about a HGT when we detect saturation of sequence changes in our tests.

    Node height testThe node height test, like the phylogenetic analysis, has the objective to test the different hypotheses of lineage extinction (gene loss in a determined lin-eage) and HGT. Other approaches using rates of evolution to test for HGT have been discussed in Novichkov et al. (2004). The test we present here compares substitution rates within and among groups (here among domains). In the case of lineage extinction, it is expected that the substitution rates within groups will be higher than the rates among groups (see Fig. 3 for a graphical explanation). In the case of HGT, however, there should be no dif-ferences in the average substitution rates within and among groups. The test is dependable on homoge-neous substitution rates across taxa. Homogeneity of substitution rates across domains was tested with the software RRTree (Robinson et al. 1998). The rate test was done on the whole sequence alignment for all genes. For the genes saturated with substitu-tions, the regions with gaps and poor alignment were trimmed and the test was redone on the remaining residues. The node height test was per-formed on the genes for which it was possible to rule out substitution rate differences. A pair wise distance matrix was obtained with PAUP* 4.01b (Swofford, 2002) for each of those genes. Average distances within Bacteria (B1-B1) and between bacteria and the other domain in which the gene is present (B1-E1 or B1-A1) were compared with ANOVA. A node height effect was detected when B1-E1 or B1-A1 was not signifi cantly larger than B1-B1(Supplemental Material File 1).

    Results and Discussion

    Phyletic distributions of genes across domainsOur fi rst screen was based on the most accepted hypothesis for the tree of life that suggests a closer

  • 112

    Almeida

    Evolutionary Bioinformatics 2008:4

    relationship of Archaea and Eukarya to the exclusion of Bacteria (Iwabe et al. 1989; Brown et al. 2001) as described above. Among the 1,268 genes we examined, 402 were present in all three domains, 545 were present only in Bacteria, 95 were also present in Archaea but absent in Eukarya, and 86 were present in Eukarya but absent in Archaea. For 140 genes it was not pos-sible to determine if the hits obtained in the BLAST searches were orthologs or paralogs, mostly due to nomenclatural problems. These categories of genes indicate some evolutionary discontinuity if the sister pair Archaea and Eukarya do not have the same distribution pattern. We fi rst examined if the patterns obtained using phylogenetic analysis are consistent with HGT or lineage extinction.

    Bacteria—Eukarya exclusive patternsAmong the genes present only in Bacteria and Eukarya, 21 were involved in mitochondrial or plastid metabolism, including those physically localized either in the nuclear or organellar genome. The horizontal transfer of those genes is not under question and we excluded them from further analyses. Twenty-six genes were also discarded for having spotty distributions in the three domains, and 19 for problems with orthology and paralogy determination or absence of a useful paralog (see Supplemental Materials File 2 for list of genes and taxa examined). Twenty genes were tested and for three of them the phylogenetic analysis suggests ancient IDHGT from bacteria to eukaryotes (Table 1). Lack of statistical support prevented us from rejecting IDHGT or gene loss for the remaining genes. All the 3 genes recovered here as probable IDHGT cases had already been described as such. Deoxyxylulose-5-phosphate synthase (DXPS) is an enzyme that participates in several pathways involving coenzyme and carbohydrate metabolism. It is the fi rst enzyme of an alternative pathway for the production of isoprenoids and is present only in bacteria and plants (Lange et al. 2000). In our analysis DXPS clusters with proteobacteria with a high bootstrap support (Supplemental Material File 3). This group of Bacteria includes plant-symbiotic species and the ancestor of mitochondria, which may place this gene in the general category of organelle to nucleus HGT.

    The second protein, fructose 1,6-phosphate aldolase class II is involved in sugar metabolisms and participates in the glycolysis I and gluconeogenesis pathways. The study of this enzyme is complicated by the presence of several types that represent distinct paralogs (Sanchez et al. 2002). The fructose 1,6-phosphate aldolase class II protein is found in fungi and protists, but those two groups do not form a clade in our analysis (Supplemental Materials File 3).

    The third protein, glucosamine-6-phosphate deaminase, is also involved in sugar metabolism and participates in the pentose phosphate and Entner-Doudoroff pathways. Andersson et al. (2003) had already reported this gene as a case of HGT, but the lack of rooting precluded an inter-pretation of the direction of the transfer by these authors. The tree presented here clearly supports an HGT from eukayotes in the lineage of animals and fungi to proteobacteria (Fig. 2). Our analysis also indicated a transfer involving the protist Entamoeba histolytica and a group of bacteria, as suggested by Andersson et al. (2003). However, in disagreement with their results, all protists clustered in the same clade and it seems more likely that transfer occurred from the protist to the bacteria.

    Bacteria—Archaea exclusive patternsThe analysis of the genes present only in Archaea and Bacteria showed that three genes were probably transferred from the former domain to thermophilic bacteria: ABC-type FE3+ -siderophore transport system, ferrous iron transport protein B, and dipeptide transport protein (Table 2, Supplemental Materials File 3). All 3 genes are involved in cellular transport and two of them specifi cally in iron transport. HGT from Archaea to thermophilic bacteria has been reported to be as high as 24%, but this percentage was obtained based only on the overall similarity of the genomes and most of the phylogenetic tests we performed failed to support HGT (Nelson et al. 1999; Logsdon and Faguy, 1999). Although we included many different species of thermophilic bacteria, only the ones belonging to the genus Thermotoga were found to be involved in the cases of possible HGT. The Thermotogales are believed to be the most basal group of bacteria and the grouping of their genes with those of archaeans may be due to retention of ancestral sequences

  • 113

    Examining ancient inter-domain horizontal gene transfer

    Evolutionary Bioinformatics 2008:4

    under strong selection on particular genes in a hot, harsh environment (Kyripides and Olsen, 1999; Logsdon and Faguy, 1999). Even convergence cannot be ruled out.

    A more careful analysis of the candidate genes for HGT between those taxa is still needed to pro-vide better support for HGT. Phylogenetic analysis suggested HGT for 3 other archaeal genes, formate dehydrogenase (cytochrome B556 subunit), glu-cose-1-phosphate thymidylyltransferase, and ade-nine deaminase, but the direction is not clear (Supplemental Material File 3). For all other genes the phylogenetic analyses gave very poor resolution and was unable to recover both Bacteria and Archaea as monophyletic groups. However, for one gene, protein secretion membrane protein, HGT was sta-tistically rejected and gene loss is clearly the most likely hypothesis (the phylogenetic tree obtained for this gene resembles the one shown in Fig. 3a).

    The node height approach to HGT detectionHere we introduce and show the results of an alternative test that we propose discerns between HGT and Lineage Extinction called the node height test. Since differences in substitution rates may give a similar effect, the test can be applied only to genes that have homogeneous substitution rates across the different domains. Tables 1 and 2 show the results of the rate test for the bacterial-eukaryotic exclusive genes and bacterial-archaeal exclusive genes, respectively. Most of the genes were saturated with substitutions and could not be tested using the node height test. Nevertheless, a positive effect was detected for both DPXS and fructose 1,6-bisphosphate aldolase class II (protists only), in agreement with the results of phylogenetic analysis. The test was not done on the glucosamine-6-phosphate deaminase gene because it was saturated.

    Table 1. Summary of results for genes with a Bacteria—Archaea exclusive pattern. Results of the relative rate test, rate test using only conserved gene regions, node height test, and support of HGT given by phylogenetic analyses.

    Gene name Ratetest

    Rate test cons.

    Node height test

    Trees

    2-octaprenyl-6-methoxyphenol k.s. k.s. - noATP-dependent specifi city component of clpP serine protease, chaperone

    k.s. k.s. - no

    glucosamine-6-phosphate deaminase k.s. k.s. - yesGTP-binding elongation factor, may be inner membrane protein k.s. k.s. - nopoly(A) polymerase I k.s. k.s. - noputative GTP-binding factor k.s. k.s. - noguanylate kinase k.s. * - no3-demethylubiquinone-9 3-methyltransferase k.s. n.s. ** nobiosynthetic arginine decarboxylase k.s. n.s. ** noesterase D k.s. n.s. ** noglyoxylate-induced protein k.s. n.s. ** noheat shock protein hslVU, ATPase subunit, homologous to chaperones

    k.s. n.s. ** no

    probable protein-tyrosine-phosphatase k.s. n.s. ** nopyridoxal kinase 2 / pyridoxine kinase k.s. n.s. ** noenoyl-[acyl-carrier-protein] reductase (NADH) k.s. n.s. ** noglutathionine S-transferase n.s. ** nopolynucleotide phosphorylase n.s. ** noprobable 6-phospho-beta-glucosidase n.s. ** no1-deoxyxylulose-5-phosphate synthase n.s. n.s. yesfructose-bisphosphate aldolase, class II n.s. n.s. yes

    Notes: *Signifi cant at α 0.05, **Signifi cant at α 0.01.Abbreviations: k.s.: substitution saturation; n.s.: non-signifi cant at α 0.05.

  • 114

    Almeida

    Evolutionary Bioinformatics 2008:4

    Figure 2. Maximum parsimony analysis of the gene Glucosamine-6-phosphate (nagB, COG0363). The tree depicts the strict consensus of 6 most parsimonious reconstructions. Bootstrap values above 75 are shown on nodes. Rooting was done with the paralog gene Phosphogluconate. E stands for Eukarya, B for Bacteria, and A for Archaea. The number 1 refers to sequences of the gene under analysis, and the number 2 to sequences of the paralog used for rooting.

    B1 Act comitansB1 Hae somnus1B1 Bar henselaeB1 Sin melilotiB1 Ral metalliduransB1 Rho sphaeroidesB1 Cor efficiensB1 Bac anthracisB1 Ent faeciumB1 Leu mesenteroidesB1 Lac lactisB1 Str agalactiaeB1 Str mutansB1 Str pneumoniaeB1 Str pyogenesB1 Lac gasseriB1 Oen soeniB1 Sta aureusB1 Bac haloduransB1 Oce iheyensisB1 Bac subtilis2B1 Myc pulmonisB1 Clo acetobutylicumB1 Clo perfringensB1 The tengcongensisB1 Bac subtilis1B1 Cyt hutchinsoniiB1 Bac fragilisB1 Por gingivalisE1 Ent histolytica glucosamineE1 Gia intestinalisE1 Spi barkhanus glucosamineB1 Des hafnienseB1 Pir spB1 Por gingivalis2B1 Hae somnus2B1 Hae influenzaeB1 Pas multocidaB1 Sal entericaB1 Sal typhimuriumB1 Shi flexneriB1 Yer pestisB1 Vib choleraeE1 Cae elegansE1 Cae briggsae HypotheticalE1 Dro melanogasterE1 Hom sapiensE1 Mes auratusE1 Mes auratus oscillinE1 Mus musculusE1 Mus musculus OscillinE1 Xen laevis UnknownE1 Asp nidulans hypotheticalE1 Can albicansE1 Neu crassa hypotheticalB1 Bif longumB1 Str coelicolorB1 Mag magnetotacticumB1 Esc coliB1 Lis innocuaB1 Lis monocytogenesB1 Pro marinusB1 Nos punctiformeB1 Nos spB1 Tri erythraeumA2 Hal sp.B2 Act comitansB2 Hae somnusE2 Dic discoideumE2 Cae elegansE2 Hom sapiensE2 Sac cerevisiaeB2 Bac subtilisB2 Buc aphidicolaB2 Esc coliB2 Lis monocytogenesB2 Lac lactisB2 Oce iheyensisB2 Sta aureusB2 Nos punctiformeB2 The elongatusB2 Syn spE2 Ara thalianaE2 Phy infestansE2 Pla yoelii

    100

    100

    100

    B2 Chl trachomatis

    9080

    100

    100

    100

    100

    100

    91

    98

    98

    98

    98

    9797

    96

    98

    94

    93

    8882

    85

    79

    84

    85

  • 115

    Examining ancient inter-domain horizontal gene transfer

    Evolutionary Bioinformatics 2008:4

    Among the bacterial-archaeal specifi c genes, a larger number were saturated and could not be tested (Table 2). A node height effect was detected for adenine deaminase and dipeptide transport. However, the effect detected for each gene suggest opposite directions of HGT (Bacteria to Archaea, and Archaea to Bacteria, respectively), corroborating the results of the phylogenetic analyses.

    IDHGT estimationWe found 3 genes that were involved in HGT between Bacteria and Eukarya and 6 genes that were involved in HGT between Bacteria and

    Archaea. The results obtained here were expected in view that IDHGT is for obvious biological reasons more likely to occur between simple, uni-cellular organisms. No instance of HGT involving animals was detected. Although the results cor-roborate the importance of IDHGT for some taxa, it suggests that fi rst, IDHGT is not so prevalent as suggested before and second that it is mostly restricted to some groups.

    We acknowledge that the approach used here restricted the number of genes that could be tested. Some of these limitations were imposed by the methodology itself, but some are related to the data.

    Table 2. Summary of results for genes with a Bacteria—Eukarya exclusive pattern. Results of the relative rate test, rate test using only conserved gene regions, node height test, and support of HGT given by phylogenetic analyses.

    Gene name Ratetest

    Rate test cons.

    Node height test

    Phylogenetic analysis

    ABC-type FE3+-siderophore transport system, permease component

    k.s. ks. - yes

    ATPase of high-affi nity potassium transport system, B chain k.s. k.s. - noexcision nuclease subunit A k.s. k.s. - noferrous iron transport protein B k.s. k.s. - yesglucose-1-phosphate thymidylyltransferase k.s. k.s. - yesphosphoheptose isomerase k.s. k.s. - nopleiotrophic effects on 3 hydrogenase isozymes (HypD) k.s. k.s. - noprotein secretion, membrane protein k.s. k.s. - nosuppresses inhibitory activity of CsrA k.s. k.s. - nothiamin-monophosphate kinase k.s. k.s. - notranscriptional regulatory protein (HypAF) k.s. k.s. - notransport of potassium k.s. k.s. - noFKBP-type peptidyl-prolyl cis-trans isomerase k.s. k.s. - noplays structural role in maturation of all 3 hydrogenases (HypE)

    k.s. k.s. - no

    site-specifi c recombinase, acts on cer sequence of ColE1, effects chromosome segregation at cell division

    k.s. * - no

    4-aminobutyrate aminotransferase k.s. n.s. ** nodipeptide transport protein k.s. n.s. n.s yesformate dehydrogenase, cytochrome B556 (FDO) subunit k.s. n.s. ** yesglucose-1-phosphate uridylyltransferase k.s. n.s. ** nohigh-affi nity phosphate-specifi c transport system k.s. n.s. ** nopart of maltose permease, inner membrane k.s. n.s. ** nophosphotransacetylase k.s. n.s. ** noregulator for asnA, asnC and gidA k.s. n.s. ** nospermidine/putrescine transport system permease k.s. n.s. ** noUDP-N-acetyl-D-mannosaminuronic acid dehydrogenase k.s. n.s. ** noprobable adenine deaminase (synthesis xanthine) k.s. n.s. n.s. yes

    Notes: *signifi cant at α 0.05, **signifi cant at α 0.01.Abbreviations: k.s.: substitution saturation; n.s.: non-signifi cant at α 0.05.

  • 116

    Almeida

    Evolutionary Bioinformatics 2008:4

    In the fi rst case, for instance, we did not include genes that are present in the three domains, even though these genes could also be involved in IDHGT. It could happen that one gene was present in the universal ancestral, lost in one of the domains, or some species of one of the domains, and reacquired through IDHGT. That includes the 402 genes found in all three domains. Nevertheless, the approach, including the Node Height test, can be easily automated and could be used in highthroughput screens. We were also limited by the starting dataset: genes present in E. coli. Although this species has a fairly large genome among Bacteria, it lacks many genes that are pres-ent in other bacterial groups, for instance the gram positive Bacteria. In fact, some gram positive bac-terial genes and Archaeal genes appear to be more similar to each other (Brown et al. 1994; Benach-enhou et al. 1993) suggesting a focus on gram positive or Archaeal genomes might reveal even more cases of IDHGT. The approach used here could easily be adapted to search for these genes using the genome of a gram positive Bacteria as a starting point, replacing the E. coli genome. In addition, our dataset was limited by our cutoff for paralogs. However, given the fact that many of the paralagous gene families we examined at the 40% cutoff showed saturation, we suggest that an even lower cutoff would have resulted in even more extreme saturation. Limitations due to the data include sequence availability, substitution saturation in the sequences, lack of paralogs, and nomencla-tural problems. Many genes had to be discarded for the latter reason, indicating the urge to fi nd a better way to name and classify genes.

    Are there acceptable methods of HGT detection?The node height test corroborated the results of the phylogenetic analysis in almost all the cases we detected and may be used alternatively in cases where the taxa are well represented, since it is less time consuming. The weakness of this test is its dependence on homogeneous substitution rates, which makes it useful for testing only a limited number of genes. Yet, the problem with saturated substitution is likely to affect the results of phylogenetic analysis in a similar way, produc-ing poorly resolved trees and lack of statistical support for nodes. The correct identifi cation of orthologs and paralogs is crucial for both meth-ods. It is important to keep in mind that phyloge-netic analysis, despite being the most reliable method, can also give false evidence of HGT in cases of convergence, retention of ancestral char-acter states, and higher evolutionary rate in one particular lineage (long-branch attraction). All those problems are more likely to occur in highly divergent genes that are saturated with substitu-tions, as is the case of most of the genes studied here. Hence, limitations of the methods may pre-clude a good estimate of inter-domain HGT and decrease considerably, the robustness of infer-ences concerning IDHGT. Our results suggest that ancient inter domain HGT is restricted to special cases, mostly involving symbiosis in eukaryotes, specifi c adaptations in prokaryotes, and specifi c cases in single celled eukaryotes (Doolittle, 1998; Andersson et al. 2003; Huang et al. 2004; Huang et al. 2004; Andersson et al. 2005; Huang et al. 2005; Huang and Gogarten, 2006).

    B1 B1 E1 B2

    (a)

    B1 B1E1 B2

    (b)Figure 3. The node height test. (a) If an Euk-Bac distribution was caused by gene loss, than the E1-B1 (Eukarya to Bacteria) average distance is expected to be higher than the average B1-B1 (Bacteria to Bacteria) distance. (b) However, if eukaryotic genes (E1) were gained by transfer from a particular group of bacteria (B1), than the average E1-B1 distance should not be higher than the average B1-B1 distance. These predictions were done based on the assumption that substitution rates are homogeneous across taxa.

  • 117

    Examining ancient inter-domain horizontal gene transfer

    Evolutionary Bioinformatics 2008:4

    ConclusionWe present a new method for detection of HGT that corroborates most of the results obtained with phylogenetic analysis. It is important to note that we limited our study to just those genes in the E. coli genome that have orthologs, but we suggest this gene set is a good fi rst approximation of the utility of the approach. The node height test although restrictive was reliable, corroborating the results of the phylogenetic analyses. Since this method is quick and easily automated, it could be used in automates screens for HGT including a larger dataset than the one analyzed here. The grand majority of the genes that by our method are sug-gested to be involved in ID HGT have also been indicated in other studies that use phylogenetic methods. Although our approach is very limited, this result suggests that most HGT cases that can be identifi ed using phylogenetic methods have already been identifi ed and are in agreement with previous claims that the occurrence of HGT has been overestimated and alternative hypothesis largely put aside (Kyripides and Olsen, 1999; Kurland et al. 2003; Genereux et al. 2003). In conclusion, we suggest that in general reliable estimates of ancient IDHGT will be very diffi cult to obtain due to methodological and data limita-tions as discussed above.

    Authors' ContributionsFCA performed all the genome level searches, phylogenetic tree analyses and compiled the results of node height tests. RD, ML and PF helped develop the node height test approach and assisted in genome level searches. FA and RD contributed to the writing of the manuscript.

    AcknowledgementsThe authors thank the Lewis and Dorothy Cullman Program for Molecular Systematics at the AMNH for support during the writing of this paper.

    ReferencesAndersson, J.O., Sjogren, A.M., Davis, L.A., Embley, T.M. and Roger, A.J.

    2003. Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr. Biol., 13(2):94–104.

    Andersson, J.O., Sarchfi eld, S.W., Roger, A.J., Sjogren, A.M., Davis, L.A. and Embley, T.M. 2005. Gene Transfers from Nanoarchaeota to an Ancestor of Diplomonads and Parabasalids—Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Mol. Biol. Evol., 22(1):85–90.

    Aravind, L., Tatusov, R.L., Wolf, Y.I., Walker, D.R. and Koonin, E.V. 1998. Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet., 14(11):442–4.

    Berg, O.G. and Kurland, C.G. 2000. Why mitochondrial genes are most often found in nuclei. Mol. Biol. Evol., 17(6):951–61.

    Benachenhou-Lahfa, N., Forterre, P. and Labedan, B. 1993. Evolution of glutamate dehydrogenase genes: Evidence for paralogous protein families and unusual branching patterns of the archaebacteria in the universal tree of life. J. Mol. Evol., 36:335–46.

    Brown, J.R. 2003. Ancient horizontal gene transfer. Nat. Rev. Genet., 4:121–32.Brown, J.R., Masuchi, Y., Robb, F.T. and Doolittle, W.F. 1994. Evolutionary

    relationships of bacterial and archaeal glutamine synthetase genes. J. Mol. Evol., 38:566–76.

    Brown, J.R., Douady, C.J., Italia, M.J., Marshall, W.E. and Stanhope, M.J. 2001. Universal trees based on large combined protein sequence data sets. Nat. Genet., 28(3):281–5.

    Doolittle, W.F. 1998. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet., 14(8):307–11.

    Doolittle, W.F. 1999a. Phylogenetic classifi cation and the universal tree. Science, 284(5423):2124–9.

    Doolittle, W.F. 1999b. Lateral genomics. Trends Cell Biol., 9(12):M5–8.Eisen, J.A. and Fraser, C.M. 2003. Phylogenomics: intersection of evolution

    and genomics. Science, 300(5626):1706–7.Genereux, D.P. and Logsdon, J.M. Jr. 2003. Much ado about bacteria-

    to-vertebrate lateral gene transfer. Trends Genet., 19(4):191–5.Gogarten, J.P., Kibak, H., Dittrich, P., Taiz, L., Bowman, E.J., Bowman,

    B.J., Manolson, M.F., Poole, R.J., Date, T., Oshima, T. et al. 1989. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl. Acad. Sci., U.S.A. 86(17):6661–5.

    Gogarten, J.P. 1995. The early evolution of cellular life. Trends in Ecology and Evolution, 10:147–51.

    Hilario, E. and Gogarten, J.P. 1993. Horizontal transfer of ATPase genes—the tree of life becomes a net of life. Biosystems, 31(2-3):111–9.

    Huang, J. and Gogarten, J.P. 2006. Ancient horizontal gene transfer can benefi t phylogenetic reconstruction. Trends Genet., 22(7):361–6.

    Huang, J., Mullapudi, N., Lancto, C., Scott, M., Abrahamsen, M. and Kissinger, J. 2004. Phylogenomic evidence supports past endosym-biosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome. Biology, 5(11):R.88.

    Huang, J., Mullapudi, N., Sicheritz-Ponten, T. and Kissinger, J.C. 2004. A fi rst glimpse into the pattern and scale of gene transfer in Apicom-plexa. Int. J. Parasitol., 34(3):265–74.

    Huang, J., Xu, Y. and Gogarten, J.P. 2005. The Presence of a Haloarchaeal Type Tyrosyl-tRNA Synthetase Marks the Opisthokonts as Mono-phyletic. Mol. Biol. Evol., 22(11):2142–6.

    Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S. and Miyata, T. 1989. Evo-lutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci., U.S.A. 86(23):9355–9.

    Jain, R., Rivera, M.C. and Lake, J.A. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci., U.S.A. 96(7):3801–6.

    Kandler, O. 1994. The early diversifi cation of life. Early Life on Earth, Nobel Symposium. S. Bengston, Columbia Univ. Press84:152–509.

    Kandler, O. 1998. The early diversifi cation of life: A proposal. Thermophiles: the key to molecular evolution and the origin of life? M Adams and J. Weigel, Taylor and Francis19–31.

    Karlberg, O., Canbäck, B., Kurland, C.G. and Andersson, S.G.E. 2000. The dual origin of the yeast mitochondrial proteome. Yeast Comp. Functional Genomics, 17:170–87.

    Katz, L.A. 1996. Transkingdom transfer of the phosphoglucose isomerase gene. J. Mol. Evol., 43(5):453–9.

    Klotz, M.G. and Loewen, P.C. 2003. The molecular evolution of catalatic hydroperoxidases: evidence for multiple lateral transfer of genes between prokaryota and from bacteria into eukaryota. Mol. Biol. Evol., 20(7):1098–112.

  • 118

    Almeida

    Evolutionary Bioinformatics 2008:4

    Koski, L.B. and Golding, G.B. 2001. The closest BLAST hit is often not the nearest neighbor. J. Mol. Evol., 52(6):540–2.

    Koski, L.B., Morton, R.A. and Golding, G.B. 2001. Codon bias and base composition are poor indicators of horizontally transferred genes. Mol. Biol. Evol., 18(3):404–12.

    Kurland, C.G. 2000. Something for everyone: horizontal gene transfer in evolution. EMBO Reports, 1(2):92–5.

    Kurland, C.G., Canback, B. and Berg, O.G. 2003. Horizontal gene transfer: a critical view. Proc. Natl. Acad. Sci., U.S.A. 100(17):9658–62.

    Kyrpides, N.C. and Olsen, G.J. 1999. Archaeal and bacterial hyperthermo-philes: horizontal gene exchange or common ancestry? Trends Genet., 15(8):298–9.

    Lange, B.M., Rujan, T., Martin, W. and Croteau, R. 2000. Isoprenoid bio-synthesis: the evolution of two ancient and distinct pathways across genomes. Proc. Natl. Acad. Sci., U.S.A. 97(24):13172–7.

    Lawrence, J.G. and Ochman, H. 1997. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol., 44(4):383–97.

    Logsdon, J.M. and Faguy, D.M. 1999. Thermotoga heats up lateral gene transfer. Curr. Biol., 9(19):R747–51.

    Martin, W., Stoebe, B., Goremykin, V., Hapsmann, S., Hasegawa, M. and Kowallik, K.V. 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature, 393(6681):162–5.

    Martin, W. et al. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl. Acad. Sci., U.S.A. 99:12246–51.

    Nelson, K.E., Clayton, R.A., Gill, S.R., Gwinn, M.L., Dodson, R.J., Haft, D.H., Hickey, E.K., Peterson, J.D., Nelson, W.C., Ketchum, K.A. et al. 1999. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature, 399(6734):323–9.

    Novichkov, P.S., Omelchenko, M.V., Gelfand, M.S., Mironov, A.A., Wolf, Y.I. and Koonin, E.V. 2004. Genome-Wide Molecular Clock and Horizontal Gene Transfer in Bacterial Evolution. J. Bacteriol., 186(19):6575–85.

    Ochman, H., Lawrence, J.G. and Groisman, E.A. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature, 405(6784):299–304.

    Palenik, B. 2002. The genomics of symbiosis: Host keep the baby and the bath water. Proc. Natl. Acad. Sci., U.S.A. 99:11996–7.

    Ragan, M.A. 2001. On surrogate methods for detecting lateral gene transfer. FEMS Microbiol. Lett, 201(2):187–91.

    Robinson, M., Gouy, M., Gautier, C. and Mouchiroud, D. 1998. Sensitivity of the relative-rate test to taxonomic sampling. Mol. Biol. Evol., 15(9):1091–8.

    Rujan, T. and Martin, W. 2001. How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet., 17(3):113–20.

    Salzberg, S.L., White, O., Peterson, J. and Eisen, J.A. 2001. Microbial genes in the human genome: lateral transfer or gene loss? Science, 292(5523):1903–6.

    Sánchez, L., Horner, D., Moore, D., Henze, K., Embley, T. and Muller, M. 2002. Fructose-1,6-bisphosphate aldolases in amitochondriate protists constitute a single protein subfamily with eubacterial relationships. Gene, 295(1):51–9.

    Searcy, D.G., Stein, D.B. and Green, G.R. 1978. Phylogenetic affi nities between eukaryotic cells and a thermophilic mycoplasma. Biosystems, 10(1–2):19–28.

    Snel, B., Bork, P. and Huynen, M.A. 2002. Genomes in fl ux: the evolution of archaeal and proteobacterial gene content. Genome Res., 12(1):17–25.

    Stanhope, M.J., Lupas, A., Italia, M.J., Koretke, K.K., Volker, C. and Brown, J.R. 2001. Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates. Nature, 411(6840):940–4.

    Swofford, D.L. PAUP*. Phylogenetic Anlysis Using Parsimony (*and other methods). In., 4.0 edn. Sunderland, Massachusetts: Sinauer Associ-ates; 2003.

    Teichmann, S.A. and Mitchison, G. 1999. Is there a phylogenetic signal in prokaryote proteins? J. Mol. Evol., 49(1):98–107.

    Wernegreen, J.J., Ochman, H., Jones, I.B. and Moran, N.A. 2000. Decou-pling of genome size and sequence divergence in a symbiotic bacte-rium. J. Bacteriol., 182(13):3867–9.

    Woese, C.R. 2002. On the evolution of cells. Proc. Natl. Acad. Sci., U.S.A. 99(13):8742–7.

    Zillig, W., Klenk, H.P., Palm, P., Puhler, G., Gropp, F., Garrett, R.A. and Leffers, H. 1989. The phylogenetic relations of DNA-dependent RNA polymerases of archaebacteria, eukaryotes, and eubacteria. Can J. Microbiol., 35(1):73–80.

    Zillig, W., Palm, P. and Klenk, H-P. 1992. A model of the early evolution of organisms: the arisal of the three domains of life from the common ancestor. In: The Origin and Evolution of the Cell:163–82.

  • 119

    Examining Ancient Inter-domain Horizontal Gene Examining Ancient Inter-domain Horizontal Gene TransferFrancisca C. Almeida, Magdalena Leszczyniecka, Paul B. Fisher and Rob DeSalle

    Supplemental Material

    File 1—The node height testThis fi le contains a fi gure detailing the steps involved in the node height test. a) The fi rst step concerns a rate test where substitution rates are examined and their similarity determined. b) If rates are not similar across the tree or there are heterogeneous substitution rates among domains the node height test should not be used; c) If (E1-B1) � (B1-B1) then lineage extinction in Archaea can be inferred. If (A1-B1) � (B1-B1), then lineage extinction in Eukarya can be inferred. d) If (E1-B1) ≈ (B1-B1) then ancient IDHGT can be inferred between Eukarya and Bacteria. If (A1-B1) ≈ (B1-B1) then ancient IDHGT can be inferred between Archaea and Bacteria. The bottom of the fi gure shows a Flow chart outlining the steps in the Node Height Test.

    File 2—List of genes examined in studyThis fi le contains the list of genes from the E. coli genome that have paralogs with 40% similarity elsewhere in the genome. Column 1 indicates the distribution of the gene in other organisms where 1 = bacteria only; 2 = all three domains; 3 = Bacteria and Archaea 4 = Bacteria and Eukarya; 5 = No infer-ence. Column 2 shows the gene name, column 3 shows the most similar paralog, column 4 shows the E value for the comparison of paralogs, column 5, and column 6 indicate simillarities.

    File 3—Cladograms that support inter domain HGTThis fi le contains cladograms obtained for genes with phylogenetic evidence of IDHGT. Trees are strict consensus cladograms and numbers shown are bootstrap values above 75 (500 reps). Bootstrap values in bold font indicate possible HGT events. (a) 1-Deoxyxylulose-5-phosphate synthase (dxs, COG3959/3958), paralog Transketolase 2 isozyme; (b) Fructose-bisphosphate aldolase, class II (fba, COG0191), paralog Tagatose-bisphosphate aldolase 1; (c) Dipetide transport protein (dppa, COG0747), paralog Putative transport periplasmic protein; (d) ABC-type FE3+ -siderophore transport system, permease component (HemU, COG0609), paralog ABC-type FE3+ -siderophore transport system, ATP-binding; (e) Ferrous iron transport protein B (feoB, COG0370), paralog GTP-binding protein; (f) Formate dehydrogenase (fdhF, COG0243), paralog Nitrate reductase 1; (g) Glucose-1-phosphate thy-midylyltransferase (rmla, COG1213), paralog Glucose-1-phosphate adenylyltransferase; (h) Adenine deaminase adeC, COG1001), paralog Putative N-acetylgalactosamine-6-phosphate deacetylase. To facilitate viewing the fi gures we have colored taxa from the three major domains as follows: Bacteria target genes (B1) are colored red; Bacteria paralog genes (B2) are colored orange; Archaea target genes A1) are colored green; Archaea paralog genes (A2) are colored lime; Eukarya target genes (E1) are colored blue; Eukarya paralog genes (E2) are colored aqua.

    Examining ancient inter-domain horizontal gene transfer

    Evolutionary Bioinformatics 2008:4

  • 1

    1

    Supplemental Material File 1.

    Trees explaining Node Height Test

    c) Lineage extinction (in Archaea)

    B1 B1 E1 B2

    (E1-B1) > (B1-B1)

    d) HGT between Bacteria and Eukarya

    B1 B1E1 B2

    (E1-B1) ≈ (B1-B1)

    B1 B1 E1 B2

    a) Rate test: are substitution ratessimilar?

    (B1-B2) ≈ (E1-B2)

    B1 B1

    E1

    B2

    b) Heterogeneous substitution ratesamong domains: node height testshould not be used

    (B1-B2) ≠ (E1-B2)

  • 2

    2

    Flow chart showing decision making process in Node Height Test

    (E1-B1) ≈ (B1-B1)

    Are substitution rates similar?

    YESNOWhat if I use only theconserved regions?

    YES NO

    Pair wise distance matrix

    B1-B1 distances were compared to E1-B1distances

    K saturated

    K saturated

    5271File AttachmentIDHGT_suppl material_file 1.pdf

    Sheet1

    typeCommon NameCommon NameP value% Similarity% Identity

    1(3R)-hydroxymyristol acyl carrier protein dehydratasebeta-hydroxydecanoyl thioester dehydrase, trans-2-decenoyl-ACP isomerase0.0005752.84552830.894308

    12,3-dihydroxybenzoate-AMP ligaseputative ligase/synthetase7.90E-5749.43396431.132076

    12-acyl-glycerophospho-ethanolamine acyltransferase; acyl-acyl-carrier protein synthetaseputative ligase/synthetase1.70E-2750.58823429.705883

    12-component transcriptional regulatorpleiotrophic regulation of anaerobic respiration: response regulator for nar, frd, dms and tor genes0.05150.51546524.742268

    15-keto-D-gluconate 5-reductaseputative oxidoreductase3.70E-2752.06611632.644627

    16-phosphogluconate dehydratasedihydroxyacid dehydratase2.50E-4651.77165231.102362

    1aconitate hydrase B3-isopropylmalate isomerase (dehydratase) subunit6.60E-1248.86363632.954544

    1acridine efflux pumpputative membrane protein0.01249.33333229.333334

    1acyl carrier protein phosphodiesteraseorf, hypothetical protein0.05842.74193624.193548

    1acyl-CoA synthetase, long-chain-fatty-acid--CoA ligaseputative ligase/synthetase4.40E-4951.24481231.120333

    1ADP-heptose--lps heptosyltransferase II; lipopolysaccharide core biosynthesisheptosyl transferase I; lipopolysaccharide core biosynthesis2.60E-0848.1060628.787878

    1aerobic respiration sensor-response protein; histidine protein kinase/phosphatase, sensor for arcAsensor-regulator, activates OmpR by phophorylation6.90E-3855.9040634.686348

    1aerotaxis sensor receptor, flavoproteinmethyl-accepting chemotaxis protein IV, peptide sensor receptor3.10E-3962.41610739.932884

    1affects pool of 3'-phosphoadenosine-5'-phosphosulfate in pathway of sulfite synthesisenhances synthesis of sigma32 in mutant; extragenic suppressor, may modulate RNAse III lethal action1.70E-1547.3684230.263159

    1alpha-galactosidasephospho-beta-glucosidase; cryptic8.30E-3048.4162926.470589

    1AMP nucleosidaseorf, hypothetical protein0.06753.9215731.372549

    1anaerobic dicarboxylate transportanaerobic dicarboxylate transport1.90E-5362.92906236.155605

    1anaerobically inducible L-threonine, L-serine permeaseputative transport system permease protein2.90E-2149.39759126.26506

    1arginine 3rd transport system periplasmic binding proteinperiplasmic glutamine-binding protein; permease2.20E-2955.04201534.453781

    1arsenate reductaseputative oxidoreductase1.20E-1962.06896641.379311

    1ascBF operon repressorL-idonate transcriptional regulator2.40E-2552.26480926.480837

    1aspartokinase II and homoserine dehydrogenase IIaspartokinase I, homoserine dehydrogenase I1.70E-8454.91803435.109291

    1ATP-binding component of a membrane-associated complex involved in cell divisionputative ATP-binding component of a transport system1.60E-3564.01869243.925232

    1ATP-binding component of a transport systemputative ATP-binding component of a transport system1.90E-4653.16239227.179487

    1ATP-binding component of D-ribose high-affinity transport systemATP-binding component of high-affinity L-arabinose transport system6.80E-10667.74193642.741936

    1ATP-binding component of high-affinity L-arabinose transport systemputative ATP-binding component of a transport system4.90E-2556.7307736.53846

    1ATP-binding component of histidine transportATP-binding component of transport system for glycine, betaine and proline3.90E-3264.63414836.178864

    1ATP-binding component of hydroxymate-dependent iron transportATP-binding component of a transporter8.90E-2654.66666832.444443

    1ATP-binding component of leucine transportATP-binding protein of glutamate/aspartate transport system5.60E-2454.46428730.803572

    1ATP-binding component of methyl-galactoside transport and galactose taxisputative ATP-binding component of a transport system2.90E-5958.13008134.552845

    1ATP-binding component of sn-glycerol 3-phosphate transport systemputative ATP-binding component of a transport system1.10E-6164.61538740.923077

    1ATP-binding component of transport system for maltoseputative ATP-binding component of a transport system1.30E-5763.90728440.39735

    1ATP-binding protein of nickel transport systemputative ATP-binding component of a transport system1.50E-3254.54545635.123966

    1ATP-dependent serine activating enzyme (may be part of enterobactin synthase as component F)2,3-dihydroxybenzoate-AMP ligase8.40E-1548.10126529.535866

    1beta-D-galactosidasebeta-D-glucuronidase3.60E-1247.66584831.203932

    1biodegradative arginine decarboxylaseornithine decarboxylase isozyme, inducible4.50E-8654.05405433.068363

    1biotin biosynthesis; reaction prior to pimeloyl CoAputative enzyme4.70E-085030.625

    1biotin sulfoxide reductasenitrate reductase 1, alpha subunit4.70E-1346.88644826.007326

    1biotin sulfoxide reductase 2probable nitrate reductase 31.80E-0545.5429527.06645

    1carbamoyl-phosphate synthetase, glutamine (small) subunitp-aminobenzoate synthetase, component II1.10E-0649.65034930.76923

    1cardiolipin synthase, a major membrane phospholipid; novobiocin sensitivityputative synthetase1.00E-2250.15673830.407524

    1catabolic regulation response regulatortranscriptional regulator in 2-component system5.10E-3054.29864138.914028

    1catabolite repression sensor kinase for PhoB; alternative sensor for pho regulonputative 2-component sensor protein1.50E-1748.31730728.365385

    1cell division membrane proteinGTP-binding export factor binds to signal sequence, GTP and RNA4.50E-3858.84476533.935017

    1cell elongation, e phase; peptidoglycan synthetase; penicillin-binding protein 2septum formation; penicillin-binding protein 3; peptidoglycan synthetase3.90E-1752.27817929.736212

    1chromosome replication; initiation and chain elongationputative DNA replication factor1.60E-5872.72727250.413223

    1citrate-dependent iron(III) transport protein, cytosolichydroxamate-dependent iron uptake, cytoplasmic membrane component1.80E-2354.34782834.782608

    1cold shock protein 7.4, transcriptional activator of hnscold shock protein; may affect transcription9.90E-2792.53731583.582092

    1cold shock protein; may affect transcriptioncold shock-like protein7.70E-116045.714287

    1curved DNA-binding protein; functions closely related to DnaJchaperone with DnaK; heat shock protein1.00E-4157.43243440.87838

    1cyclic AMP receptor proteintranscriptional regulation of aerobic, anaerobic respiration, osmotic balance1.20E-0848.88888922.777779

    1cytochrome b(561)putative cytochrome3.40E-2462.22222142.222221

    1cytochrome c-type proteinputative cytochrome C-type protein3.20E-3765.1428644.57143

    1cytoplasmic trehalasetrehalase, periplasmic7.80E-13768.68686750.909092

    1D-alanyl-D-alanine carboxypeptidase; penicillin-binding protein 6penicillin-binding protein 71.30E-0948.24902729.96109

    1damage-inducible protein Jnegative regulator of translation0.004262.79069937.209301

    1D-amino acid dehydrogenase subunitflavoprotein; probably electron transport0.002157.94392435.514019

    1D-erythro-7,8-dihydroneopterin tri P epimeraseputative kinase2.40E-0954.95495625.225225

    1detox proteinorf, hypothetical protein6.30E-0968.67469848.192772

    1diadenosine tetraphosphataseprotein phosphatase 1 modulates phosphoproteins, signals protein misfolding1.50E-0554.66666838.666668

    1dipeptide transport system permease protein 2homolog of Salmonella oligopeptide transport permease protein6.50E-2458.90909232.727272

    1DNA biosynthesis; initiation of chromosome replication; can be transcription regulatorputative DNA replication factor4.40E-1151.36363629.09091

    1DNA biosynthesis; primosomal protein iorf, hypothetical protein0.01755.55555741.666668

    1DNA helicase, ATP-dependent dsDNA/ssDNA exonuclease V subunit, ssDNA endonucleaseDNA-dependent ATPase I and helicase II0.07145.83333228.571428

    1DNA polymerase III, delta prime subunitDNA polymerase III, tau and gamma subunits; DNA elongation factor III7.60E-1050.57471132.567051

    1DNA repair proteinputative DNA repair protein, RADC family2.50E-2873.33333648.333332

    1DNA topoisomerase IIIDNA topoisomerase type I, omega protein3.30E-2048.77505527.171492

    1DNA topoisomerase IV subunit ADNA gyrase, subunit A, type II topoisomerase4.80E-10361.57205239.15575

    1DNA topoisomerase IV subunit BDNA gyrase subunit B, type II topoisomerase, ATPase activity4.70E-10761.81533841.158058

    1DNA-binding protein HLP-II (HU, BH2, HD, NS); pleiotropic regulatorDNA-binding protein; H-NS-like protein; chaperone activity; RNA splicing?1.50E-2469.62963157.777779

    1DNA-binding protein HU-beta, NS1 (HU-1)integration host factor (IHF), alpha subunit; site specific recombination3.80E-0957.77777932.222221

    1DNA-binding protein; H-NS-like protein; chaperone activity; RNA splicing?DNA-binding protein HLP-II (HU, BH2, HD, NS); pleiotropic regulator3.60E-2969.62963157.777779

    1dnaK suppressor proteinorf, hypothetical protein0.01848.83720834.88372

    1D-ribose high-affinity transport systemputative transport system permease protein2.70E-1751.01351229.054054

    1D-ribose periplasmic binding proteinxylose binding protein transport system4.30E-1344.982724.567474

    1D-serine dehydratase (deaminase) transcriptional activatorputative transcriptional regulator LYSR-type7.00E-1749.50495130.198019

    1enables flagellar motor rotation, linking torque machinery to cell wallputative outer membrane protein0.07249.54128327.522936

    1endonuclease VIII and DNA N-glycosylase with an AP lyase activityformamidopyrimidine DNA glycosylase1.10E-0543.93939223.863636

    1envelope protein; thermoregulation of porin biosynthesisputative ARAC-type regulatory protein0.03451.02040930.612246

    1enzyme that may degrade or block biosynthesis of endogenous mal inducer, probably aminotrasferasemulti modular; putative transcriptional regulator; also putative ATP-binding component of a transport system1.10E-0942.95082123.606558

    1ethanolamine utilization; homolog of Salmonella acetyl/butyryl P transferaseputative multimodular enzyme1.30E-3654.687533.75

    1evolved beta-D-galactosidase, alpha subunit; cryptic genebeta-D-galactosidase1.10E-14656.34095836.590435

    1facilitated diffusion of glyceroltransmembrane water channel; aquaporin Z9.00E-0850.87719331.578947

    1ferredoxin-NADP reductaseputative oxidoreductase7.90E-0744.06779524.152542

    1ferric enterobactin (enterochelin) transportputative transport protein2.50E-1445.01510623.564955

    1fimbrial morphologyputative fimbrial-like protein0.007441.66666822.777779

    1fimbrial proteinputative fimbrial-like protein4.40E-0542.2077927.922077

    1fimbrial Z protein; probable signal transducerpositive response regulator for colanic capsule biosynthesis, (sensor, RcsC)5.10E-1447.29064227.586206

    1flagellar biosynthesisflagellar biosynthesis; possible export of flagellar proteins1.10E-8063.12056741.312057

    1flagellar biosynthesis, cell-distal portion of basal-body rodflagellar biosynthesis, cell-proximal portion of basal-body rod3.50E-1545.60669328.870293

    1flagellar biosynthesis, cell-proximal portion of basal-body rodflagellar biosynthesis, cell-distal portion of basal-body rod2.60E-1948.89867833.480175

    1flagellar biosynthesis, hook proteinflagellar biosynthesis, cell-proximal portion of basal-body rod0.04550.40650632.520325

    1flagellar biosynthesis, hook-filament junction protein 1flagellar biosynthesis; flagellin, filament structural protein7.80E-0549.18032825.409836

    1flagellar biosynthesis; alternative sigma factor 28; regulation of flagellar operonsRNA polymerase, sigma(32) factor; regulation of proteins induced at high temperatures1.10E-0650.529

    1flagellar biosynthesis; flagellin, filament structural proteinflagellar biosynthesis; hook-filament junction protein0.03147.27891225.170069

    1flagellar biosynthesis; hook-filament junction proteinflagellar biosynthesis; flagellin, filament structural protein0.01946.83098625.704226

    1flagellar biosynthesis; possible export of flagellar proteinsflagellar biosynthesis4.90E-8063.12056740.957447

    1flagellar hook-length control proteinputative membrane protein; interferes with cell division0.09841.11675327.918781

    1flagellum-specific ATP synthasemembrane-bound ATP synthase, F1 sector, alpha-subunit8.60E-2351.79153129.641693

    1flavodoxin 2flavodoxin 18.40E-3766.66666443.636364

    1formate dehydrogenase-N, nitrate-inducible, cytochrome B556(Fdn) gamma subunitformate dehydrogenase, cytochrome B556 (FDO) subunit2.00E-4268.72038348.815166

    1formate hydrogen-lyase transcriptional activator for fdhF, hyc and hyp operonstranscriptional regulation of aroF, aroG, tyrA and aromatic amino acid transport1.20E-4658.23170936.890244

    1formate-dependent nitrite reductase; a penta-haeme cytochrome cperiplasmic cytochrome c(552): plays a role in nitrite reduction0.002950.7692333.846153

    1fructose-1-phosphate kinaseputative kinase7.20E-0645.72490728.252789

    1frv operon proteinputative lyase/synthase3.20E-3750.72046331.123919

    1fucose permeaseputative transport protein0.01349.5384626.153847

    1galactose-binding transport protein; receptor for galactose taxisputative LACI-type transcriptional regulator1.50E-1051.21951330.487804

    1galactoside permease (M protein)MFS (major facilitator superfamily) transporter0.01148.41772123.734177

    1Gef protein interferes with membrane function when in excesspolypeptide destructive to membrane potential2.40E-079282

    1glucitol (sorbitol)-6-phosphate dehydrogenaseNAD-dependent 7alpha-hydroxysteroid dehydrogenase, dehydroxylation of bile acids2.60E-1952.86885130.327869

    1gluconate kinase, thermosensitive glucokinasegluconokinase 2, thermoresistant3.40E-4074.05063647.468353

    1glucuronide permeaseputative permease5.90E-3649.89059128.008753

    1glutamate/aspartate transport system permeasehistidine transport system permease protein2.00E-1251.18483423.696682

    1glutamate-aspartate symport proteinputative transport protein2.30E-1051.31578826.052631

    1glutaredoxin-like protein; hydrogen donorglutaredoxin 30.005448.14814824.074074

    1glutathione synthetaseribosomal protein S6 modification protein0.0004746.63212629.533678

    1haemolysin expression modulating proteinorf, hypothetical protein3.00E-0968.57142638.57143

    1heat shock protein hslJbacteriophage lambda Bor protein homolog0.05363.15789450

    1heptosyl transferase I; lipopolysaccharide core biosynthesislipopolysaccharide core biosynthesis0.0006244.48669125.855513

    1hexose phosphate transport proteinputative transport protein0.04147.93650826.984127

    1high-affinity amino acid transport system; periplasmic binding proteinhigh-affinity leucine-specific transport system; periplasmic binding protein3.20E-14786.9565277.173912

    1high-affinity choline transportputative transport protein2.30E-4353.9503428.668171

    1high-affinity L-arabinose transport system; membrane protein, fragment 1methyl-galactoside transport and galactose taxis0.002650.90909231.363636

    1high-affinity leucine-specific transport system; periplasmic binding proteinhigh-affinity amino acid transport system; periplasmic binding protein1.30E-15286.9565277.173912

    1histidine protein kinase sensor for GlnG regulator (nitrogen regulator II, NRII)putative 2-component sensor protein1.40E-1250.68493332.876713

    1histidine transport system permease proteinhistidine transport, membrane protein M3.00E-1852.05479431.96347

    1histidine transport, membrane protein Mhistidine transport system permease protein1.90E-1652.05479431.96347

    1Hnr proteinnegative response regulator of genes in aerobic pathways, (sensors, ArcB and CpxA)9.10E-0951.18110332.283466

    1Holliday junction helicase subunit A; branch migration; repairputative 2-component regulator0.03149.36708830.379747

    1homolog of Salmonella UTP--glucose-1-P uridyltransferase, probably a UDP-gal transferaseglucose-1-phosphate thymidylyltransferase3.30E-0754.34782831.15942

    1hydrogenase 4 Fe-S subunitformate dehydrogenase-O, iron-sulfur subunit0.01146.87535

    1hydrogenase-1 large subunitprobable large subunit, hydrogenase-22.80E-11664.71630944.680851

    1hydrogenase-2 operon protein: may effect maturation of large subunit of hydrogenase-2pleiotrophic effects on 3 hydrogenase isozymes2.30E-1878.20513256.410255

    1hyperosmotically inducible periplasmic proteinputative periplasmic protein5.10E-1450.30674730.674847

    1inducible ATP-independent RNA helicaseputative ATP-dependent RNA helicase1.10E-5255.77395638.329239

    1initiation of chromosome replicationsulfite reductase (NADPH), flavoprotein beta subunit0.000152.02702729.054054

    1inner membrane protein, membrane-spanning, maintains integrity of cell envelope; tolerance to group A colicinsuptake of enterochelin; tonB-dependent uptake of B colicins4.20E-1962.20930141.279068

    1integral transmembrane protein; acridine resistancesensitivity to acriflavine, integral membrane protein, possible efflux pump3.00E-29578.10077762.5

    1integration host factor (IHF), alpha subunit; site specific recombinationDNA-binding protein HU-alpha (HU-2)1.10E-1158.88888936.666668

    1inversion of adjacent DNA; at locus of e14 elementputative transposon resolvase1.70E-2455.86592138.547485

    1involved in fimbrial asemblyputative fimbrial protein0.007247.52475426.732674

    1involved in protein transport; multicopy suppressor of dominant negative ftsH mutantsputative oxidoreductase subunit3.40E-7758.58585740.202019

    1IS150 putative transposaseIS3 putative transposase8.30E-3050.55350530.996309

    1IS186 hypothetical proteinIS4 hypothetical protein0.06957.33333233.333332

    1IS2 hypothetical proteinorf, hypothetical protein0.008841.8502228.634361

    1IS3 putative transposaseputative transposase2.50E-1462.06896643.678162

    1IS30 transposaseIS30 transposase1.00E-19610099.738907

    1IS911 hypothetical protein, variant (IS911A)IS911 hypothetical protein (IS911B)4.10E-4495.91836593.877548

    1isoaspartyl dipeptidaseputative hydrolase0.007348.92473227.956989

    1ketodeoxygluconokinaseputative kinase7.60E-055336

    1L-arabinose-binding periplasmic proteinputative LACI-type transcriptional regulator5.00E-0543.35443125

    1large terminal subunit of phenylpropionate dioxygenaseorf, hypothetical protein4.90E-135028.767124

    1L-asparagine permeasetransport of lysine/cadaverine9.20E-1243.42105122.105263

    1leader peptidase, integral membrane proteinputative peptidase3.50E-146040.714287

    1L-idonate transcriptional regulatorascBF operon repressor2.40E-2551.21107125.951557

    1lipopolysaccharide core biosynthesisheptosyl transferase I; lipopolysaccharide core biosynthesis0.0006245.66037826.415094

    1lipopolysaccharide core biosynthesis; phosphorylation of core heptose; attaches phosphate-containing substrate to LPS copH-inducible protein involved in stress response1.70E-0549.46236432.258064

    1lipoprotein-28putative lipoprotein4.80E-7381.20301162.030075

    1low affinity tryptophan permeaseputative transporter protein0.02244.29530322.147652

    1L-ribulose-5-phosphate 4-epimeraseputative epimerase/aldolase0.06754.21686937.349396

    1L-serine deaminaseputative L-serine dehydratase7.40E-3870.67668960.902256

    1lysine decarboxylase 2, constitutiveornithine decarboxylase isozyme, inducible5.30E-7050.94339833.176102

    1lysine-, arginine-, ornithine-binding periplasmic proteinarginine 3rd transport system periplasmic binding protein9.40E-3854.95867933.884296

    1major type 1 subunit fimbrin (pilin)putative fimbrial-like protein6.80E-0743.64640826.519337

    1maltodextrin glucosidasetrehalase 6-P hydrolase2.10E-1650.33333234

    1maltodextrin phosphorylaseglycogen phosphorylase1.10E-17668.06832948.094612

    1may modulate levels of hydrogenease-2pleiotrophic effects on 3 hydrogenase isozymes7.50E-2971.92982548.245613

    1melibiose permease IIputative permease1.20E-3549.53488526.976744

    1membrane-bound lytic murein transglycosylase Csoluble lytic murein transglycosylase0.0003848.7530

    1membrane-spanning protein of hydrogenase 3 (part of FHL complex)hydrogenase 4 membrane subunit2.10E-0849.65034926.340326

    1methyl-accepting chemotaxis protein I, serine sensor receptoraerotaxis sensor receptor, flavoprotein2.40E-3266.56441541.717793

    1methyl-accepting chemotaxis protein II, aspartate sensor receptormethyl-accepting chemotaxis protein III, ribose sensor receptor4.50E-7064.40677645.386063

    1methyl-accepting chemotaxis protein III, ribose sensor receptormethyl-accepting chemotaxis protein I, serine sensor receptor6.50E-6268.86792850.943398

    1methyl-accepting chemotaxis protein IV, peptide sensor receptormethyl-accepting chemotaxis protein II, aspartate sensor receptor3.80E-8964.8543747.961166

    1methyl-galactoside transport and galactose taxishigh-affinity L-arabinose transport system; membrane protein, fragment 10.0008250.51020432.142857

    1mgl repressor, galactose operon inducerrepressor of malX and Y genes7.20E-2445.95469328.15534

    1minor curlin subunit precursor, similar ro CsgAcurlin major subunit, coiled surface structures; cryptic2.80E-0643.01075431.182796

    1minor fimbrial subunit, D-mannose specific adhesinputative adhesin; similar to FimH protein3.50E-7065.88628447.491638

    1molybdate metabolism regulator, first fragmentorf, hypothetical protein0.0001165.71428745.714287

    1molybdate metabolism regulator, second fragment 2putative regulator6.30E-9759.37984540.155037

    1molybdate metabolism regulator, third fragmentputative regulator7.10E-4061.42322244.56929

    1multidrug resistance protein Kmultidrug resistance secretion protein3.00E-8968.35106749.734043

    1multidrug resistance protein Ymultidrug resistance; probably membrane translocase1.20E-14980.83832663.073853

    1multidrug resistance secretion proteinmultidrug resistance protein K1.30E-8868.35106749.734043

    1multiple antibiotic resistance; transcriptional activator of defense systemsregulatory protein for 2-phenylethylamine catabolism7.90E-065329

    1murein lipoproteinorf, hypothetical protein1.40E-1374.62686950.746269

    1murein transglycosylase Emembrane-bound lytic murein transglycosylase C6.50E-3064.45783240.361446

    1N-acetylmuramoyl-l-alanine amidase II; a murein hydrolaseputative amidase2.60E-3957.96703335.43956

    1negative regulator of exu regulon, exuT, uxaAC, and uxuBputative FADA-type transcriptional regulator3.60E-0746.92982525.438597

    1negative regulator of translationdamage-inducible protein J0.004262.79069937.209301

    1negative response regulator of genes in aerobic pathways, (sensors, ArcB and CpxA)transcriptional response regulatory protein (sensor BaeS)8.60E-2853.50877432.456139

    1negative transcriptional regulator of cel operonpositive regulator for rhaBAD operon2.80E-0746.53061324.081633

    1N-ethylmaleimide reductaseputative NADPH dehydrogenase2.90E-1550.60240931.927711

    1nitrate reductase 1, delta subunit, assembly functioncryptic nitrate reductase 2, delta subunit, assembly function1.30E-3873.45132457.079647

    1nitrate/nitrate sensor, histidine protein kinase acts on NarL regulatorsensor histidine protein kinase phosphorylates UhpA9.40E-1251.4285728.571428

    1nitrate/nitrite response regulator (sensor NarQ)putative 2-component transcriptional regulator1.30E-1053.33333230.370371

    1nitrite extrusion protein 2nitrite extrusion protein1.40E-15588.76651875.770927

    1non-essential phosphatidylglycerophosphate phosphatase, membrane boundorf, hypothetical protein0.06148.71794929.487179

    1oligopeptide transport permease proteinputative transport system permease protein8.50E-3359.53947433.552631

    1oligopeptide transport; periplasmic binding proteindipeptide transport protein5.80E-2843.44978325.327511

    1ornithine decarboxylase isozymelysine decarboxylase 2, constitutive2.60E-8054.32937236.162987

    1ornithine decarboxylase isozyme, induciblelysine decarboxylase 14.10E-7651.50501635.284283

    1outer membrane channel; specific tolerance to colicin E1; segregation of daughter chromosomesputative resistance protein2.00E-0647.71084224.578314

    1outer membrane fluffing protein, similar to adhesinputative ATP-binding component of a transport system8.00E-3953.18627536.887257

    1outer membrane pore protein E (E,Ic,NmpAB)putative outer membrane protein2.90E-5273.9130460.869564

    1outer membrane porin protein; locus of qsr prophageouter membrane protein 1b (Ib;c)3.50E-9575.65982164.809387

    1outer membrane protein 1a (Ia;b;F)putative outer membrane porin protein2.10E-2470.83333656.25

    1outer membrane protein 1b (Ib;c)putative outer membrane porin protein1.80E-2080.4597762.068966

    1outer membrane protein 3a (II*;G;d)putative outer membrane protein3.70E-1455.75221333.628319

    1outer membrane protein receptor for ferrichrome, colicin M, and phages T1, T5, and phi80outer membrane receptor; citrate-dependent iron transport, outer membrane receptor1.40E-0845.49266425.366877

    1outer membrane protein Xorf, hypothetical protein0.0001265.21739243.47826

    1outer membrane protein; export and assembly of type 1 fimbriae, interruptedputative outer membrane protein5.20E-1047.90697932.55814

    1outer membrane receptor for ferric enterobactin (enterochelin) and colicins B and Dputative outer membrane receptor for iron transport5.70E-0645.89147226.046511

    1outer membrane receptor for ferric iron uptakeouter membrane receptor; citrate-dependent iron transport, outer membrane receptor0.04744.969225.462013

    1outer membrane receptor for iron-regulated colicin I receptor; porin; requires tonB gene productouter membrane receptor; citrate-dependent iron transport, outer membrane receptor1.70E-1047.75784728.026905

    1outer membrane receptor for transport of vitamin B12, E colicins, and bacteriophage BF23putative outer membrane receptor for iron transport2.00E-1148.22888231.60763

    1outer membrane receptor; citrate-dependent iron transport, outer membrane receptorouter membrane receptor for iron-regulated colicin I receptor; porin; requires tonB gene product1.60E-1047.10578927.34531

    1p-aminobenzoate synthetase, component Ianthranilate synthase component I2.00E-3555.4285732.857143

    1paraquat-inducible protein Aorf, hypothetical protein1.00E-3851.79487230.256411

    1paraquat-inducible protein Borf, hypothetical protein1.30E-3756.2300332.58786

    1part of formate-dependent nitrite reductase complexpossible subunit of heme lyase2.30E-1766.66666439.024391

    1part of maltose permease, periplasmicsn-glycerol 3-phosphate transport system, integral membrane protein1.10E-1755.60538133.632286

    1part of regulation of tor operon, periplasmicputative LACI-type transcriptional regulator0.01249.58677728.925619

    1penicillin binding protein 6bpenicillin-binding protein 78.00E-1147.41035828.286852

    1penicillin-binding protein 7D-alanyl-D-alanine carboxypeptidase; penicillin-binding protein 65.50E-1449.03474830.115829

    1peptidoglycan synthetase; penicillin-binding protein 1Aputative peptidoglycan enzyme1.90E-4051.11875933.562824

    1peptidoglycan-associated lipoproteinouter membrane protein 3a (II*;G;d)2.90E-0651.51515230.30303

    1periplasmic binding protein for nickelputative transport protein1.60E-3148.03921529.656862

    1periplasmic chaperone, required for type 1 fimbriaeputative membrane protein2.50E-2154.67980233.004925

    1periplasmic cytochrome c(552): plays a role in nitrite reductionformate-dependent nitrite reductase; a penta-haeme cytochrome c0.002948.52941132.35294

    1periplasmic glucans biosynthesis proteinputative glycoprotein2.10E-8155.91836538.57143

    1periplasmic glucose-1-phosphatasephosphoanhydride phosphorylase; pH 2.5 acid phosphatase; periplasmic4.20E-5151.68269332.932693

    1periplasmic protein related to spheroblast formationorf, hypothetical protein1.10E-1163.26530531.632652

    1periplasmic putrescine-binding protein; permease proteinspermidine/putrescine periplasmic transport protein7.60E-3656.92307736.615383

    1periplasmic serine protease Do; heat shock protein HtrAorf, hypothetical protein0.02957.535

    1periplasmic sulfate-binding proteinthiosulfate binding protein7.70E-7566.97819546.417446

    1persistence to inhibition of murein or DNA biosynthesis; regulatory proteinorf, hypothetical protein2.20E-0655.40540731.081081

    1phage lambda receptor protein; maltose high-affinity receptorputative receptor protein1.10E-1846.92874528.255527

    1phenylacetaldehyde dehydrogenasealdehyde dehydrogenase, NAD-linked1.90E-6453.91120532.980972

    1pH-inducible protein involved in stress responselipopolysaccharide core biosynthesis; phosphorylation of core heptose; attaches phosphate-containing substrate to LPS co2.00E-0549.46236433.333332

    1PhoB-dependent, ATP-binding pho regulon component; may be helicase; induced by P starvationputative ATP-binding protein in pho regulon6.00E-4363.90243943.902439

    1phosphoanhydride phosphorylase; pH 2.5 acid phosphatase; periplasmicperiplasmic glucose-1-phosphatase4.90E-4853.14009534.299519

    1phospho-beta-glucosidase B; cryptic6-phospho-beta-glucosidase A; cryptic3.20E-12466.30901351.931332

    1phosphocarrier protein HPr-like NPr, nitrogen related, exchanges phosphate with Enzyme I, HprPTS system, fructose-specific IIA/fpr component1.20E-0556.09756130.487804

    1phosphoglucomutasesimilar to phosphoglucomutases and phosphomannomutases0.0001345.14285732.57143

    1phosphomannomutasesimilar to phosphoglucomutases and phosphomannomutases1.20E-2348.31168729.350649

    1phosphonate metabolismputative hydrolase0.09847.92899331.952663

    1phosphopentomutaseputative mutase1.40E-2551.25447836.917564

    1phosphotransferase system enzyme IIA, regulates N metabolismPTS system, mannitol-specific enzyme IIABC components3.80E-0752.42718527.184465

    1pleiotrophic regulation of anaerobic respiration: response regulator for nar, frd, dms and tor genesputative 2-component transcriptional regulator1.70E-1554.04040527.777779

    1polypeptide destructive to membrane potentialGef protein interferes with membrane function when in excess2.70E-179282

    1positive and negative sensor protein for pho regulonprobable sensor protein (histidine protein kinase), acting on arcA5.20E-2354.1850230.396475

    1positive regulator for ctr capsule biosynthesis, positive transcription factorputative 2-component transcriptional regulator0.08254.23728932.203388

    1positive regulator for ilvCtranscriptional regulator cys regulon; accessory regulatory circuit affecting cysM5.10E-1447.3684228.34008

    1positive regulator for rhaBAD operonregulatory protein for 2-phenylethylamine catabolism1.60E-0650.56179826.404495

    1positive regulator for rhaRS operonputative ARAC-type regulatory protein6.00E-1149.78723525.106382

    1positive regulator of gcv operonD-serine dehydratase (deaminase) transcriptional activator3.00E-3250.70922131.914894

    1positive regulator of mal regulonputative 2-component transcriptional regulator for 2nd curli operon9.80E-0554.96688833.774834

    1positive regulator of the fuc operonputative DEOR-type transcriptional regulator3.20E-215028.947369

    1positive response regulator for colanic capsule biosynthesis, (sensor, RcsC)putative positive transcription regulator (sensor EvgS)4.00E-1451.25628325.628141

    1positive response regulator for pho regulon, sensor is PhoR (or CreC)regulator of kdp operon (transcriptional effector)2.90E-3458.82352837.104073

    1positive transcriptional regulator for cysteine regulonregulator for metE and metH1.50E-1350.27933127.374302

    1possible NAGC-like transcriptional regulatorputative NAGC-like transcriptional regulator7.10E-0850.73170929.756098

    1possible synthesis of cofactor for carnitine racemase and dehydrataseputative transferase4.10E-6077.20207259.067356

    1primosomal protein N'(= factor Y)(putative helicase)host restriction; endonuclease R0.05242.85714324.829931

    1probable crotonobetaine/carnitine-CoA ligaseacyl-CoA synthetase, long-chain-fatty-acid--CoA ligase3.20E-4254.32432633.783783

    1probable formate transporter (formate channel 2)probable formate transporter (formate channel 1)7.60E-6867.02898451.086956

    1probable growth inhibitor, PemK-like, autoregulatedprobable growth inhibitor, PemK-like, autoregulated8.00E-0954.36893138.834953

    1probable iron-sulfur protein of hydrogenase 3 (part of FHL complex)putative oxidoreductase, Fe-S subunit0.00475033.898304

    1probable large subunit, hydrogenase-2hydrogenase-1 large subunit2.80E-11664.53900944.858154

    1probable nitrate reductase 3anaerobic dimethyl sulfoxide reductase subunit A2.80E-1346.28378328.040541

    1probable outer membrane porin protein involved in fimbrial assemblyputative outer membrane protein2.70E-0947.86729826.540285

    1probable pilin chaperone similar to PapDputative membrane protein1.60E-175028.947369

    1probable RNA polymerase sigma factorRNA polymerase, sigma-E factor; heat shock and oxidative stress0.07846.03174627.777779

    1probable sensor protein (histidine protein kinase), acting on arcAputative 2-component sensor protein3.00E-1351.40187130.529594

    1probable serine transporterputative transport system permease protein3.90E-3252.81173728.606358

    1probable sigma-54 modulation proteinputative yhbH sigma 54 modulator7.20E-0862.63736339.56044

    1probable small subunit of hydrogenase-3, iron-sulfur protein (part of formate hydrogenlyase (FHL) complex)formate dehydrogenase-O, iron-sulfur subunit6.20E-1044.21768634.013607

    1probable transcriptional activator for leuABCD operonputative transcriptional regulator LYSR-type0.0002642.41245318.677042

    1processing of HyaA and HyaB proteinsprobable processing element for hydrogenase-22.00E-2157.14285739.751553

    1processing of large subunit (HycE) of hydrogenase 3 (part of the FHL complex)putative protein processing element2.90E-2764.12213948.091602

    1proline aminopeptidase P IIproline dipeptidase1.30E-2851.36186635.408562

    1proline permease transport proteinputative amino acid/amine transport protein1.00E-1045.38043622.826086

    1prophage CP4-57 integraseputative phage integrase0.03642.62295223.934425

    1prophage DLP12 integrasesite-specific recombinase0.0001748.60557928.685259

    1prophage P4 integraseputative prophage integrase4.20E-5159.83606735.792351

    1protein disulfide isomerase IIthiol:disulfide interchange protein3.90E-055031.904762

    1protein export; molecular chaperone; may bind to signel sequencenitrogen regulatory protein P-II 20.03461.22449130.612246

    1protein histidine kinase/phosphatase sensor for OmpR, modulates expression of ompF and ompCputative 2-component sensor protein5.90E-1248.5714325.357143

    1protein modification enzyme, induction of ompCPTS system fructose-like IIB component 14.40E-1665.95744343.61702

    1protein phosphatase 2diadenosine tetraphosphatase1.00E-0555.71428741.42857

    1psp operon transcriptional activatortranscriptional regulation of aroF, aroG, tyrA and aromatic amino acid transport2.80E-3856.50557737.174721

    1PTS enzyme IIAB, mannose-specificPTS system, cytoplasmic, N-acetylgalactosamine-specific IIB component 1 (EIIB-AGA)2.90E-2055.19480533.766235

    1PTS enzyme IIC, mannose-specificPTS system N-acetylgalactosameine-specific IIC component 22.50E-1253.48837331.782946

    1PTS enzyme IID, mannose-specificPTS system, N-acetylglucosamine enzyme IID component 14.60E-2960.52631834.586468

    1PTS system beta-glucosides, enzyme II, crypticlow-affinity L-arabinose transport system proton symport protein0.01346.47887425.352112

    1PTS system enzyme II ABC (asc), cryptic, transports specific beta-glucosidesPTS system enzyme II, trehalose specific2.30E-2050.56433528.668171

    1PTS system enzyme II, trehalose specificPTS system beta-glucosides, enzyme II, cryptic3.00E-3351.43540626.076555

    1PTS system fructose-like IIB component 1putative PTS system enzyme IIB component2.70E-1769.60784145.098038

    1PTS system fructose-like IIB component 2PTS system fructose-like IIB component 14.30E-1759.22330142.718445

    1PTS system galactitol-specific enzyme IICputative PTS system enzyme IIC component1.80E-5762.87702942.227379

    1PTS system N-acetylgalactosameine-specific IIC component 2PTS system N-acetylgalactosamine-specific IIC component 10.03463.70967938.709679

    1PTS system protein HPrputative PTS system enzyme IIA component, enzyme I0.004857.812534.375

    1PTS system, arbutin-like IIB componentPTS system, N-acetylglucosamine-specific enzyme IIABC2.80E-0955.94405734.965034

    1PTS system, arbutin-like IIC componentPTS system, maltose and glucose-specific II ABC4.20E-4252.83018935.040432

    1PTS system, cytoplasmic, N-acetylgalactosamine-specific IIB component 1 (EIIB-AGA)PTS system, cytoplasmic, N-acetylgalactosamine-specific IIB component 2 (EIIB-AGA)9.60E-2967.56756647.297298

    1PTS system, fructose-like enzyme II componentPTS system, fructose-like enzyme IIBC component1.40E-3459.79021137.762238

    1PTS system, fructose-like enzyme IIBC componentputative transport protein6.80E-1151.13636428.40909

    1PTS system, fructose-specific IIA componentphosphotransferase system enzyme IIA, regulates N metabolism1.90E-0751.51515227.272728

    1PTS system, fructose-specific transport proteinprotein modification enzyme, induction of ompC7.70E-4754.49438134.269665

    1PTS system, glucose-specific IIA componentPTS system beta-glucosides, enzyme II, cryptic4.80E-2561.64383741.09589

    1PTS system, glucose-specific IIBC componentPTS system, maltose and glucose-specific II ABC5.00E-6460.08403439.915966

    1PTS system, mannitol-specific enzyme II component, crypticputative PTS system enzyme II A component1.00E-176034.814816

    1PTS system, mannitol-specific enzyme IIABC componentsPTS system enzyme II ABC (asc), cryptic, transports specific beta-glucosides0.002644.84304830.493273

    1PTS system, mannitol-specific enzyme IIABC componentsPTS system, mannitol-specific enzyme II component, cryptic6.60E-10872.92576654.366814

    1PTS system, N-acetylglucosamine enzyme IID component 1PTS enzyme IID, mannose-specific6.30E-3260.90225634.962406

    1putative 2-component regulatorHolliday junction helicase subunit A; branch migration; repair0.03145.312528.125

    1putative 2-component regulator, interaction with sigma 54response regulator for gln (sensor glnL) (nitrogen regulator I, NRI)4.40E-5763.45608944.475922

    1putative 2-component transcriptional regulatorputative 2-component transcriptional regulator1.90E-5376.60550751.834862

    1putative 2-component transcriptional regulator for 2nd curli operonputative regulator1.20E-0967.8571451.785713

    1putative activating enzymeprobable pyruvate formate lyase activating enzyme 24.80E-0848.44444328

    1putative adhesin; similar to FimH proteinminor fimbrial subunit, D-mannose specific adhesin1.50E-6065.88628447.491638

    1putative adhesion and penetration proteinsplit orf0.0001843.33333226.666666

    1putative aldolasefructose-bisphosphate aldolase, class II9.00E-0752.19298230.263159

    1putative alpha helix proteinorf, hypothetical protein1.50E-1051.51515228.282827

    1putative an envelop proteinorf, hypothetical protein3.00E-5792.02898478.985504

    1putative ARAC-type regulatory proteinpositive regulator for rhaRS operon2.60E-0543.85245923.360655

    1putative ATP-binding component of a transport systemflagellar biosynthesis; flagellin, filament structural protein0.07745.10869628.804348

    1putative ATP-binding protein of peptide transport systemhomolog of Salmonella ATP-binding protein of oligopeptide ABC transport system1.40E-4157.23270435.220127

    1putative ATP-binding protein of xylose transport systemATP-binding component of D-ribose high-affinity transport system2.90E-11467.4603246.031746

    1putative channel/filament proteinsputative resistance protein3.80E-1845.34534527.627628

    1putative cytochrome C-type biogenesis proteinputative enzyme2.20E-2650.35629326.603325

    1putative cytochrome C-type proteintrimethylamine N-oxide reductase, cytochrome c-type subunit3.60E-5254.40729538.905777

    1putative cytochrome oxidaseprobable enzyme0.02253.78151332.773109

    1putative DEOR-type transcriptional regulatorsplit galactitol utilization operon repressor, interrupted0.007943.44827724.137932

    1putative DEOR-type transcriptional regulator of aga operonputative DEOR-type transcriptional regulator1.50E-2351.383424.505928

    1putative DMSO reductase anchor subunitanaerobic dimethyl sulfoxide reductase subunit C2.20E-7774.56446156.794426

    1putative DNA binding proteincurved DNA-binding protein; functions closely related to DnaJ7.90E-0567.21311242.622952

    1putative DNA ligaseorf, hypothetical protein6.70E-7677.82257858.870968

    1putative DNA replication factorchromosome replication; initiation and chain elongation3.80E-5772.26890650

    1putative enzyme of polynucleotide modificationorf, hypothetical protein3.60E-11668.26722750.521919

    1putative epimerase/aldolaseputative epimerase/aldolase0.0001650.28901728.901733

    1putative factorputative factor1.20E-4552.02156433.153637

    1putative FADA-type transcriptional regulatororf, hypothetical protein0.005658.7301631.746031

    1putative ferredoxinorf, hypothetical protein9.30E-3178.72340461.702129

    1putative filament proteinorf, hypothetical protein9.80E-0753.90071123.404255

    1putative fimbri