Mol Biol Evol-2004-Hao-1294-307

Embed Size (px)

Citation preview

  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    1/14

    Patterns of Bacterial Gene Movement

    Weilong Hao and G. B. Golding

    Department of Biology, McMaster University, Hamilton, Ontario, Canada

    Lateral gene transfer has emerged as an important force in bacterial evolution. A substantial number of genes can beinserted into or deleted from genomes through the process of lateral transfer. In this study, we looked for atypicaloccurrence of genes among related organisms to detect laterally transferred genes. We have analyzed 50 bacterial

    complete genomes from nine groups. For each group we use a 16s rRNA phylogeny and a comparison of proteinsimilarity to map gene insertions/deletions onto their species phylogeny. The results reveal that there is poor correlationof genes inserted, deleted, and duplicated with evolutionary branch length. In addition, the numbers of genes inserted,deleted, or duplicated within the same branch are not always correlated with each other. Nor is there any similarity withingroups. For example, in the Rhizobiales group, the ratio of insertions to deletions in the evolutionary branch leading toAgrobacterium tumefaciens str. C58 (Cereon) is 0.52, but it is 39.52 forMesorhizobium loti. Most strikingly, the numberof insertions of foreign genes is much larger in the external branches of the trees. These insertions also greatly outnumberthe occurrence of deletions, and yet the genome sizes of these bacteria remain roughly constant. This indicates that manyof the insertions are specific to each organism and are lost before related species can evolve. Simulations of the process ofinsertion and deletion, tailored to each phylogeny, support this conclusion.

    Introduction

    The number of genes in bacterial genomes are not

    stable (Mira, Ochman, and Moran 2001) but rather, overevolutionary time many genes can be deleted, duplicated,or inserted. There are two general classes of mechanismsinvolved in the generation of evolutionary novelty (Duttaand Ran 2002). One is the duplication/modification ofgenes and the other is the acquisition of foreign genes vialateral gene transfer (LGT). Genes that arise from LGTmay carry out new functions and may result in adaptationdue to a changing environment. A large number of genescan be inserted into or deleted from genomes through theprocess of lateral transfer. Some genes may even betransferred from very distantly related species, such asfrom bacteria to eukaryotes (Keeling and Doolittle 1997;Jain et al. 2002; Gogarten 2003; Klotz and Loewen 2003).

    For example, Martin et al. (2002) estimate 18% of thenuclear genes in Arabidopsis are cyanobacterial in origin.As more genome sequences become available, moreinformation about the nature of lateral gene transfer canbe obtained. Insertions and/or deletions resulting fromlateral gene transfer are widely thought to contribute togenome size variation and to differing pathogenic traits(Rode et al. 1999).

    In bacteria, LGT may be more important asa mechanism of evolutionary change than small-scalechange within genes. Laterally transferred genes can playa crucial role in the ability of bacteria to invade novelniches. Some transferred genes are involved in a bacte-riums ability to cause different diseases, and occasionallytransferred genes can lead directly to the evolution ofpathogenic strains from nonpathogenic strains (Hacker etal. 1990; Hacker and Kaper 2000; Ochman et al. 1996).

    There are several methods for detecting lateral genetransfer. One is based on atypical sequence characteristics,

    such as codon usage bias and G 1 C content. This is

    because the bias in codon usage is often species specific(Lawrence and Ochman 1998) and because genomic DNAfrom different organisms has a specific mean G 1 Ccontent (Garcia-Vallve, Romeu, and Palau 2000). Usingthis method, more than 18% of genes in Escherichia coliwere found to be laterally transferred genes (Lawrence andOchman 1998), as were approximately 24% of genes inThermotoga maritima (Garcia-Vallve, Romeu, and Palau2000). But codon usage bias and G 1 C content are notalways sufficiently reliable to detect laterally transferredgenes (Koski, Morton, and Golding 2001). Anothermethod by which to detect LGT is to reconstruct aphylogeny for the genes. This method uses a comparisonof phylogenetic trees among individual genes within

    closely related species to recognize unusual origins. Thisapproach is very powerful; nevertheless it may fail asa result of a variety of potential errors such as a difficultyin recognizing orthologous genes.

    Here we have taken a simpler approach of determiningthe presence/absence of genes in related bacteria. In mostcases, enough genome rearrangements have occurred toprevent the inference of the number of insertion/deletionevents but the change in the total number of genes due toinsertions/deletions can be determined. We have mappedthe number of gene insertions, deletions, and geneduplications onto the phylogeny for the species based ona 16s rRNA phylogeny. Unlike the relationship betweendivergence of sequence and gene order degradation inclosely related bacterial genomes (Suyama and Bork2001), the insertions, deletions, and duplications are notstrongly correlated with evolutionary branch length.Insertions are shown to be most numerous in externalbranches.

    Methods

    Fifty complete genome sequences were selected asthe basis for an analysis of gene insertion and deletion(table 1). Genomes were selected to be within the samegroup based on perceived relatedness and, wheneverpossible, consisted of congeneric species (when at least

    Key words: lateral gene transfer, bacterial genomes, phylogeny,molecular evolution.

    E-mail: [email protected]

    Molecular Biology and Evolution vol. 21 no. 7 Society for Molecular Biology and Evolution 2004; all rights reserved.

    Mol. Biol. Evol. 21(7):12941307. 2004doi:10.1093/molbev/msh129Advance Access publication March 28, 2004

  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    2/14

    four genomes were known). The genome sequences weredownloaded from TIGR (http://www.tigr.org/). If one ofthe genomes was annotated, we made use of thisannotation to extract all of the putative protein sequencesfrom the genome. The similarity of these proteins to theproteins in the other genomes was measured via theBLASTP algorithm (Altschul et al. 1997). All potential

    matches between genes were required to be hits withreciprocal expect values less than 10220. In addition, theywere required to have a match that extended over morethan 85% of the length of the query protein. Differentthresholds for an expect value can be expected to causesome minor variation in the identification of potentialhomologs, but a comparatively stringent level was chosenfor this study (Bansal and Meyer 2002). Protein sequencesthat satisfied these criteria were deemed to be in the samegene family. If the number of genes in the gene familychanged within the group of organisms, the extra copieswere assumed to be gene duplications. It is of course quitepossible that they are insertions of an extraneous genethat has a close similarity to an existing gene, but the

    assumption of duplication rather than transfer is moreconservative.

    It should be noted that the method being used hererelies on sequence similarity. So orthologous genereplacement (Koonin, Mushegian and Bork 1996) orunique genes will not be detected using this method. Inaddition, lineage-specific fast evolving genes (Pesole et al.1999; Koonin et al. 2004) will be treated as unique genesand will not be considered.

    If not all of the genomes were annotated (as is the casefor example with Pseudomonas fluorescens), the proteinsof an annotated genome were used to query the nucleotidesequence of the unannotated genome. This was done using

    the TBLASTN algorithm; querying the presence ofknown (annotated) proteins within the translated nucleo-tide sequence of the genome. The largest annotatedgenome was used for the TBLASTN algorithm to ensurethat the largest number of potential genes would berevealed. All nucleotide sequences that had significantmatches (according to the criteria above) to the annotatedprotein sequences were extracted and translated. To ensurethat any other potential coding sequences were identified,the nucleotide sequences between significant matches tothe annotated proteins (i.e., the gaps) were extracted if theywere greater than 100 nucleotides long. These nucleotidesequences were then translated in all six frames andsearched for long open reading frames (greater than 30

    amino acids).All translated proteins that had significant matches

    with the annotated genome were extracted. All thesequences which had matches with select representativesfrom the bacterial domain of life (table 2) were alsoextracted. Based on these searches a database of genes foreach species was established. If they had a significant hit toeach other, all the matches were referred to as being in thesame gene family. The bestBLAST hit is not always thebest indicator for the nearest phylogenetic neighbor oreven as an ortholog (Koski and Golding 2001), and so toavoid the effect of paralogs, we clustered all potentialhomologs in the same genome as a gene family. As above,

    Table 1Related Genomes Analyzed

    GroupGenus Species (Strain) Outgroup(s)a

    Pseudomonadaceae

    Pseudomonas P. aeruginosa H. influenzae P. fluorescens V. cholerae

    P. putidaP. syringae

    Chlamydiaceae

    Chlamydia C. muridarum L. interrogansC. trachomatis

    Chlamydophila C. caviaeC. pneumoniae AR39C. pneumoniae CWL029C. pneumoniae J138

    Mycoplasmataceae

    Mycoplasma M. gallisepticum C. acetobutylicum M. genitalium C. perfringens

    M. penetransM. pneumoniaeM. pulmonis

    Ureaplasma U. urealyticum

    Streptococcaceae

    Lactococcus L. lactis E. faecalisStreptococcus S. agalactiae NEM316

    S. agalactiae 2603V/RS. mutansS. pneumoniae R6S. pneumoniae TIGR4S. pyogenes M1 GASS. pyogenes MGAS315S. pyogenes MGAS8232

    Rhizobiales

    Agrobacterium A. tumefaciens

    C58 (Cereon) circularC. crescentus

    A. tumefaciens

    C58 (U) circularR. prowazekii

    Brucella B. melitensisB. suis

    Mesorhizobium M. lotiSinorhizobium S. meliloti

    Actinomycetales

    Corynebacterium C. efficiens B. longumC. glutamicum

    Mycobacterium M. lepraeM. tuberculosis CDC1551M. tuberculosis H37Rv

    Streptomyces S. coelicolor

    Bacillaceae

    Bacillus B. anthracis L. innocua B. cereus L. monocytogenes

    B. halodurans

    B. subtilisOceanobacillus O. iheyensis

    Staphylococcaceae

    Staphylococcus S. aureus N315 L. innocuaS. aureus MW2 L. monocytogenesS. aureus Mu50S. epidermidis

    Campylobacterales

    Campylobacter C. jejuni C. crescentus Helicobacter H. hepaticus X. fastidiosa9a5c

    H. pylori 26695H. pylori J99

    a Generic names of the outgroups are given in table 2.

    Patterns of Bacterial Gene Movement 1295

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    3/14

    differing numbers of paralogs in closely related species areattributed to duplication rather than to LGT. The singlelink method of Friedman and Hughes (2003) was used todefine the protein families. This groups genes that sharesimilarity to any other single member. For example, if

    genes A and B are in a family, and genes B and C are inanother family, then A, B, and C are in a family. Again ifthere are differing numbers of genes in a family, this istaken as evidence for gene duplication.

    With this procedure, only genes known to exist (or atleast annotated) in another organism are detectable. Forexample, all genes that we have indicated as present in

    Pseudomonas fluorescens have a significant similarity witha gene from eitherP. aeruginosa, P. putida, P. syringae, orwith species from the representative group of bacteria. Anypotentially novel genes in P. fluorescens which do nothave any homolog within other Pseudomonas or repre-sentative genomes are undetectable by this method andwill be missing from our results.

    Gene insertions/deletions were mapped onto thephylogenies according to a parsimony principle. Whenthere was ambiguity, a set of one or two outgroup genomeswere chosen to help determine whether the genes near thebase of the tree were inserted or deleted. If the homologsof genes are present in the outgroup genomes, the genesare referred to as deletions, and if not, they are noted asinsertions.

    The species analyzed and their outgroups are shownin table 1. They cover a broad spectrum of bacterialdiversity and have several different levels of within-groupdivergences. There are nine groups of taxa with a total of50 species analyzed. The groups analyzed included species

    from the Chlamydiaceae group and the Mycoplasmata-ceae, Streptococcaceae, Rhizobiales, Actinomycetales,

    Bacillaceae, Staphylococcaceae, and Campylobacteralesgroups.

    The 16s rRNA sequence from each species wasextracted from the genomes or from the RibosomalDatabase Project at http://rdp.cme.msu.edu/html/download.

    Table 2Representative Bacterial Genomes

    Agrobacterium tumefaciens C58 (U. W) chr circulara Bifidobacterium longumBuchnera aphidicola Sg Caulobacter crescentusClostridium acetobutylicum Clostridium perfringens

    Enterococcus faecalis Escherichia coliK12-MG1655Escherichia coli O157:H7 EDL933 Escherichia coli O157:H7 VT2-Sakai Fusobacterium nucleatum Haemophilus influenzae

    Helicobacter pylori 26695 Lactococcus lactis Leptospira interrogans Listeria innocua Listeria monocytogenes Mycobacterium tuberculosisCDC1551 Neisseria meningitidis Pasteurella multocida Rickettsia prowazekii Salmonella entericaserovar Typhi CT18Synechocystis sp Thermotoga maritimaVibrio cholerae (chr I) Xylella fastidiosa 9a5c

    a Strains are indicated only where more than one was available.

    FIG. 1.An example phylogeny to demonstrate the method to infergene insertions, deletions, and duplications. All the possibilities areshown in table 3. Throughout figures 110, external branches arenumbered and internal branches are labeled alphabetically.

    Table 3The Method Used to Infer Gene Number ChangesIllustrated with the Phylogeny Shown in figure 1

    Gene inSpecies A

    Gene inSpecies B

    Gene inSpecies C

    Gene inOutgroup

    Gene inRepresentatives Conclusion

    1 1 1 1 1 del a 1 1 ins 11 1 1 del a 1 1 ins 2 1 1 ins 2 1 1 1 ins 2 1 1 ins 3 1 1 ins 3 1 1 1 ins 31 1 del 31 1 1 del 31 1 1 del 31 1 1 1 del 31 1 del 21 1 1 del 21 1 1 del 21 1 1 1 del 2 1 1 ins a 1 1 1 del 1 1 1 1 ins a 1 1 1 1 del 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

    11 1 1 1 1 dup 11 11 1 1 1 dup 21 1 11 1 1 dup 31 11 11 1 1 dup a

    1296 Hao and Golding

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    4/14

    html. These sequences were aligned using the Clustalwalgorithm (Thompson, Higgins, and Gibson 1994) andphylogenetic trees were generated using Puzzle (Strimmerand von Haeseler 1996; 1,000 puzzling steps with the HKY

    substitution model) and using Mr.Bayes (Huelsenbeck andRonquist 2001; 10,000 generations with the gammadistribution model). An example of the method used isshown in figure 1 and table 3.

    Because parsimony involves an inference about theinsertions/deletions, we used simulations to further explorethe process of insertion/deletion. These simulations wereconducted to gain an expectation of the patterns of genepresence/absence among the observed species withoutfurther inference. The simulations assume that there isa constant and equal insertion and deletion rate. Theseevents affect only a single gene in each case and areappropriate to measure the number of genes inserted/deleted but are not a measure of the number of events

    expected in nature. All genes are assumed to have beenpotentially inserted or deleted. In the simulations the rateof insertion/deletion was chosen to give the same numberof genes present in all species as that actually observed.The differences between simulation and observed resultsare listed in Appendix tables A.1 to A.3. For illustrationpurposes, the average number of genes in each genomewas normalized to 1,000.

    Results

    The number of genes inferred to have been inserted,deleted, or duplicated is shown in figures 2 to 10. The

    arrows directed toward a branch indicate insertions, arrowsdirected away from a branch indicate deletions, andnumbers within a rectangular box indicate the inferrednumber of duplications. The length and width of the

    arrows are proportional to the number of insertions ordeletions within each branch. Similarly, the length andwidth of the rectangular boxes are proportional to thenumber of duplications within each branch. The scale barbelow each species name indicates the overall genomesize.

    In general there is no obvious correlation between thenumber of ins/del/dup versus branch length. In the

    Pseudomonas group, the correlation coefficients betweenins/del/dup and branch length are 0.49, 0.00 and 0.36,respectively (table 4). Other R2 values for the Pseudomo-nadaceae are 0.82 between insertions and duplications,0.65 between insertions and deletions, 0.57 betweendeletions and duplications. However, these values are

    quite variable among different groups (data not shown).For example, P. putida KT 2440 has evolved compara-tively recently within the Pseudomonas group. If theincidence of indels were proportional to evolutionarybranch length, then the number of genes inserted into ordeleted from the external branch leading to P. putidashould be a fraction (0.14) as much as those in the branchleading to P. aeruginosa . Instead they are 0.90 and 2.58times larger for insertions and deletions, respectively. Inmost of the groups, the largest insertion or deletionnumbers were not on the longest branch. This is true forthe number of duplications as well. For example, within

    Mycoplasma species (fig. 4), the external branch (branch

    FIG. 2.The presence of genes within Pseudomonas species. Arrows pointed toward or away from a branch indicate the estimated number ofinsertions or deletions, respectively. Rectangles mirror the inferred number of gene duplications. Branch lengths (with vertical scale bar) indicatephylogenetic distance based on rRNA. Horizontal scale bar indicates genome size. External branches are numbered and internal branches are labeledwith lowercase letters. The bootstraps from Puzzle (top) and support from Mr.Bayes (bottom) are shown, if a branch was not supported by values of atleast 80% by both methods. Open arrows indicate that the genome was not annotated at the time of this study.

    Patterns of Bacterial Gene Movement 1297

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    5/14

    5) leading to M. penetrans has the largest number ofinferred duplications. Another short branch (branch 6)leading to M. gallisepticum has the largest number ofinferred insertions.

    From all of the genomes analyzed, we found thatMesorhizobium loti MAFF303099 and Streptomyces coeli-color A3(2) have the largest number of duplicated genes.

    For example, the probable ABC transport system ATP-

    binding proteins in S. coelicolor are members of a familywith 120 copies, while their homologs in Corynebacteriumglutamicum have only 57 copies. The transporter relatedgenes in Sinorhizobium meliloti have 59 copies, but theirfamily members in M. loti number 128 copies.

    These two species are among the larger genomesanalyzed here. The largest genome size belongs to S.

    coelicolor A3(2) (fig. 9), which also has high insertions

    FIG. 3.The presence of genes within the Chlamydiaceae group (table 1). Symbols as in figure 2.

    FIG. 4.The presence of genes within the Mycoplasmataceae group (table 1). Symbols as in figure 2.

    1298 Hao and Golding

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    6/14

    (707) and duplications (981) in the external branch leadingto S. coelicolor(branch 1). There are only 52 genes inferredto have been deleted during this branch. To gain insight intothe function of these mobile genes a further analysis of thetwo largest genomes,M. loti and S. coelicolor, was done. Of

    the genes inserted into the external branch leading to S.coelicolor (branch 1 in fig. 9), there are 414 new genesinvolved in biosynthesis function (table 5). Some of thesemight be important in mycelium formation (Redenbach,Scheel, and Schmidt 2000). Some of them may be essentialfor antibiotic production. In M. loti, 9.61% of the insertedproteins are transport related, and some of them are essentialfor adhesion to their host plant (Hacker and Kaper 2000).

    Two Brucella species were analyzed and each hastwo chromosomes. Based on the parsimony method, it islikely that the two chromosomes in each Brucella splitbefore the two Brucella speciated. There are fewer inferredgene events if the two chromosomes split earlier than thetwo species diverged (data not shown). At present, there

    are seven genomes with one large and one smallerchromosome available. They are Vibrio vulnificus, V.

    parahaemolyticus RIMD 2210633, V. cholerae, Lepto-spira interrogans, Deinococcus radiodurans R1, Brucellasuis, and B. melitensis. Usually ribosomal RNAs arepresent only on the large chromosome, but rrn operonswere found on both the large and small chromosomes in V.

    parahaemolyticus (Tagomori, Iida, and Honda 2002) andB. suis 1330 (Paulsen et al. 2002). It is possible that the rrnoperon in the small chromosome ofB. suis was duplicatedfrom the large one.

    From the diagrams in figures 2 to 10, it is apparentthat there are more gene insertions in external branches

    than the internal branches. And there are fewer genedeletions/duplications in the internal branches. Thissuggests that most of the genes inserted in one speciesare genes unique to that group and are likely to be deletedbefore a new species evolves.

    There are two paraphyletic groups found with thephylogenetic analysis. Ureaplasma uralyticum brancheswithin the Mycoplasma genus (fig. 4), and Oceanbacillusiheyensis branches within Bacillus (fig. 10). Ureaplasmaand Mycoplasma are members of the class Mollicutes.Based on the reclassification ofMollicutes (Weisburg et al.1989), M. genitalium, M. pneumoniae, and U. uralyticumbelong to the Pneumoniae group, and M. pulmonis belongsto the Hominis group. That Ureaplasma is within one ofthe Mycoplasma clusters was shown by Yoshida et al.(2002). In the Bacillus group, B. halodurans is distantlyrelated to the other three Bacillus compared with Oceano-bacillus iheyensis (Lu, Nogi, and Takam 2001).

    Within the Bacillaceae genus, B. cereus, B. anthracis,

    and B. thuringiensis have been assigned as varieties of thesame species (Helgason et al. 2000) and alternatively asdifferent species (Ivanova et al. 2003). In this study, wefound that about 200 unique genes separate B. cereus and

    B. anthracis. This large difference in genes perhaps leadsto their remarkably different phenotypes, even though thestrains may be closely related.

    Simulations were conducted to determine an expec-tation for patterns of gene presence/absence in theobserved phylogenies. In these simulations, 1,000 geneswere modeled with a constant and equal rate of insertionand deletion. Each insertion was considered to be toa unique gene. The phylogeny branch lengths as estimated

    FIG. 5.The presence of genes within the Streptococcaceae group (table 1). Symbols as in figure 2. Taxa 3, 7, and 9 form a multifurcationaccording to Puzzle (shown) but branch ((7,9),3) via Mr.Bayes.

    Patterns of Bacterial Gene Movement 1299

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    7/14

    from the rRNA genes were used in the simulation. Theoverall rate of gene ins/del was adjusted to give theobserved numbers of genes that are conserved across allspecies. The simulations confirm the above results(Appendix table A.1A.3). They show that while there isa bias such that insertions are expected to exceed deletions,

    the observed data deviate beyond what is expected. Theinsertions exceed deletions and external branches have anexcess of inserted/deleted genes. The identity of theinserted genes in each branch in the phylogeny is given assupplementary information at http://life.biology.mcmaster.ca/~weilong/research.html.

    FIG. 6.The presence of genes within the Rhizobiales group (table 1). Symbols as in figure 2. In the dashed box, the branch lengths are not shownto scale. The branch lengths leading to Brucella suis and B. melitensis are 0.00001 each. Both species have two chromosomes.

    FIG. 7.The presence of genes within the Staphylococcus group (table 1). Symbols as in figure 2.

    1300 Hao and Golding

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    8/14

    Discussion

    The revelation that many genes are laterally trans-ferred in bacteria has changed our concept of bacterialgenome evolution (for a review, see Doolittle 2000). Butwith the exception of viral genomes (e.g., see McLysaght,Baldi, and Gaut 2003), there have still been comparatively

    few analyses of entire genomes (but see Martin et al. 2002;Daubin, Laurat, and Perriere 2003; Daubin, Moran, andOchman 2003). In general a complete analysis of genomicchanges in related species is quite difficult (notwithstandingseveral excellent studies ofE. coli strains; e.g., Perna et al.2001). One major reason for this is that most sequencingprojects do not examine very closely related genomes. Even

    FIG. 8.The presence of genes within the Campylobacterales group (table 1). Symbols as in figure 2.

    FIG. 9.The presence of genes within the Actinomycetales group (table 1). Symbols as in figure 2.

    Patterns of Bacterial Gene Movement 1301

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    9/14

    for those projects that have determined the genome fromcongeneric species, the divergence of the species can still bequite extensive. For example, the fourPseudomonas speciesexamined here are sufficiently diverged to have effectivelymasked most of their genome rearrangements. In addition,

    when given good data, the correct inference of genomerearrangements is computationally difficult (Sankoff 2001;Nadeau and Sankoff 1998). Finally, the inference of lateraltransfer can itself be difficult. Therefore, we have takena simpler approach here and have asked only if the samegene exists in related bacteria. The presence or absence datafor each gene have then been mapped onto the phylogeny ofthe species (as inferred from rRNA sequences).

    By the same gene we mean a gene that sharessignificant similarity but not necessarily an actualhomolog. If the correct homolog were replaced via a lateraltransfer with a similar gene, the methods employed herewould fail to detect these transfers. If a gene belonged toa family of similar genes, transfers would not be detected,

    and only a duplication or deletion would be inferred.A gene is inferred to exist only if a similar gene

    already exists in some other bacterial species. Thestandards used to identify a similar gene are comparativelystringent, so some homologs may have been missedbecause of extensive sequence divergence. In addition,genes that might have been laterally transferred might havedistant similarity to existing genes and these would beclassified as duplicated family members. Some falsepositives and a larger number of false negatives are to beexpected (Genereux and Logsdon 2003). Nevertheless,many lateral transfers will be missed by these methods andhence these results should be considered conservative.

    This method is also biased to ignore any unique LTgenes and to underestimate changes at terminal branches.For example, any gene uniquely present on any externalbranch and not shared among the group or within theoutgroups will be ignored. The parsimony method used to

    infer the branch upon which an insertion/deletion occurredis also known to be biased; it is well known to under-estimate the number of events (Galtier and Boursot 2000;Dean et al. 2002; Felsenstein 2004). Taken together, thesebiases will cause the number of genes inserted/deleted/duplicated at the termini of the tree to be underesti-mated. Despite these biases, however, the majority of geneinsertions occur on external branches. Because genomesizes are not radically changed, this must imply that themajority of gene insertions tend to occur uniquely withinthe external branches and will be deleted (or replaced)before new species diverge. This supports the hypothesisthat these lateral transfers help the bacteria to occupya species specific niche.

    FIG. 10.The presence of genes within the Bacillaceae group (table 1). Symbols as in figure 2.

    Table 4The ins/del/dup Genes Are Not Correlated with theEvolutionary History Within Pseudomonas Species

    Branch Length Insertions Deletions Duplications

    1 0.03465 520 90 3052 0.00501 469 232 2913 0.01878 484 256 1544 0.01348 590 440 388a 0.00784 88 13 86b 0.00783 38 14 12

    Correlation coefficient 0.49 0.00 0.36

    1302 Hao and Golding

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    10/14

    From the diagrams (figs. 210), it is plain that thereare generally more inserted genes than deleted genes. Thisresult is consistent with previous research which reportedthat there are more insertions than deletions (Rode et al.1999; Ochman and Jones 2000). All other factors kept equal,in bacteria a small genome has an evolutionary advantage(Moran 2002) and hence, the fate of nonfunctionalsequence is to be eliminated especially in obligate parasites.But in this study, even in the obligate parasite Mycoplasma,there are more insertions than deletions.

    A recurrent theme in figures 210 is the lack of

    correlation of insertions and length of evolutionary history.As measured by rRNA substitutions, Streptococcusagalactiae strain NEM316 has had a long evolutionaryhistory compared with strain 2603V/R. But NEM316 hasfewer inserted genes than 2603V/R. In addition, there aremore insertions than deletions in most branches. In somespecies, for example Buchnera aphidicola, there is a stablegenome with a conserved gene order (van Ham et al. 2003)and there are only a few deletions from the branches of thisgroup (Silva, Latorre, and Moya 2003). Again, generearrangement in the genome at this level of resolution isnot dependent on the length of evolutionary history.

    The results of Mirkin et al. (2003) suggest equal

    numbers of insertions and deletions, whereas Daubin,Lerat, and Perriere (2003) and the results presented heresuggest more insertions than deletions. The main differ-ence between these studies is the distance between thespecies examined. The phylogeny used by Mirkin et al.(2003) covers the complete scope of lifebacteria,archaea, and eukaryota. Their main purpose was todetermine the subset of genes present in the last commonancestor of all life. Here genomes were specificallyselected from related groups. Although the genes ofbacterial genomes are in a constant state of flux (Snel,Bork, and Huynen 2002), not all insertion/deletion eventsare easily detected. With large divergences betweenspecies, this inference becomes more and more difficult,

    but the inference of the gene set in the last commonancestor requires this level of divergence. Consistent withthis, we have detected a larger number of ins/del eventsthan did Mirkin et al. (2003), and we are probably stillmissing a large number of ins/del events because thespecies chosen are still too distantly related. As moregenomes become available in the future a more accurateestimate will become possible.

    Nor are duplicated genes correlated with the length ofevolutionary history or consistent across taxa. In thesespecies, there is no evidence that repeated events ofgenome doubling are a major force for evolution (Rileyand Anilionis 1978; Mira, Ochman, and Moran 2001).

    Although it was reported that there is an almost linearrelationship between divergence of sequence and geneorder degradation in closely related bacterial genomes(Suyama and Bork 2001), our study looks at a deeperphylogenetic level where there are no longer remnants ofany correlation between ins/del/dup and evolutionarybranch length. Sometimes the largest ins/del/dup numberoccurred along a short branch. A similar result was foundin the evolutionary history of viral genomes with in-dependent duplication and deletion events in lineages ofviruses (Herniou et al. 2001; Hughes and Friedman

    2003). The correlation coefficients between duplicationsand insertions are variable. The R2 values are 0.02of Bacillaceae, 0.24 of Actinomycetales, 0.39 of Myco-

    plasmataceae, 0.40 of Streptococcaceae, 0.56 of Cam-pylobacterales, 0.66 of Chlamydiaceae, 0.82 of Pseudomonadaceae, and 0.85 of Staphylococcaceae.Some duplications have been shown to be due to LGT(Gogarten, Doolittle, and Lawrence 2002; Snel, Bork, andHuynen 2002). Duplications are more common amonglaterally transferred genes than among native genes(Hooper and Berg 2003). Therefore, many of the genesinferred to have been duplicated may also have beenlaterally transferred. These would inflate the number of

    insertions beyond what is inferred here.The rRNA genes have been used here to providea measure of the evolutionary history. Daubin, Moran, andOchman (2003) suggest that LGT has not eliminated thephylogenetic signals. Nevertheless, it is possible that thisreflection of the species history given by the rRNA genesequence is inaccurate but it is still apparent that no otherevolutionary history could eliminate the patterns observed.While the numbers of indels are not correlated with therRNA evolutionary history, nor are they strongly correlatedwith each other. Some evolutionary branches have highnumbers of inserted genes and yet low numbers of deletedgenes.

    Lateral gene transfer can occur via acquisition and

    uptake from the environment, so genes involved in DNAuptake and recombination play an important role in LGT(Silva, et al. 2003). It is well known that the loss of manygenes that encode proteins associated with recombinationand DNA uptake maintains the stability of B. aphidicolagenomes (Silva, Latorre, and Moya 2003). The functionalloss or enhancement of genes involved in recombinationand DNA uptake is more likely based on selection orglobal deletion-insertion biases (Petrov 2001). Many otherforces might affect the number of deletions/insertions, butthese are not clear. Which force is important and howselection works to enhance or curtail gene insertions anddeletions will be very important in future studies.

    Table 5The function of inserted genes in Mesorhizobium loti MAFF303099 and Streptomycescoelicolor A3(2)

    UnknownFunction

    TransportRelated

    TranscriptionalRegulator

    BiosynthesisEnzyme Other

    M. loti 429(40.02%) 103(9.61%) 73(6.81%) 338(31.53%) 129(12.03%)S. coelicolor 207(21.10%) 99(10.09%) 89(9.07%) 414(42.20%) 172(17.53%)

    Patterns of Bacterial Gene Movement 1303

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    11/14

    Our results suggest that an amazingly large number ofgenes have been laterally transferred even within compar-atively closely related bacteria. Lateral transfers are thoughtto be influenced more by physical proximity than byphylogenetic proximity of the organisms (Matte-Tailliez etal. 2002) and are seen to share similar genomic propertiessuch as genome size, genome G/C composition, and carbonutilization (Jain et al. 2003). The functions of most laterallytransferred genes are still unknown. The most studiedlateral transfers belong to pathogenic islands, but pathoge-nicity islands are only a small part of genes laterally

    transferred. The results presented here suggest that many ofthe LT genes may be species-specific adaptations.

    Acknowledgments

    This work was supported by a Natural Sciences andEngineering Research Council of Canada (NSERC) grantto G.B.G. The authors wish to thank Prof. R.A. Morton,the editor, and two anonymous reviewers for their helpfulcomments on previous drafts.

    Appendix: Simulations of Gene Patterns

    The patterns of gene presence for each species are

    listed in tables A.1 to A.3. In these tables, a 1 indicatesa gene present in the species and a 0 indicates absence.The order of these numbers is matched to the species orderfrom left to right within each phylogeny. Simulations wereconducted with the assumption of constant and equalinsertion/deletion rates.

    The expected number of each gene pattern can becalculated easily if one assumes a constant and equal rateof gene insertion/deletion. Deleted genes cannot beobserved, but if one conceptually retains a placeholderfor a gene then the number of empty placeholders can becounted. If the rate of gene insertion is m and the rate ofgene deletion is l (both events involving only a single

    gene and all events independent), then the equilibriumfrequency for a gene to be present is m/(l 1 m). Asevolution proceeds, genes will be deleted or inserted atthese rates, and the probability at time t that any oneplaceholder will be occupied by a gene (pt) or not can befound as follows,

    pt m=l m 1 m ltp0 m=l m

    and ifl m then

    pt

    1

    2 1 2lt

    p0

    1

    2

    :

    For two taxa separated by t1 and t2 generations andstarting initially with all gene placeholders occupied (p0 1), the gene patterns will be observed according to

    11 pt1

    3pt2

    10 pt1

    3 1 pt2

    01 1 pt1 3pt

    2

    00 1 pt1 3 1 pt

    2:

    More complicated patterns and phylogenies can becalculated using the appropriate initial conditions for eachancestral node.

    More realistically, placeholders cannot be observedand when an insertion occurs in any branch it would nothave any homology to any other gene present in the group(otherwise it would be considered a deletion). Anyinsertion is therefore treated as a new, unique gene. Thisassumption complicates the above results, and a simulation,which incorporates this, was done to compare observedand expected gene patterns of presence/absence (tables A.1and A.2). In the simulation study, the simulation processwas started with a total of 1,000 genes, and this numberwas kept relatively constant due to an equal rate ofdeletions and insertions. Genes were picked at random fordeletion. When an insertion occurred in any branch, it

    Table A.1The Difference Between Observed Patterns and Simulated Patterns

    Campylobacterales Staphylococcaceae Pseudomonadaceae

    Patternsa Observed Simulated Observed Simulated Observed Simulated

    0001 7 12 11 0 92 800010 8 24 15 12 104 1040011 323 145 6 15 17 73

    0100 524 128 1 15 69 430101 2 7 4 2 97 130110 2 2 75 0 7 90111 106 294 245 223 17 1321000 162 378 34 235 93 1781001 2 1 7 0 28 11010 0 0 0 0 68 11011 47 43 0 9 62 191100 112 67 0 8 34 761101 4 10 8 9 59 691110 1 5 11 0 95 481111 486 487 741 741 611 610

    GNb 1,226 1,943 2,859

    a In this column a 1 indicates a gene present in the species and a 0 indicates absence. The order of the numbers is matched

    with the species order from left to right in each phylogeny in figures 2 to 10.b Observed values are normalized to 1,000 genes to facilitate comparison between groups. To recover the actual number

    observed in each pattern, multiply by the average number of genes analyzed for each group (GN).

    1304 Hao and Golding

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    12/14

    Table A.2The Difference Between Observed Patterns and Simulated Patterns

    Chlamydiaceae Mycoplasmataceae Actinomycetales Bacillaceae

    Patternsa Observed Simulated Observed Simulated Observed Simulated Observed Simulated

    000001 2 64 8 2 10 0 56 2000010 2 11 21 1 6 0 44 7000011 77 67 300 92 699 51 491 167

    000100 7 45 384 157 3 43 100 185000101 5 1 5 1 14 0 10 0000110 6 3 3 1 2 0 8 0000111 14 70 63 71 180 302 78 71001000 0 0 234 181 56 146 76 283001001 0 0 0 0 0 0 6 0001010 0 0 5 0 0 0 5 0001011 0 0 26 3 6 1 44 16001100 2 0 53 3 0 0 26 15001101 0 0 0 0 0 0 3 1001110 0 0 5 0 0 0 2 0001111 1 0 39 27 4 13 45 80010000 1 3 55 183 38 121 94 195010001 0 0 0 0 0 0 7 0010010 0 0 0 0 0 0 7 0010011 0 0 8 2 3 2 40 31010100 0 0 8 3 0 0 36 26010101 0 0 0 1 0 0 3 2010110 0 0 0 0 0 0 4 0010111 0 0 13 20 1 21 69 145011000 1 0 45 51 316 177 43 48011001 0 0 0 1 1 0 1 1011010 0 0 0 0 0 0 4 0011011 0 0 5 15 23 14 39 82011100 0 0 18 16 3 2 49 71011101 0 0 0 3 1 0 8 4011110 0 0 5 2 1 0 10 0011111 1 1 84 172 66 145 396 395100000 1 1 153 315 248 428 c 100001 0 0 0 0 5 0 100010 0 0 18 0 1 0 100011 0 0 8 5 90 12 100100 0 0 11 7 2 3

    100101 0 0 0 1 1 0 100110 0 0 0 1 0 0 100111 0 0 5 61 49 133 101000 1 0 13 5 22 10 101001 0 0 0 0 0 0 101010 0 0 8 0 0 0 101011 0 0 8 5 5 2 101100 0 0 5 7 0 0 101101 0 0 0 2 0 0 101110 0 0 5 1 0 0 101111 5 3 76 57 13 23 110000 24 0 21 4 9 11 110001 0 0 0 0 0 0 110010 0 0 0 0 0 0 110011 0 0 8 4 4 2 110100 1 0 11 5 0 1 110101 0 0 0 1 0 0 110110 0 0 0 1 0 0 110111 1 0 29 46 1 36 111000 222 86 18 30 63 85 111001 5 0 0 1 1 0 111010 0 1 0 0 2 0 111011 9 31 18 29 52 19 111100 51 62 13 36 4 4 111101 8 7 5 8 2 0 111110 8 30 5 3 1 0 111111 776 777 361 362 226 225

    GNb 870 380 2,004 2,511

    a Patterns in this column indicate gene presence/absence as described in table A.1.b Observed values are normalized to 1,000 genes to facilitate comparison between groups. To recover the actual number observed in each pattern, multiply by the

    average number of genes analyzed for each group (GN).c There are only five Bacillaceae species. To make the table concise, the first letter of the pattern was set to 0.

    Patterns of Bacterial Gene Movement 1305

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/
  • 8/3/2019 Mol Biol Evol-2004-Hao-1294-307

    13/14

    became an instance of a new gene. The phylogenies givenin figures 2 to 10 were used. The overall rate of insertion/deletion was chosen to give the number of genes that areobserved to be conserved across all species (e.g., pattern1111). Each simulation was repeated 10 times and thenumber for each presence/absence pattern counted. Theaverage of these numbers is shown in tables A.1 and A.2.

    To facilitate comparison between the differentbacterial groups, the observed results have been normal-ized to a total of 1,000 genes. The average number ofgenes analyzed for each group is given in tables A.1 and

    A.2, and the actual number in each pattern can be found bymultiplying the number of given in the table by theaverage and dividing by 1,000.

    The patterns from the simulations and those observedare quite different. The number of events inferred onancestral branches should generally be much larger. Forexample, in the Pseudomonadaceae the patterns 0011and 1100 are strongly underrepresented in the observeddata versus the simulated data (17 and 34 vs. 73 and 76,respectively). In figure 2 this translates to a deficiency ofevents on the ancestral branches a and b. Similarly, thereshould be more events on the ancestral branch a in figures4 and 8 (or alternatively branch 4 depending on the state ofthe outgroup) and more deletions in branch a or moreinsertions in branch 4 in figure 7. Reinforcing this, thenumbers of genes which have a dispersed and discontin-uous distribution within the phylogeny are generally muchlarger in the observed results than in the simulations. Thismay be the result of lateral transfer occurring more readilywithin a group than to more taxonomically distant groups.For example, there are 97 genes (normalized to a total of1,000) present in P. putida, P. syringae and absent from P.aeruginosa, P. fluorescens, but only 13 in the simulationresults. Similar results hold for genes in P. aeruginosa, P.syringae that are absent from P. putida, P. fluorescens.Using the Pseudomonas group as an example, the inferrednumber of insertions/deletions was compared with the

    simulation results (table A.3; in this table deletions ofduplicated genes were removed from considerationbecause the simulation did not include the possibility ofduplicated genes). The simulation results indicate that evenwhen insertion/deletion rates are equal, there is anobservational bias toward more insertions than deletions.But the observed gene numbers deviate beyond this biasand even more strongly favor insertions over deletions.

    Literature Cited

    Altschul, S. F., T. L. Madden, A. A. Schffer, J. Zhang, Z. Zhang,W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-

    BLAST: a new generation of protein database searchprograms. Nucleic Acids Res. 25:33893402.

    Bansal, A. K., and T. E. Meyer. 2002. Evolutionary analysis bywhole-genome comparisons. J. Bacteriol. 184:22602272.

    Daubin, V., E. Lerat, and G. Perriere. 2003. The source oflaterally transferred genes in bacterial genomes. Genome Biol.4:R57.

    Daubin, V., N. A. Moran, and H. Ochman. 2003. Phylogenetics

    and the cohesion of bacterial genomes. Science 301:829832.Dean, A. M., C. Neuhauser, E. Grenier, and G. B. Golding. 2002.

    The pattern of amino acid replacements in alpha/beta-barrels.Mol. Biol. Evol. 19:18461864.

    Doolittle, W. F. 2000. Uprooting the tree of life. Sci. Am.282:9095.

    Dutta, C., and A. Ran. 2002. Horizontal gene transfer andbacterial diversity. J. Biosci. 27:2733.

    Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates,Sunderland, Mass.

    Friedman, R., and A. L. Hughes. 2003. The temporal distributionof gene duplication events in a set of highly conserved humangene families. Mol. Biol. Evol 20:154161.

    Galtier, N., and P. Boursot. 2000. A new method for locatingchanges in a tree reveals distinct nucleotide polymorphism vsdivergence patterns in mouse mitochondrial control region.J. Mol. Evol. 50:224231.

    Garcia-Vallve, S., A. Romeu, and J. Palau. 2000. Horizontal genetransfer in bacterial and archaeal complete genomes. GenomeRes. 10:17191725.

    Genereux, D. P., and J. M. Logsdon Jr. 2003. Much ado aboutbacteria-to-vertebrate lateral gene transfer. Trends Genet.19:191195.

    Gogarten, J. P. 2003. Gene transfer: gene swapping craze reacheseukaryotes. Curr. Biol. 13:R53R54.

    Gogarten, J. P., W. F. Doolittle, and J. G. Lawrence. 2002.Prokaryotic evolution in light of gene transfer. Mol. Biol.Evol. 19:22262238.

    Hacker, J., L. Bender, M. Ott, J. Wingender, B. Lund, R. Marre,

    and W. Goebel. 1990. Deletions of chromosomal regionscoding for fimbriae and hemolysins occur in vitro and in vivoin various extraintestinal Escherichia coli isolates. MicrobialPathogenet. 8:213225.

    Hacker, J., and J. B. Kaper. 2000. Pathogenicity islands andthe evolution of microbes. Annu. Rev. Microbiol. 40:169189.

    Helgason, E., O. A. kstad, D. A. Caugant, H. A. Johansen, A.Fouet, M. Mock, I. Hegna, and A.-B. Kolst. 2000. Bacillusanthracis, Bacillus cereus, and Bacillus thuringiensisonespecies on the basis of genetic evidence. Appl. Environ.Microbiol. 66:26272630.

    Herniou, E. A., T. Luque, X. Chen, J. M. Vlak, D. Winstanley,J. S. Cory, and D. R. OReilly. 2001. Use of whole genomesequence data to infer baculovirus phylogeny. J. Virol. 75:81178126.

    Hooper, S. D., and O. G. Berg. 2003. Duplication is morecommon among laterally transferred genes than amongindigenous genes. Genome Biol. 4:R48.

    Huelsenbeck, J. P., and F. Ronquist. 2001. MrBayes: Bayesianinference of phylogenetic trees. Bioinformatics 17:754755.

    Hughes, A. L., and R. Friedman. 2003. Genome-wide survey forgenes horizontally transferred from cellular organisms tobaculoviruses. Mol. Biol. Evol. 20:979987.

    Ivanova, N., A. Sorokin, I. Anderson et al. (23 co-authors) 2003.Genome sequence of Bacillus cereus and comparativeanalysis with Bacillus anthracis. Nature 423:8791.

    Jain, R., M. C. Rivera, J. E. Moore, and J. A. Lake. 2002.Horizontal gene transfer in microbial genome evolution.Theor. Popul. Biol 61:489495.

    Table A.3The Comparison of Inferred ins/del Between Observedand Simulated Results for the Pseudomonas Species

    Branch 1 2 3 4 a b

    Observed Insertions 182 164 169 206 31 13Deletions 26 69 69 100 4 3

    Simulation Insertions 179 55 110 87 37 36

    Deletions 142 20 77 53 38 38

    1306 Hao and Golding

    http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/http://mbe.oxford