Upload
hadiep
View
213
Download
1
Embed Size (px)
Citation preview
© 1999 Macmillan Magazines LtdNATURE | VOL 399 | 27 MAY 1999 | www.nature.com 323
articles
Evidence for lateral genetransfer between Archaea andBacteria from genome sequence ofThermotoga maritimaKaren E. Nelson, Rebecca A. Clayton, Steven R. Gill, Michelle L. Gwinn, Robert J. Dodson, Daniel H. Haft,Erin K. Hickey, Jeremy D. Peterson, William C. Nelson, Karen A. Ketchum, Lisa McDonald, Teresa R. Utterback,Joel A. Malek, Katja D. Linher, Mina M. Garrett, Ashley M. Stewart, Matthew D. Cotton, Matthew S. Pratt,Cheryl A. Phillips, Delwood Richardson, John Heidelberg, Granger G. Sutton, Robert D. Fleischmann,Jonathan A. Eisen, Owen White, Steven L. Salzberg, Hamilton O. Smith, J. Craig Venter & Claire M. Fraser
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The 1,860,725-base-pair genome of Thermotoga maritima MSB8 contains 1,877 predicted coding regions,1,014(54%) of which have functional assignments and 863 (46%) of which are of unknown function. Genome analysisreveals numerous pathways involved in degradation of sugars and plant polysaccharides, and 108 genes that haveorthologues only in the genomes of other thermophilic Eubacteria and Archaea. Of the Eubacteria sequenced to date,T. maritima has the highest percentage (24%) of genes that are most similar to archaeal genes. Eighty-onearchaeal-like genes are clustered in 15 regions of the T. maritima genome that range in size from 4 to 20 kilobases.Conservation of gene order between T. maritima and Archaea in many of the clustered regions suggests that lateralgene transfer may have occurred between thermophilic Eubacteria and Archaea.
Thermotoga maritima, a non-spore-forming, rod-shaped bacteriumbelonging to the order Thermotogales, was originally isolated fromgeothermal heated marine sediment at Vulcano, Italy1, and has anoptimum growth temperature of 80 8C. T. maritima metabolizesmany simple and complex carbohydrates including glucose,sucrose, starch, cellulose and xylan1,2. Both cellulose and xylan,through conversion to fuels (such as H2), have great potential asrenewable carbon and energy sources.
T. maritima is also of evolutionary significance, because small-subunit ribosomal RNA (SSU rRNA) phylogeny has placed thisbacterium as one of the deepest and most slowly evolving lineages inthe Eubacteria3. To elucidate further its unique metabolic propertiesand evolutionary relationship to other microbial species, wesequenced the genome of the type strain T. maritima MSB8 usingthe whole-genome random-sequencing method previouslydescribed4,5.
General features of the genomeThe genome of T. maritima is a single circular chromosomeconsisting of 1,860,725 base pairs (bp) (Fig. 1) with an averageG þ C content of 46%. A single rRNA operon (16S–23S–5S),containing an isoleucine transfer RNA and an alanyl tRNA in thespacer region between the small-and large-subunit genes,correspondsto the one region of the chromosome with a significantly higher G þ Ccontent (62%). A region of significantly lower G þ C content (34%)encodes lipopolysaccharide biosynthesis (LPS) proteins.
On the basis of analysis of G þ C ratio, G–C skew6
(G 2 C=G þ C) and asymmetric distribution of oligomers7 in theT. maritima genome, we could not identify a characteristic bacterialorigin of replication, a situation similar to that observed in thegenomes of the Archaea Methanococcus jannaschii8 and Archaeoglobusfulgidus5. We assigned base-pair one of the genome at the beginningof the longest stretch (2.6-kb) of 30-bp repeats (Fig. 2).Open reading frames. We identified 1,877 open reading frames(ORFs) (Figs 1, 2; Tables 1, 2), with an average size of 947
nucleotides, using the coding-analysis program GLIMMER9 (seeMethods). Coding sequences cover 95% of the chromosome.Predicted protein sequences were searched against a non-redundantprotein database and biological roles were assigned to 1,014 (54%)
1100,000
200,000
300,000
400,000
500,000
600,000
700,000
800,000900,0001,000,000
1,100,000
1,200,000
1,300,000
1,400,000
1,500,000
1,600,000
1,700,000
1,800,000
Figure 1 Circular representation of the T. maritima MSB8 genome showing
predicted-coding regions and other features. Outer circle, predicted protein-
coding regions on the plus strand classified by role according to the colour code
in Fig. 2 (unknowns and hypotheticals are in black). Second circle, predicted
protein-coding regions on the minus strand. Third circle, x2 composition in 2000-
bp windows (see Methods); bands correspond to x2 values with P < 1:9 3 102 9.
Fourth circle, Archaea-like islandson the genome. Fifth circle, small repeats. Sixth
circle, large repeats (black), large repeats associated with small repeats (red).
Seventh and eighth circles, rRNAs and tRNAs, respectively.
© 1999 Macmillan Magazines Ltd
of them using the classification scheme adapted from ref. 10. Four-hundred-and-seven (22%) predicted coding sequences matchedhypothetical coding sequences from other species, and 373 (20%)had no database match. Forty-six stable tRNAs with specificity forall 20 amino acids were identified.Repeats. The T. maritima genome has 143 copies of a 30-bp repeatfound in eight distinct clusters on the chromosome (Figs 1, 2).The 30-bp repeats are interspersed with a unique 39–40-bpsequence and are followed by a 452-bp sequence to form a structureidentical to that described for the genomes of M. jannaschii8 and A.fulgidus5. Our examination of the Aquifex aeolicus genome11 revealsthat similar repeats are present, but they are fewer in number, andshorter, than those in T. maritima. These small/large repeatstructures have only been identified in the genome sequences ofthermophiles.
In addition to the 30-bp repeat structures, eight classes of largerepeats, more than 200 bp in length and with .95% identity to eachother, are present in the T. maritima genome (Fig. 1, Table 1).Multigene families. We identified 214 gene families (P # 10 2 5 over60% of the length of the sequence—see Methods) in T. maritima. Ofthese families, 126 consist of two members, and the largest genefamily (of ATP-binding subunits of ABC transporters) contains67 members. Two other large families (one with 22 members andone with 47 members) consist exclusively of proteins involved intransport. Fifteen families, unique to T. maritima, consist ofproteins with no database match. Eleven of these familieshave two members, and the largest group has seven members(http://www.tigr.org/tdb/mdb/mdb.html).
Solute uptake and metabolismSeveral transporters reflecting the heterotrophic metabolism of thisspecies are present, including those for importing maltose (malE),ribose (rbsB) and spermidine and/or putrescine (potD), as well asseveral carbohydrate transporters whose specificity is unknown.Carriers for the uptake of amino acids and oligopeptides, and ion-transport systems for the acquisition of K+ (trkA/H), Mg2+ (mgtE),NH+
4 (amt), PO2−4 (pstA/C/B/S) and both oxidized and reduced
forms of iron (feoA/B) are present. The large number of transportersfor carbohydrates and amino acids (Figs 3, 4) suggests that theenvironment in which T. maritima is found is rich in organicmaterial.
The predominant mechanism of transport in T. maritima is ATP-coupled solute flux. Eighty-four percent of the proteins in thetransport category are subunits of ATP-binding cassette (ABC)transporters. Phylogenetic comparisons of the periplasmic solute-binding protein (SBP) component (Fig. 4) roughly parallels thefamilies defined in other Eubacteria12 with a marked expansion ofproteins specific for oligopeptides. Nine of the eleven oligopeptidesystems appear to be in operons with genes essential for sugarmetabolism (Fig. 5). Of the published genomes, only Pyrococcushorikoshii13 has an oligopeptide transporter associated with a sugardegrading enzyme (b-galactosidase). This operonic structure sug-gests that there is coordinate regulation of peptide import withsugar degradation in these two organisms, although it is far moreextensive in T. maritima. This contrasts with classical regulatorynetworks where the transported substrate affects transcription ofthe ABC transporter genes14.
Almost 7% of the predicted coding sequences in the T. maritimagenome are involved in metabolism of simple and complex sugars,more than twice the percentage seen in other eubacterial andarchaeal species sequenced to date. Several genes encoding proteinsinvolved in the sequential degradation of xylan are present. Genesencoding endoglucanases (celA/B) and b-glucosidases (bglB) thatare involved in cellulose degradation were identified, confirming thepresence of the cellulolytic systems predicted by biochemical studiesof this organism15. T. maritima does not have a complex system forthe degradation of plant cellulosic materials, as described for thethermophilic bacterium Clostridium thermocellum, in which thedegradation of cellulose depends on a multienzyme complex knownas the cellulosome, composed of between 14 and 26 subunits16.
Glucose catabolism in T. maritima involves the Embden-Meyer-hof and Entner-Doudoroff glycolytic pathways. In addition, thenon-oxidative branch of the pentose-phosphate pathway appears tobe involved in glucose breakdown. Genome analysis indicates thatT. maritima can metabolize glycerol, gluconate and numeroussugars including amylose, maltose and galactose, as well as theamino acids aspartate, threonine and glycine (Fig. 3). CoA-SH-dependent ferredoxin oxidoreductases specific for pyruvate, as wellas partial and complete operons for ferredoxin oxidoreductases ofunknown specificity, are also present on the genome.
Biosynthetic pathways for nine amino acids were identified inT. maritima (Fig. 3). In addition, genes that encode proteins forthe biosynthesis of biotin, folic acid, haem, porphyrin, lipoate,menaquinone, ubiquinone, pantothenate and pyridoxine wereidentified. T. maritima may also synthesize glycogen as a storagepolysaccharide.
Along with an ability to gain energy through a fermentativemetabolism, T. maritima can grow as a respiratory organism,generating energy in the presence of Fe(III) (ref. 17). Growth withsulphur as the terminal electron acceptor does not produce ATP18,
articles
324 NATURE | VOL 399 | 27 MAY 1999 | www.nature.com
Table 1 General features of the T. maritima MSB8 genome
General features.............................................................................................................................................................................
Length of sequence 1,860,725G + C ratio 46%Total no. of sequences 30,140Average read length (bp) 531Open reading frames 1,877Protein coding regions 95%Ribosomals 1 5S–16S–23StRNAs 46 (10 clusters/19 single genes).............................................................................................................................................................................
Chromosomal coding sequences.............................................................................................................................................................................
No. similar to known proteins 1,014No. of conserved hypotheticals 407No. similar to proteins of unknown function 83No. without a database match 373
Total 1,877.............................................................................................................................................................................
Repeats.............................................................................................................................................................................
Class Length Copies Database match
SR-01 30 143 tttccatacctctaaggaattattgaaaca
LR-01 1,897 2 hypothetical proteinLR-02 1,403 2 a-glucosidaseLR-03 1,137 4 putative transposaseLR-04 1,082 2 methyl-accepting chemotaxis proteinLR-05 858 2 putative transposaseLR-06 555 2 helicaseLR-07 252 2 excinucleaseLR-08 241 2 putative transposase.............................................................................................................................................................................
Figure 2 Linear representation of the T. maritima MSB8 genome. The locations of
each predicted protein-coding region (colour-coded by biological role), RNA
genes, tRNAs and repeat elements are indicated. Arrows represent the direction
of transcription for each predicted coding region. Numbers next to the tRNA
symbols represent the number of tRNAs at a locus. Numbers next to GES
represent the number of membrane-spanning domains predicted by the Gold-
man, Engelman, and Steitz scale as calculated by TopPred45 for that protein. Only
proteins with five or more GES domains are shown. Presumed transporter
specificity is indicated above predicted coding regions identified as transporters.
Transporterabbreviationsare as follows: +, cations;H+, protons;K+, potassium;Pi,
phosphate; Zn, zinc; aa, amino acids; Na+, sodium; COH, sugar; aaX, oligopep-
tides; mal, maltose; rib, ribose; s/p, spermidine/putrescine; ura, uracil; ant,
antibiotics; Fe2+, iron(II); Fe3+, iron (III); NH+4, ammonium; bcaa, branched chain
amino acids; g3Pi, glycerol-3-phosphate; glyc, glycerol; chro, chromate; Mg2+,
magnesium; question marks (?) indicate where substrate specificity is uncertain
or unknown. Members of paralogous gene families are identified by family
number in a box above the predicted coding region.
Q
© 1999 Macmillan Magazines Ltd
but this pathway allows for the elimination of growth inhibitory H2
which is produced during fermentative growth. Various flavopro-teins and iron-sulphur proteins have been identified as potentialelectron carriers.
Response to environmental stimuliT. maritima demonstrates a carbohydrate-dependent thermotacticresponse to temperature gradients between 50 and 105 8C (ref. 19).
Motility is regulated by a two-component histidine kinase signal-transduction pathway that is assembled from products of the Chegenes (cheA/B/C/D/R/W/Y) and seven methyl-accepting chemotac-tic transducer proteins (MCPs). Genes encoding T. maritima MCPsare most closely related to homologues in Bacillus subtilis, withspecificity for both amino acids and carbohydrates. AlthoughT. maritima demonstrates no chemotactic response to serine19, thepresence of amino-acid-specific MCPs indicates that T. maritimamay respond to aspartate, threonine and glycine, which appear to becatabolized by this bacterium.
In addition to the chemotactic-response kinases, other membersof the two-component signalling family identified in T. maritimaare likely to have a role in monitoring and responding to environ-mental stimuli such as temperature and nutrients. This group of sixsignalling partners includes the previously identified histidine-kinase (hpkA)/response-regulator (drrA) pair belonging to theOmpR–PhoB subfamily of transcriptional regulators20.
Although no heat- or cold-shock response has been experimen-tally demonstrated in T. maritima, the genome contains genesencoding heat-shock proteins (dnaJ/K, groEL/ES, grpE, hslU/V andhsp) and cold-shock proteins (encoded by cspB/L), which probablyhelp regulate responses to changes in ambient temperature andgrowth conditions. T. maritima also contains genes encoding ageneral stress protein (ctc) and a stationary-phase-survival protein(surE), both of which are presumably involved in stress survival.
Cellular activitiesProtein secretion, competence and transformation. In addition tothe Sec-A-dependent secretory pathway, T. maritima has twospecialized export systems. The first, which is assembled fromhomologues of FliH/P/R and FlhA/B21, is probably associated withthe secretion and assembly of the single T. maritima flagellum19. Thesecond is a type-II secretion pathway that is assembled from thegeneral secretion pathway proteins D, E and F (homologues of thePulD transport family of Klebsiella oxytoca22). The T. maritima type-II secretion pathway probably serves as the primary mechanism forsecretion of degradative enzymes required for the utilization ofpolysaccharides.
Competence has not been demonstrated in T. maritima, but it ispossible that the type-II general secretion pathway proteins D, E and F,and the type IV pilin leader peptidase, type IV pilin-related proteinand pilT identified in T. maritima may function in natural competenceand transformation, as described for other organisms23. In addition, T.maritima has homologues of the competence genes dprA, comM andcomE of Haemophilus influenzae, and comEA and comFC of B. subtilis.There are other protein-coding sequences with weak homology tocompetence genes from B. subtilis. This suggests that there may bean inherent system for the uptake of exogenous DNA, facilitatinggenetic exchange between T. maritima and other organisms.Transcription and translation. Genes encoding the three subunits(a, b, b9) of the core RNA polymerase were identified in T. maritima(rpoA/B/C, respectively) along with four j factors; jA (also namedj70) (rpoD), jE (rpoE), jH (fliA) and j28 (spoOH). Although jA hasbeen previously identified in T. maritima24, the roles and specificityof the remaining j factors in transcription regulation are unknown.The nusA/B/G transcription antitermination genes and a member ofthe greA/B transcription-elongation-factor family were identifiedalong with the rho transcription-termination factor.
The genome of T. maritima is similar to other sequenced bacterialgenomes in that the gene for glutaminyl tRNA-synthetase is miss-ing. An alternative synthesis mechanism is the transamidation ofGlu-tRNAGln to Gln-tRNAGln by Glu-tRNAGln amidotransferase, aheterotrimeric enzyme found in both Eubacteria and Archaea25.Like Helicobacter pylori26, T. maritima lacks an asparaginyl-tRNAsynthetase. Transamidation of Asp-tRNAAsn is presumably involvedin generation of Asn-tRNAAsn, similar to that reported in thearchaeon Haloferax volcanii27.
articles
NATURE | VOL 399 | 27 MAY 1999 | www.nature.com 325
Table 2 T. maritima MSB8 gene list
Gene identificationnumbers that correspond to those in Fig. 2 are listedherewith theprefix TM followed by the commonnameassigned to each protein, the three- or four-letter gene name in parentheses, the organism with the most significant match inbraces and the percent similarity to thebestmatch. Eachgene identified is listed in itsfunctional role category (adopted from ref.10). In cases where the substratespecificity of a protein could not be unambiguously determined, a more generalcommon name was used and no gene name was assigned. In some cases a genewithout known substrate specificity could be confidently assigned to a particularfamily as found in PROSITE (http://expasy.hcuge.ch/sprot/prosite.html/) or SWISS-PROT (http://expasy.hcuge.ch/sprot/). The term ‘related’ is used in two ways in thecommon names: (1) when the TM protein is a partial but significant match to adatabase protein: the TM protein is assigned to the Unknown role category; or (2)when the TM protein is a very good match to a database protein whose function isnot found in T. maritima: the TM protein may be assigned to a role categoryappropriate to the known function of the database match. Abbreviations are asfollows: Common names: AA, amino acid; NH3, ammonia; NH4+, ammonium; AFS,authentic frameshift; APM, authentic point mutation; Bprt, binding protein; BRAA,branched chain AA; Cl−, chloride; CoA, coenzyme A; DHase, dehydrogenase; dep,dependent; elong, elongation; fam, family; flgr, flagellar; init, initiation; Fe, iron; Fe3+,iron(III); Fe2+, iron(II); LPS, lipopolysaccharide; Mg2+, magnesium; Mn2+,manganese; OP, oligopeptide; PPase, phosphatase; P, phosphate; PPR,phosphoribosyl; K+, potassium; prt, protein; put, putative; RDase, reductase; reg,regulation, regulator, regulatory; rel, related; ssDNA, single stranded DNA; Na+,sodium; S, sulfur; sub, subunit; Sase, synthase, synthetase; term, termination; Tase,transferase; transp, transporter; Zn, zinc. Organism of best match: Ac, Acinetobactercalcoaceticus; Ab, Agaricus bisporus; Ar, Agrobacterium radiobacter; At,Agrobacterium tumefaciens; Ae, Alcaligenes eutrophus; Alc, Alcaligenes sp.; Alt,Alteromonas sp.; Amy, Amycolata sp.; Ana, Anabaena sp.; Ath, Anaerocellumthermophilum; Aa, Aquifex aeolicus; Atl, Arabidopsis thaliana; Af, Archaeoglobusfulgidus; Art, Arthrobacter sp.; Aca, Azorhizobium caulinodans; Av, Azotobactervinelandii; Bbv, Bacillus brevis; Bca, Bacillus caldolyticus; Bce, Bacillus cereus; Bci,Bacillus circulans; Bf, Bacillus firmus; Bl, Bacillus licheniformis; Bm, Bacillusmegaterium; Bac, Bacillus sp.; Bsp, Bacillus sphaericus; Bst, Bacillusstearothermophilus; Bs, Bacillus subtilis; T4, Bacteriophage T4; Bn, Bacteroidesnodosus; Bt, Bacteroides thetaiotaomicron; Bv, Beta vulgaris; Bp, Bordetellapertussis; Bb, Borrelia burgdorferi; Bbr, Brevibacillus brevis; Bli, Brevibacteriumlinens; Ba, Brucella abortus; Ce, Caenorhabditis elegans; Cj, Campylobacter jejuni;Ca, Candida albicans; Ch, Capra hircus; Cc, Caulobacter crescentus; Cp, Chlamydiapsittaci; Cr, Chlamydomonas reinhardtii; Cau, Chloroflexus aurantiacus; Cac,Clostridium acetobutylicum; Cla,Clostridium acidiurici; Chi,Clostridium histolyticum;Cpa, Clostridium pasteurianum; Cpe, Clostridium perfringens; Clo, Clostridium sp.;Ct, Clostridium thermocellum; Cam, Cornyebacterium ammoniagenes; Dd,Desulfovibrio desulfuricans; Df, Desulfovibrio fructosovorans; Dg, Desulfovibriogigas; Dv, Desulfovibrio vulgaris; Dt, Dictyoglomus thermophilum; Eco, Eikenellacorrodens; Ea, Enterobacter aerogenes; Ef, Enterococcus faecalis; Ech, Erwiniachrysanthemi; Ec, Escherichia coli; Eac, Eubacterium acidaminophilum; Eg, Euglenagracilis; Fi, Fervidobacterium islandicum; Hi, Haemophilus influenzae; Hp,Helicobacter pylori; Hs, Homo sapiens; Kp, Klebsiella pneumoniae; Lf, Lactobacillusfermentum; Lp, Lactobacillus pentosus; Ll, Lactococcus lactis; Lpn, Legionellapneumophila; Lm, Leptospira meyeri; Lmo, Listeria monocytogenes; Mc,Mesembryanthemum crystallinum; Mta, Methanobacterium thermoautotrophicum;Mtf, Methanobacterium thermoformicicum; Mj, Methanococcus jannaschii; Mk,Methylobacterium extorquens; Mlu, Micrococcus luteus; Ma, Microcystisaeruginosa; Mo, Micromonospora olivasterospora; Mmu, Mus musculus; Ml,Mycobacterium leprae; Ms, Mycobacterium smegmatis; Mt, Mycobacteriumtuberculosis; Mg, Mycoplasma genitalium; Mp, Mycoplasma pneumoniae; Mx,Myxococcus xanthus; Nf,Naegleria fowleri; Ng,Neisseria gonorrhoeae; Nos,Nostocsp.; Pd, Paracoccus denitrificans; Pha, Pasteurella haemolytica; Pan, Podosporaanserina; Pg, Porphyromonas gingivalis; Pm, Propionigenium modestum; Pa,Pseudomonas aeruginosa; Pc, Pseudomonas cichorii; Pf, Pseudomonasfluorescens; Pp, Pseudomonas putida; Pse, Pseudomonas sp.; Ps, Pseudomonasstutzeri; Psy, Pseudomonas syringae; Pfu, Pyrococcus furiosus; Ph, Pyrococcushorikoshii; Pyr, Pyrococcus sp.; Rn,Rattusnorvegicus; Ra,Reclinomonas americana;Rc, Rhodobacter capsulatus; Sc, Saccharomyces cerevisiae; Se,Saccharopolyspora erythraea; Sch, Salmonella choleraesuis; Sd, Shigelladysenteriae; Sa, Staphylococcus aureus; Sx, Staphylococcus xylosus; Sau,Stigmatella aurantiaca; Sb, Streptococcus bovis; Sm, Streptococcus mutans; Sp,Streptococcus pneumoniae; St, Streptococcus thermophilus; Sco, Streptomycescoelicolor; Ss, Sulfolobus solfataricus; Ssc, Sus scrofa; Syn, Synechococcus sp.;SPCC, Synechocystis PCC6803; Scy, Synechocystis sp.; Tb, Thermoanaerobacterbrockii; Tth, Thermoanaerobacterium thermosaccharolyticum; Tl, Thermococcuslitoralis; Tm, Thermotoga maritima; Tn, Thermotoga neapolitana; Ta, Thermusaquaticus; Tt, Thermus thermophilus; Td, Treponema denticola; Tp, Treponemapallidum; Va, Vibrio alginolyticus; Vc, Vibrio cholerae; Vp, Vibrio parachaemolyticus;Ws, Wolinella succinogenes; Xl, Xenopus laevis; Ye, Yersinia enterocolitica..............................................................................................................................................................................
R
© 1999 Macmillan Magazines Ltd
Cell division. The gene content of T. maritima demonstrates thatthe basic mechanism of cell division is similar to that found in otherEubacteria. The ftsA/H/Y/Z genes were identified along with twogenes from the min locus, minC/D. These genes function together tospecify the position and formation of the constricting ring that leadsto final division.Detoxification. Proteins for detoxification in this strict anaerobeinclude two NADH oxidases (nox), a putative alkyl hydroperoxidereductase, a heavy-metal-resistance transcriptional regulator andthe periplasmic divalent cation-tolerance protein (cutA).
Phylogenetics and comparative genomicsThe availability of the complete genome sequence of T. maritima isimportant for evolutionary studies, as T. maritima has been sug-gested to be one of the deepest branching eubacterial species, on thebasis of phylogenetic analysis of SSU rRNA genes3. Although SSUrRNA analysis has been useful in identifying and characterizingbacterial strains and species, many questions remain regarding thevalidity of evolutionary conclusions based on SSU rRNA analysis(see, for example, ref. 28).
To capitalize on the information contained in completed genomesequences, and to reduce complications caused by different speciessets in phylogenetic analyses of different genes29, we identified asubset of 33 genes (see Methods) for which homologues wereconserved in all species sequenced to date4,5,8,11,13,26,30–39 (Table 3).This subset was used to generate multiple sequence alignments andphylogenetic trees using both parsimony and distance methods (seeMethods).
A few significant patterns arise from the analysis of thesephylogenies. First, for the majority of genes, the Archaea constitutea distinct monophyletic group separate from the Eubacteria, aphylogenetic pattern also found in SSU rRNA trees. However, thisfinding does not relate to whether the Archaea as a whole aremonophyletic, as the species of Archaea represented here are allEuryarchaeota. Second, most yeast genes group with the archaealgenes as found for SSU rRNA. In addition, in almost all trees, speciesthat are part of the same bacterial phyla group together (forexample, Escherichia coli and H. influenzae, Borrelia burgdorferiand Treponema pallidum). Beyond this, there are significant differ-ences in the topologies of different genes, and there is little
articles
326 NATURE | VOL 399 | 27 MAY 1999 | www.nature.com
1 6532 4 97 11108
1
2
3
4
5
6
7
8
9
9
suga
r A
BC
tran
spor
t sys
tem
s
livF/G/H/Mbranched chainamino acids
potA/B/C/Dspermidine/putrescine
rbsA/B/C/Dribose ugpA/B/E
glycerol-3-P
33 flagellar &motor genes
GLUCOSE
glucose-6-P
GLY
CO
LYS
IS
PEP
Pyruvate
LACTATEAcetyl-CoA
por
6 phosphogluconate
ENTNERDOUDOROFF
PATHWAY
gly-3-P+
pyruvate
Fructose-6-Pgly-3-P
glycerolglycerol-3-PDHAP
H2 and CO2
GLUCONATE
2-OXOGLUTARATEALDEHYDESKETOISOVALERATE
OR
MALATEchorismate
tryptophantyrosine
oxaloacetate aspartate
threonine
glutamatearginine?prolineglutamine
valine
Ribose-5-P
PRPP
leucine
GlycineAcetamideThreonine
NH3+CO2+H2
ASPARTATE
aamino acids
pstA/B/C/Sphosphate
phosphate ?
chromate ?
malE/F/Gmaltose
glpFglyceroluptake
facilitator
trkA/Hpotassium
feoA/Biron (II)
Mg 2+ ?
iron (III)
znuA/B/Czinc
amtNH4
+
cations
pyrPuracil
H+
ATPsynthase
ADP+ Pi ATP
iron (III) ?
oadA/BNa+
potassium ?
cationsP-type ATPase
7 MCPs
cheA/B/C/D/R/W/Y
chemotacticsignals
histidine
FLAGELLUM
PENTOSE PHOSPHATEPATHWAY
?
?? ?
KDPG
?
?
antibiotics ?malE/F/Gmaltose
PPi 2Pi
??
serine
cysteine
glycine
??
H+
pyro-phosphatase
11 oligopeptide ABC transport systems
Figure 3 Overview of metabolism and transport in T. maritima MSB8. Pathways
for energy production and the metabolism of organic compounds, acids and
aldehydes are shown. Each gene product with a predicted function in ion or
solute transport is illustrated. Transporters are grouped by substrate specificity
according to role category: cations (green), anions (red), carbohydrates
(yellow), purines (purple), amino acids/peptides/amines (dark blue) and other
(light blue). Question marksassociated with transporters indicate uncertainties in
substrate specificity or direction of transport. Question marks associated with
metabolic pathways indicate where an expected activity was not found.
Permeases are represented by ovals. ABC transport systems are shown as
composite figures of ovals, diamonds and circles. All other transporters are
drawn as rectangles. Export or import of solutes is designated by the direction of
the arrows through the transporter. If precise substrate specificity could not be
determined for a transporter, no gene name was assigned and a more general
common name reflecting the type of substrate being transported was used.
Abbreviations: PRPP, phosphoribosyl-pyrophosphate; gly, glyceraldehyde; PEP,
phosphoenolpyruvate; ATP, adenosine triphosphate; ADP, adenosine diphos-
phate; DHAP, dihydroxyacetone phosphate; OR, oxidoreductases; MCP, methyl-
accepting chemotaxis protein; KDPG, 2-keto-3-deoxy-6-phosphogluconate; por,
pyruvate oxidoreductase.
© 1999 Macmillan Magazines Ltd
agreement between these gene trees and the rRNA tree. In particular,we find little support for the rRNA-based positions of Aquifex andThermotoga.
The lack of congruence of trees for different genes could resultfrom the limited number of species represented by the completedgenomes and/or the small size of some of these genes. However, webelieve that the differences are real both because of high bootstrapsupport and because differences in topology have been found byothers in the analysis of other genes28. Mechanisms that could leadto these differences include gene duplication, gene loss and hori-zontal gene transfer. Thus we conclude that, based on single-geneanalysis, the phylogenetic position of Aquifex and Thermotoga, andthe nature of the deepest branching eubacterial species, should beconsidered to be ambiguous.
Phylogenetic analysis of individual genes has limited evolution-ary studies to a small percentage of the genes in any given genome.As an alternative to single-gene phylogenetic analysis, we comparedT. maritima’s genome sequence to those of the completelysequenced microbial species by observing patterns of similarity(see Methods). Of the 1,877 predicted coding sequences in theT. maritima genome, 52% are most similar to proteins in eubacterialspecies, most to the Gram-positive bacterium B. subtilis (21%) andto A. aeolicus (15%). In addition, 24% of the predicted codingsequences are most similar to proteins in archaeal species (Table 4),
almost half to P. horikoshii. This similarity of T. maritima to theArchaea contrasts with the other Eubacteria in which no more than16% of the coding sequences in A. aeolicus, and 7% of the codingsequences in B. subtilis, are most similar to archaeal proteins. Bywhole-genome similarity comparison, T. maritima appears to be themost Archaea-like of all sequenced Eubacteria4,11,26,31–38.
The Archaea-like nature of the T. maritima genome does notnecessarily reflect a closely shared common ancestor betweenT. maritima and the Archaea, as the extensive similarity betweenthese species could have arisen in many ways. These include loss ofthese genes in other lineages, extensive sequence divergence inmesophilic Eubacteria (coupled with a retention of such genes inthe thermophilic Archaea and T. maritima) and, alternatively, that afew genes in the T. maritima genome with strong similarity toarchaeal genes have expanded into large gene families, resulting inmore genes with a higher level of similarity to archaeal genes.Such mechanisms could lead to similarity between Archaea andT. maritima regardless of the evolutionary history of these species.
We believe, however, that much of the similarity betweenT. maritima and the Archaea is due to shared ancestry of portionsof the genome as a result of extensive lateral gene transfer betweenthese lineages (Tables 3, 4). One line of evidence consistent withprevious studies which have argued in support of lateral transfer40 isthat the 451 Archaea-like genes in T. maritima are not uniformlydistributed among the biological role categories (Table 4). Themajority of genes involved in housekeeping functions such astranscription, translation, DNA replication and cell division aremost similar to orthologues in eubacterial species. In contrast, 49%of transporters (92), 60% of electron transport proteins (28) and42% of conserved hypothetical proteins (173) are most similar toarchaeal genes. Another observation which would support lateralgene transfer is that 81 of the Archaea-like genes in the T. maritimagenome are clustered in 15 regions of the chromosome that range insize from ,4 to 20 kb. The genes, and conservation of gene order inseven of these regions, have only been described in the genomesequences of thermophilic Archaea. In addition, two of the clusteredregions are associated with the 30-bp repeat elements found amongArchaea and T. maritima, lending support to the idea that theserepeat elements may be involved in gene transfer.
x2-analysis of the genome sequence (see Methods) lends addi-tional support to the theory of lateral gene transfer in T. maritima.Based on the assumption that the DNA composition is relativelyuniform throughout the genome, there are at least 51 regions in thechromosome (Fig. 1) that have a significantly different composition(P # 1:9 3 10 2 9). Forty-two of these regions include genes andrepeat structures that have highest levels of similarity to regions onthe chromosomes of other thermophiles, including the thermo-philic Archaea and Aquifex. All of the 30-bp small repeat areas have ax2 composition that is substantially different from the rest of the
articles
NATURE | VOL 399 | 27 MAY 1999 | www.nature.com 327
OligopeptidesTM0531
Oligopeptides TM0056 Oligopeptides
TM0309OligopeptidesTM1746
OligopeptidesTM1150
OligopeptidesTM0300
OligopeptidesTM0071
Oligopeptides TM1199
OligopeptidesTM0031
OligopeptidesTM1223
OligopeptidesTM1226
RiboseTM0958Sugar
TM0114IronTM0189
IronTM0080
Branchedchainamino acidsTM1135
PhosphateTM1264
Spermidine/putrescineTM1375
SugarTM0418
SugarTM0277
MaltoseTM1839
MaltoseTM1204
SugarTM1855
SugarTM0595
Gly-3-PTM1120
ZincTM0123
SugarTM0432
SugarTM0810
Amino acidsTM0593 Oligopeptides
TM1067
Figure 4 Phylogenetic pattern of the periplasmic SBP component of the
oligopeptide transporters and other transporters present on the T. maritima
MSB8 genome. Sperm/putre, spermidine/putrescine; Gly-3-P, glycerol-3-phos-
phate; sugar, unsure of specificity of sugar transporter.
Table 3 Genes shared by the T. maritima MSB8 genome
33 homologues in the completed genomesapt, argS, eno, ftsZ, gcp, gltX, groES, hisT, pheS, pgk, prs rplA, rplB, rplC, rplE, rplF,rplK, rplN, rplR, rplV, rpsB, rpsC, rpsD, rpsE, rpsG, rpsH, rpsJ, rpsK, rpsL, rpsM, rpsQ,rpS, secY..............................................................................................................................................................................
T. maritima genes matching only hyperthermophiles:108 total; 93 hypothetical proteins; flgA putative; s-layer-related protein; rubrerythrinputative; 2 ABC transporters; glutamate synthase-related protein; glutaredoxinputative; putative glycerate kinase; putative hydrogenase; putative NADHdehydrogenase; putative LPS biosynthesis protein; putative phosphonopyruvatedecarboxylase; putative pyruvate formate lyase activating enzyme; alanineacetyltransferase-related protein; sensory box protein..............................................................................................................................................................................
T. maritima genes matching only Archaea71 total; 64 hypothetical proteins; 2 ABC transporters; glutamate synthase-relatedprotein; putative glycerate kinase; putative hydrogenase; rubrerythrin; sensory boxprotein..............................................................................................................................................................................
Table 4 Top eubacterial or archaeal match in T. maritima by role ID
Role category Eubacteria Archaea Eukaryotes None.............................................................................................................................................................................
Amino acid biosynthesis 49 20 2 1Purines, pyrimidines, etc. 32 11 2 0Fatty acid and phospholipid metab. 12 3 0 0Biosynthesis of cofactors etc. 22 10 0 0Central intermediary metabolism 27 12 1 3Autotrophic metabolism 0 0 0 0Energy metabolism 117 56 1 18Transport 89 92 0 7DNA metabolism 45 6 0 2Transcription 17 0 0 0Translation 118 6 0 6Regulatory functions 61 9 0 0Cell envelope 57 9 1 7Cellular processes 55 10 0 0Other 9 8 0 1Hypotheticals 213 173 2 19Unknown 52 26 0 5
TOTAL 975 451 9 69.............................................................................................................................................................................
© 1999 Macmillan Magazines Ltd
genome. Compositional bias as seen in these x2 distributions haspreviously been used in support of lateral gene transfer41.
In summary, completion of the sequence of the T. maritimagenome has revealed a degree of similarity with the Archaea in termsof gene content and overall genome organization that was notpreviously appreciated. Although the core of T. maritima may beeubacterial, almost one quarter of the genome is archaeal in nature.The mosaic nature of T. maritima appears to be the result ofextensive lateral gene transfer with the Archaea. The accumulatingevidence for the high frequency of lateral transfer events, and thelack of congruence among the phylogenies of different genes,indicates that the emphasis of phylogenetic studies on individualgenes as indicators of organismal evolution is probably not accurate.Undoubtedly, our understanding of the complex relationshipsamong prokaryotes will continue to increase as other microbialgenome sequencing projects are completed. M. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Methods
Whole-genome random sequencing procedure. The type strain,T. maritima MSB8, was grown from a culture derived from a single cellisolated by optical tweezers and provided by R. Huber (University ofRegensburg). Cloning, sequencing and assembly were as described for genomessequenced by TIGR4,5. One small-insert plasmid library (1.5–2.5 kb) wasgenerated by random mechanical shearing of genomic DNA. One large-insert lambda library was generated by partial Tsp5091 digestion and ligationto l-DASHII/EcoRI vector (Stratagene). In the initial random sequencingphase, ,7-fold sequence coverage was achieved with 27,789 sequences fromplasmid clones (average read length, 531 bases). The plasmid and l-sequenceswere jointly assembled using TIGR Assembler42. Sequences from both ends of546 l-clones served as a genome scaffold, verifying the orientation, order andintegrity of the contigs. Sequence gaps were closed by editing the ends ofsequence traces and/or primer walking on plasmid clones. Physical gaps wereclosed by direct sequencing off genomic DNA, or combinatorial PCR followedby sequencing of the PCR product. The final genome sequence is based on30,140 sequences.
Paralogous gene families were constructed by searching the ORFs againstthemselves using BLASTX43, identifying pairwise matches above P # 102 5 over60% of the query search length, and subsequently clustering these matches intomultigene families. Multiple sequence alignments for these protein familieswere generated using MSA (G. G. Sutton & T. Bussey, personal communica-tion), an annealing algorithm, and the alignments were scrutinized.
For the comparative genomics, the T. maritima ORFs were added to a set ofall ORFs from 16 published microbial genomes4,5,8,11,13,26,30–39. This dataset wassearched against itself using BLASTX43, and pair-wise matches were identified,clustered and converted to multiple sequence alignments as above. From thisset of alignments a subset of 33 homologous gene families (P # 10 2 5 over 60%of the query search length) were identified. These alignments were enhancedusing the CLUSTAL X program, and regions of the alignments that wereambiguous, hypervariable or that contained large alignment gaps wereexcluded from subsequent phylogenetic analysis. Phylogenetic trees weregenerated from the curated alignments using the PAUP 4.0.0d64 andPHYLIP computer programs. Parsimony analysis was conducted using theheuristic search algorithm of PAUP and protpars of PHYLIP. Distance-basedtrees were generated using the neighbour-joining algorithm of ref. 44. Onehundred bootstrap replicates were conducted for all trees. For all PAUPanalyses, multiple step matrices were used in calculation of distances and inparsimony analysis. The whole genome set of pairwise BLASTX43 search resultswas also used to determine ‘best hits’ of T. maritima and A. aeolicus to othergenomes.
For the x2 analysis we computed the distribution of all 64 trinucleotides(3mers) for the complete genome, and then computed the 3mer distribution in2,000 bp windows across the genome. We used windows that overlapped by halftheir length; that is, 1,000 bp. For each window, we computed the x2 statistic onthe difference between its 3mer content and that of the whole genome. A largevalue of this statistic means that the composition within the window is differentfrom the rest of the genome. Figure 1 illustrates those regions of the genomewith x2 values whose probability is less than 1:9 3 10 2 9; the probability valuesfor these regions are based on the assumption that the DNA composition isrelatively uniform throughout the genome. Because this assumption may beincorrect, we prefer to interpret high x2 values merely as indicators of regionson the chromosome that appear unusual and that demand further scrutiny.ORF prediction and gene family identification. An initial set of ORFs likelyto encode proteins was identified by GLIMMER9, and those shorter than 30codons were eliminated. ORFs that overlapped were visually inspected, and insome cases removed. ORFs were searched against a non-redundant proteindatabase as previously described4. Frameshifts and point mutations weredetected and corrected where appropriate as described previously26. Remainingframeshifts and point mutations are considered to be authentic and corre-sponding regions were annotated as ‘authentic frameshift’ or ‘authentic pointmutation’ respectively. Two sets of hidden Markov models (HMMs) were usedto determine ORF membership in families and superfamilies. These included527 HMMs from pfam v2.0, and 199 HMMs from the TIGR orthologue
articles
328 NATURE | VOL 399 | 27 MAY 1999 | www.nature.com
xylQ308
1746 1747 1748 1749 1750 1751 1752 1753
bglB25
lamA24 26 27 28 29 30 31 32 33 34
galK1190
galA1192
galT1191
malG1202
malF1203
malE12041193 1194 1195 1196 1197 1198 1199 1200 1201
1062 1068 10711069 1072 10731063 10671065 10661064 1070
xynA61
aguA55
feoB51
feoA50
uxuA69
xynB70
xloA76
axeA7752 6653 54 56 57 58 59 60 62 63 64 65 67 68 71 72 73 74 75 78 79 80
309296 305299 306297 298 302 303301300 304 307 310
1148 1149 1150 1151 1152 1153 1154 1155
1218 1219 1220 1221 1222 1223 1224 1225 1226 1227
Figure 5 Linear representation of the location of nine oligopeptide transporter
operons on the T. maritima MSB8 genome. Arrows represent functional
categories as follows: black, sugar metabolism; red, transporters; orange,
regulators; blue, hypothetical; yellow, conserved hypothetical; grey, other. aguA,
a-glucuronidase; axeA, acetyl xylan esterase; bglB, b-glucosidase; feoA, iron(II)
transport protein A; feoB, iron(II) transport protein B; galA, a-galactosidase; galK,
galactokinase; galT, galactose-l-phosphate uridylyltransferase; lamA, laminari-
nase; malE/F/G, maltose ABC transporter; uxuA, D-mannonate hydrolase; xloA,
xylosidase; xylQ, a-xylosidase; xynA, endo-1,4-b-xylanase A; xynB, endo-1,4-b-
xylanase B. Arrows are not proportional to the size of predicted-coding regions.
© 1999 Macmillan Magazines Ltd
resource. TopPred45 was used to identify membrane-spanning domains(MSDs) in proteins.
Received 9 November 1998; accepted 31 March 1999.
1. Huber, R. et al. Thermotoga maritima sp. nov. represents a new genus of unique extremelythermophilic eubacteria growing up to 90 8C. Arch. Microbiol. 144, 324–333 (1986).
2. Huber, R. & Stetter, K. O. in The Prokaryotes (eds Balows, A. et al.) 3809–3815 (Springer, Berlin,Heidelberg, New York, 1992).
3. Achenbach-Richter, L., Gupta, R., Stetter, K. O. & Woese, C. R. Were the original eubacteriathermophiles? Syst. Appl. Microbiol. 9, 34–39 (1987).
4. Fleischmann, R. D. et al. Whole-genome random sequence of Haemophilus influenzae Rd. Science 269,496–512 (1995).
5. Klenk, H.-P. et al. The complete genome sequence of the hyperthermophilic, sulphate-reducingarchaeon Archaeoglobus fulgidus. Nature 390, 364–370 (1997).
6. Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13,660–665 (1996).
7. Salzberg, S., Salzberg, A., Kerlavage, A. & Tomb, J.-F. Skewed oligomers and origins of replication.Gene 217, 57–67 (1998).
8. Bult, C. J. et al. Complete genome sequence of the methanogenic archaeon Methanococcus jannaschii.Science 273, 1058–1073 (1996).
9. Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. Microbial gene identification using interpolatedMarkov models. Nucleic Acids Res. 15, 544–548 (1998).
10. Riley, M. Functions of gene products of Escherichia coli. Microbiol. Rev. 57, 862–952 (1993).11. Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature
392, 353–358 (1998).12. Saier, M. H. Jr Computer-aided analyses of transport protein sequences: gleaning evidence concerning
function, structure, biogenesis, and evolution. Microbiol. Rev. 58, 71–93 (1994).13. Kawarabayasi, Y. et al. Complete sequence and gene organization of the genome of a hyperthermo-
philic archaebacterium, Pyrococcus horikoshii OT3. DNA Res. 5, 55–76 (1998).14. Boos, W. & Lucht, J. M. in Escherichia coli and Salmonella Cellular and Molecular Biology (eds
Neidhardt, F. C. et al.) 1175–1209 (ASM, Washington, 1996).15. Bronnenmeier, K., Kern, A., Liebl, W. & Staudenbauer, W. L. Purification of Thermotoga maritima
enzymes for the degradation of cellulosic materials. Appl. Environ. Microbiol. 61, 1399–1407 (1995).16. Felix, C. R. & Ljungdahl, L. O. The cellulosome; the exocellular organelle of Clostridium. Annu. Rev.
Microbiol. 47, 791–819 (1993).17. Vargas, M., Kashefi, K., Blunt-Harris, E. L. & Lovley, D. R. Microbiological evidence for Fe(III)
reduction on early Earth. Nature 395, 65–67 (1998).18. Janssen, P. H. & Morgan, H. W. Heterotrophic sulfur reduction by Thermotoga sp. strain FjSS3.B1.
FEMS Microbiol. Lett. 75, 213–217 (1992).19. Gluch, M. F., Typke, D. & Baumeister, W. Motility and thermotactic responses of Thermotoga
maritima. J. Bacteriol. 177, 5473–5479 (1995).20. Lee, P. J. & Stock, A. M. Characterization of the genes and proteins of a two-component system from
the hyperthermophilic bacterium Thermotoga maritima. J. Bacteriol. 178, 5579–5585 (1996).21. Macnab, R. M. in Escherichia coli and Salmonella Cellular and Molecular Biology (eds Neidhardt, F. C.
et al.) 123–145 (ASM, Washington, 1996).22. Hueck, C. J. Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol.
Mol. Biol. Rev. 62, 379–433 (1998).23. Dubnau, D. Binding and transport of transforming DNA by Bacillus subtilis: the role of type IV pilin-
like proteins—a review. Gene 11, 191–198 (1997).24. Gruber, T. M. & Bryant, D. A. Molecular systematic studies of eubacteria, using sigma70-type sigma
factors of group 1 and group 2. J. Bacteriol. 179, 1734–1747 (1997).
25. Zhang, J., Hardham, J., Barbour, A. & Norris, S. Antigenic variation in Lyme disease Borreliae bypromiscuous recombination of VMP-like sequence cassettes. Cell 89, 275–285 (1997).
26. Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Science388, 539–547 (1997).
27. Curnow, A. W., Ibba, M. & Soll, D. tRNA-dependent asparagine formation. Nature 382, 589–590(1996).
28. Brown, J. R. & Doolittle, W. F. Archaea and the prokaryote-to-eukaryote transition. Microbiol. Mol.Biol. Rev. 61, 456–502 (1997).
29. Eisen, J. A. The RecA protein as a model molecule for molecular systematic studies of bacteria:comparison of trees of RecAs and 16S rRNAs from the same species. J. Mol. Evol. 41, 1105–1123(1995).
30. Smith, D. R. et al. Complete genome sequence of Methanobacterium thermoautotrophicumDH:Functional analysis and comparative genomics. J. Bacteriol. 179, 7135–7155 (1997).
31. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature390, 249–256 (1997).
32. Fraser, C. M. et al. Genomic sequence of a Lyme disease spirochete, Borrelia burgdorferi. Nature 390,580–586 (1997).
33. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462(1997).
34. Cole, S. T. et al. Deciphering the biology of Mycobacterium tuberculosis from the complete genomesequence. Nature 11, 537–544 (1998).
35. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403(1995).
36. Himmelreich, R. et al. Complete sequence analysis of the genome of the bacterium Mycoplasmapneumoniae. Nucleic Acids Res. 24, 4420–4449 (1996).
37. Kaneko, T. et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp.strain PCC6803. II. Sequence determination of the entire genome and assignment of potentialprotein-coding regions. DNA Res. (suppl.) 3, 185–209 (1996).
38. Fraser, C. M. et al. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science281, 375–388 (1998).
39. Goffeau, A. et al. Life with 6000 genes. Science 274, 563–567 (1996).40. Huang, Y.-P. & Ito, J. The hyperthermophilic bacterium Thermotoga maritima has two different classes
of family C DNA polymerases: evolutionary implications. Nucleic Acids Res. 26, 5300–5309 (1998).41. Lawrence, J. G. & Ochman, H. Amelioration of bacterial genomes: rates of change and exchange. J.
Mol. Evol. 44, 383–397 (1997).42. Sutton, G. G., White, O., Adams, M. D. & Kerlavage, A. R. TIGR Assembler: A new tool for assembling
large shotgun sequencing projects. Genome Seq. Technol. 1, 9–19 (1995).43. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J.
Mol. Biol. 215, 403–410 (1990).44. Saitou, N. & Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic
trees. Mol. Biol. Evol. 4, 406–425 (1987).45. Claros, M. G. & von Heijne, G. TopPred II: an improved software for membrane protein structure
predictions. Comput. Appl. Biosci. 10, 685–686 (1994).
Acknowledgements. This work was supported by the US Department of Energy, Office of Biological andEnvironmental Research. We thank M. Heaney, J. Scott, D. Maas and B. Vincent for software and databasesupport; R. Roberts, F. Kunst, and M. Simon for useful discussions; and R. Huber for providingT. maritima MSB8 cells.
Correspondence and requests for materials should be addressed to C.M.F. (e-mail: [email protected]). Theannotated genome sequence and the gene family alignments are available on the World-Wide Web athttp://www.tigr.org/tdb/mdb/. The sequence has been deposited in GenBank with accession numberAE000512.
articles
NATURE | VOL 399 | 27 MAY 1999 | www.nature.com 329
?
61
6
98
59
104
52
44
32
152
aa
25
160
110
174
547
19
16
3613
16
64
ant?
5
98
74
118
198
?5
19
8375
66
178
192
bcaa
bcaa
bcaa
7bc
aa8
1180
61
7
153
1320
0
179
638
129
189
37
5
27
187
7
78
64
34
102
39
79
?
130
93
179
64
6
67
6
6g3
Pi
42
93
46
16
57
8
12
11
11
158
27
165
8
5
140
9
64
102
45
46
77
119
35
40
164
16
67
45
148
6
34
11
34
108
57
46
143
33
119
14
46
46
16
5?
187
185
64
22
71
kb
Aut
hent
ic F
ram
e S
hift
Par
alog
ous
gene
fam
ily
GE
S r
egio
n
Rep
eat R
egio
n
Arc
haea
l Isl
ands
Tra
nspo
rter
LPLi
popr
otei
n
Sig
nal p
eptid
e
94
9
Fe+
++
5s r
RN
A
16s
rRN
A
23s
rRN
A
4
5S
23S
3
6
103
178
LR7
164
1
2 135
154
10
1
976
53
20
RP
T5A
LR1
100
101
1320
1
42
163
130
168
142
123
aaX
aaX
170
83
aaX 20
5
20
CO
H?
67
5?
rib
76
4242
67
4242
CO
H
67
426
642
42
642
56
?
42
42
42
4444
445
CO
H
193
141
48
160
25
67
11
27
108
206
17
5C
OH
27
117
191
145
16
189
34
5C
OH
41C
OH
?
CO
H
144
83
67
46
24
46
41C
OH
?
Zn
42
149
193
4
145
199
63
208
15K
+?
92
17
113
1
142
60C
OH
190
41C
OH
?
CO
H6
88
186
1
111
31
17
31
s/p
46
72
45
46
1
41C
OH
66
5P
i
108
60
46
73
5s/
p
162
71
68
71
22
164
180
Pi?
11
133
4
60rib
5061
16
36
61
25
74
16
132
4
45
106
23
67
CO
H9
150
27
13
?12
66
42C
OH
103
70
117
7?
11
7?
11
71
199
127
69
4680
16
CO
H7
CO
H6
46
46
190
197
191
71
207
48
rib
164
99
30
60
3?
10
114
2450
51
rib
6
7
5Z
n
164
63
67
806
6
CO
H
162
31
44
162
CO
H8
16
71
29
61
46
113
4
5rib
5
42C
OH
174
16
42C
OH
9
206
CO
H
Pi
88
5
180
Pi?
9
CO
H
?
199
164
209
209
145
124
48
1
CO
H
184
6
176
8
H+
16
71
6
72
4
46
16
87?
10
60
187
?
46
4649
s/p
169
200
ura
11
117
CO
H9
60
4
167
87
28
?11
9
43
67
42
4
74
12
43
61
61
60
5C
OH
50
41C
OH
?
CO
H6
201
6
50
4
177
9
108
41s/
p
Zn
8
60
209
49
127
23
71P
i
61
169
27
92
106
153
6
9
4
88
207
46
24
aaX
aaX
aaX
aaX
136
136
aaX 20
aaX
aaX
544
Fe3
+?
20
20
520
6
70
7
20aa
X
56
620
140
5
44aa
Xaa
X 5
42
20aa
X
aaX
94
?
5aa
X
42m
al
192
132
6
5aa
X
5
7
42
aaX
6
?
204
132
203
45
5aa
X
5
141
109
aaX
67F
e2+
7
64
76
166
125
20aa
X6
11
134
78
21aa
X
16
21aa
X?
5
42m
al8
104
20aa
X
?
6
21aa
X?
174
66
21aa
X
42
120
21aa
X
48
5
14
64
91
6410
202
204
82
aaX
6
Na+
122
109
202
21aa
X
125
5
133
34
5aa
X
205
159
20aa
X17
0
7
45
194
80
183
12
64
41
84
98
mal
5518
7
20
51
192
aaX
6
64
45
21aa
X
21aa
X
183
188
1664
5
42
5?
90
194
16
20aa
X
46
6
42
42
42
5aa
X
16
107
5aa
X
4
Na+
16
115
9
21aa
X69
6
16
157
Fe3
+?
29
42
85
42m
al
42
20aa
X6
79
5aa
X
76
5aa
X
Fe3
+9
42
1811
20
aaX
6
42
11
42
112
116
6
6
16F
e2+
29
6
85
67
123
8
210
46
188
5F
e3+
154
208
20aa
X6
139
7
aaX
6
65
49
8
5aa
X
6
46
42m
al6
16
64
10
75
197
21aa
X?
120
29
5
31
117
5
11
82
163
16
90
12
?
20aa
X
aaX
6
5
8
aaX
6
?
6
6
86
5aa
X
46
102
5aa
X
42
134
20aa
X6
58
177
9
5aa
X
13
34
64
21aa
X
52
159
41m
al
42
186
42
55
Fe3
+?
9
9
42
203
21
5
aaX
20
K+
aaX
6
71
5aa
X
157
Fe3
+?
10
123
21aa
X?
148
148
6g3
Pi
NH
4+
64
16
210
710
89
25
12
glyc
7
27
81
116
16
16
138
9
205
120
337
6
24
39
14
181
173
161
105
16
149
45
51
24
34
124
81
182
53
51
13
38
93
16
195
162
49
6
68
6
46
132
86
187
711
166
41g3
Pi
16
34
158
14
146
158
16
34
37
115
126
175
8
45
173
29
122
170
91
111
105
77
68
198
26
56
52
45
175
8
96
148
62
2
25
10
336
89
39
26
93
16
156
128
31
131
466
42
336
64
154
tRN
A16
121
115
34
98
16S
155
16
152
307
27
7
29
46
131
5
94
1
101
107
64
13
9
1613
5
174
6215
1
161
24
146
46
74
467
182
9
102
73
78
46
336
6
104
16
26
16
25
56
34
47
102
336
33
48
336
130
35
108
81
41
5?
48
110
40
111
25
44
129
124
124
46
33an
t?6
42
201
137
155
22
126
34
162
11
144
166
67
210
52
8
109
65
109
6
147
chro
?
47
98
101
145
19
5aa
196
192
184
83
66
5?
67
5?
80+
?12
34
137
64
31
29
161
99
156
32
5?
16
16
46
176
8
64
181
143
46
102
81
100
25
29
112
84
185
6
161
10
8
1812
196
111
bcaa
165
172
8
94
8
5?
6
128
1K
+?
6
712
11
5?
1811
26
102
95
121
5?
9
CO
H?
5?
98
67
118
132
22
16
132
96
34
153
50
16
168
32
172
172
5?
166
42
22
11
171
?
46?
166
42
5?
16
25
16
5?
126
6
Pi
83
978
6
4213
810
42
5
59
95
195
+
7
CO
H42
28
5?
144
713
9
45M
g2+
?
185
8
44
42
?6
161
5
11
5?
50
141
74
25
46aa
151
73
141
15K
+
83
+6
150
114
5?
155
147
chro
?
129
67
107
58
16
113
3613
16
171
?
167
rnpB
TM
rrna
A5S
TM
rrna
A16
ST
Mrr
naA
23S
1
32
1
1
21
12
11
1
1
1
15
1
1
12
LR4
LR3
LR8
RP
T1A
RP
T1B
RP
T2B
RP
T3A
RP
T3B
RP
T3C
RP
T3D
RP
T4A
RP
T4B
RP
T6A
RP
T6B
RP
T6C
RP
T7A
RP
T7B
RP
T8A
RP
T8B
LR2
68
1215
482
258
485
11
1239
474
282
704
1105
1709
201
308
1384
259
1441
1240
1618
1818
1318
20
1096
1732
1765
1619
903
702
872
1238
1632
119
600
317
1455
1088
1730
1269
1550
723
593
691
645
1277
1564
771
1169
1328
1053
117
730
747
882
1139
820
1029
1007
412
1731
1846
1161
384
912
893
274
1028
1472
245
538
1692
1792
310
248
1701
1719
695
549
1390
824
422
1714
1536
1584
643
155
288
379
1158
438
1638
1857
96
1777
236
1170
203
1131
785
683
1794
1834
1182
237
222
579
1400
548
209
444
662
1686
228
1724
939
371
1124
25
1537
1557
1697
1650
1425
1281
364
916
22
1523
98
1633
580
483
232
262
724
194
1201
1772
1371
759
211
814
1076
1060
1658
1368
573
313
1216
4355
1528
1369
1187
289
157
1725
1135
1612
563
1213
1038
1214
285
1703
298
1321
42
1017
558
233
700
395
1426
1327
831
1052
1781
459
799
1094
1178
77
1401
1674
111
1465
791
728
1726
1825
1168
1683
1862
372
352
1446
1302
1237
1317
292
576
553
591
671
1624
1770
559
365
873
1162
1819
1616
1598
1851
257
902
833
1034
1177
1509
1787
1382
1466
234
1004
1756
1255
1796
807
515
752
1848
67
1257
673
1511
1874
367
792
1636
1404
1607
570
1717
729
904
349
1137
1014
1820
1610
224
88
922
1693
1468
39
1164
204
1759
1078
1629
1867
709
1100
435
1253
1589
1643
1760
66
386
402
1414
172
1089
1138
97
1192
1519
1256
1609
827
547
1396
41
1447
306
1708
147
281
1093
472
1859
1432
1231
226
156
803
803
227
1424
1631
1357
165
343
925
1319
291
359
461
983
1129
1868
1733
290
1391
802
1563
1181
1570
268
1452
556
1313
453
554
1861
1646
1741
513
582
129
534
836
1582
1449
924
462
1286
1310
1648
1715
1780
231
574
458
1451
178568
6
1878
765
1642
1840
496
389
608
247
142
466
1278
1442
1784
1165
322
398
555
705
1685
69
1087
21
1863
1212
1383
434
160
464
1084
1398
1247
1778
1513
205
1595
307
841
1002
825
1754
1042
1518
1295
1698
188
80014
1
508
1324
36
1687
544
102
527
276
1663
38
763
1301
197
199
509
1744
175
926
552
470
920
1508
1068
1611
5
1771
1869
436
426
1417
287
1211
100
383
780
10
121
1062
1479
1349
336
250
264
1728
1608
108
183
592
511
416
13
1095
1136
1254
1394
1427
1195
198
225
266
417
1826
1558
1797
1847
793
345
703
12
557
624
1403
146
835
1782
801
353
660
968
1707
1054
1791
176
1193
1325
347
1283
1615
1613
1614
1294
988
1530
1144
195
177
1118
1469
1167
164
809
273
1385
1721
796
594
1112
1112
1207
914
1110
1393
680
366
519
1664
1358
414
571
193
136
392
770
1205
865
239
1155
1114
1679
874
1816
1620
616
894
260
681
149
1437
81
756
350
1280
445
378
1018
864
1121
1542
1163
1640
221
493
524
804
1191
220
481
1142
1045
1768
1660
1576
997
1270
505
82
938
564
1735
1723
1036
1695
1710
990
240
663
561
1000
587
330
541
885
1012
122
1588
1459
152411
33
1210
1117
772
1019
1464
1567
665
1593
526
335
333
847
667
1140
866
1290
1141
1601
699
324
394
664
830
1753
1506
1040
896
369
1145
1752
1366
1083
486
388
848
70
1525
1049
1682
1788
293
879
1811
934
1282
106
409
675
305
1547
731
1540
632
1043
964
851
1522
1635
908
1322
1229
895
1418
1594
1860
1814
171
1751
655
263
744
270
1516
1265
1235
1761
1672
1104
652
380
551
1521
1326
1041
1303
986
207
1273
337
545
1075
1061
1217
1520
1876
888
397
1380
169
323
735
910
1051
1304
385
1699
1288
243
779
889
862
687
1173
286
217
1467
647
877
877
373
1678
849
1430
497
1852
223
698
909
927
852
982
1766
1125
1159
1773
144
442
569
1416
374
1127
919
930
480
1023
334
979
1179
216
844
890
1147
1126
316
719
679
843
1738
1293
413
358
584
790
755
1024
1740
1532
1691
296
522
182
748
1776
1573
1421
1783
1720
61
618
1122
529
972
1639
1364
212
1799
196
521
1156
1188
1090
1627
1822
1551
1581
219
797
1016
1058
1545
1543
1606
1101
1113
1011
1015
1092
1443
1351
387
1070
1722
473
1119
94
1225
1823
1388
53
697
1585
1585
1877
424
362
935
648
640
883
83
1190
1272
230
577
148
8
1287
1431
218
837
1157
370
1027
943
351
161
586
213
845
229
1515
540
962
1103
1091
1074
1533
758
891
1803
1048
132
1821
653
906
1120
850
506
166
130
312
656
1835
37
1098
1769
575
494
487
813
376
876
381
1010
688
1553
252
674
868
1209
1046
1641
998
708
937
1798
265
936
1086
1332
907
1334
1106
678
410
952
363
978
651
829
404
1013
1116
159
1175
1436
423
214
734
1050
1865
1115
443
981
633
1559
1134
543
846
1410
1160
95
690
815
1541
1539
1227
407
244
1517
1314
1675
1128
932
488
1429
1180
87
1055
1123
437
151
712
1387
989
987
1808
560
1690
692
881
1020
1531
966
200
1809
1031
1875
368
1022
1438
768
40
1807
320
1411
613
294
619
253
1111
1109
1174
1208
1289
1365
1448
140
56
1548
189
500
52
1665
1198
309
834
669
23
1549
190
639
1189
973
951
711
950
1204
1700
1223
1405
650
476
280
884
1395
806
181
1460
1064
1849
1739
542
682
974
1008
1617
805
489
854
24
992
504
782
391
448
89
1597
1065
993
1457
477
1152
739
1219
646
1330
1748
1630
773
60
1775
1562
1747
1810
1153
794
1353
396
202
838
4445
185
1329
133
1315
1408
562
1750
1841
1361
9
1252
246
300
210
572
905
523
340
713
179
732
1600
361
8491
1622
634
623
818
1634
786
607
1291
1202
1805
1833
1767
1802
764
1151
354
599
610
405
933
532
1529
1415
985
26
1843
1704
1676
503
241
535
725
867
1716
1461
789
1681
78
947
360
28
1795
1556
1813
1348
101
941
331
440
566
630
1623
948
63
1407
1323
62
1381
581
900
479
152
743
1085
50
1299
390
49
301
1275
1873
1033
14
1402
256
1842
676
757
995
469
72
1236
737
1804
778
597
761
1274
918
79
1305
1196
85
450
977
1284
304
1800
1197
1801
1143
583
1844
672
1652
1837
1423
1333
975
463
338
168
512
798
636
80
1420
6
357
1864
528
531
589
880
499
75
1727
58
517
1705
736
242
1194
568
871
644
929
1838
154
626
2
90
1583
1866
1372
1222
899
34
510
1309
1266
1439
329
1146
54
501
760
745
754
1599
491
913
536
961
931
31
1352
1746
1347
1645
206
923
627
1514
59
1082
1373
1307
585
375
781
615
180
1285
1839
622
944
817
684
1647
1337
415
602
738
1354
1241
661
628
1702
1656
502
1356
7
1621
741
191
693
1148
1220
530
33
1577
1199
1659
1742
1575
714
617
465
1097
1850
1535
238
128
1066
984
685
750
642
1279
35
9992
629
1793
631
749
377
606
1039
1362
620
783
875
609
1749
403
406
1150
1526
1812
507
429
945
311
332
976
649
71
460
1669
859
963
621
533
1745
550
1342
1406
1203
1226
1311
30
1292
1067
93
27
1132
355
355
706
321
1335
1510
641
641
1789
1779
1339
269
1392
1149
625
1712
271
1340
1554
1185
677
1230
29
832
746
1587
1871
314
170
991
946
601
1673
969
163
51
928
1409
1409
1806
303
1444
611
1786
840
967
858
1434
1824
1063
1671
518
892
1637
670
1829
471
767
74
921
1350
302
537
1
1688
328
887
942
1268
57
1836
1419
4
901
3
1625
1413
1538
784
498
1370
73
1221
1428
153
826
162
1331
46
315
612
339
341
635
637
1300
1316
994
1355
1338
1341
1412
1670
1680
1711
1790
1555
1005
1073
126
1500
449
816
1345
1677
1480
823
433
76
1440
1363
318
1496
769
411
1858
1264
1258
1072
1596
18
727
1774
64
1856
137
125
451
1375
1561
1450
1069
113
139
235
135
208
856
441
811
520
1487
1628
1154
118
1855
297
428
421
949
65
1489
1224
578
1080
393
762
1130
1591
490
456
86
1251
1099
1544
158
47
267
1667
1755
1172
1037
342
173
1483
131
299
104
1262
150
819
722
184
1504
1378
1184
425
275
1336
787
605
822
654
468
1831
457
1260
1696
1359
1243
174
255
839
1234
1297
1713
878
1565
1694
1485
953
726
539
590
1245
431
917
1263
1493
430
467
1626
1578
718
596
1433
249
446
940
598
956
1003
1854
1507
348
1259
808
828
1817
1706
400
1102
1870
105
514
1246
1853
1030
861
742
1651
1228
1473
546
124
666
955
1481
215
788
325
284
1512
1603
638
1200
327
356
959
17
710
853
965
1662
1662
167
668
439
143
694
279
869
860
842
1453
1572
1248
1737
1499
1718
717
870
127
1032
1604
1250
1422
420
1580
109
1505
1021
1006
721
455
1389
1471
1832
1763
112
795
1503
283
452
911
715
1271
1026
1845
588
1653
1374
110
1632
19
958
1044
1666
1077
261
1501
138
1569
187
1654
1171
120
1376
1574
475
1267
1655
145
1218
1379
1261
1657
1552
821
603
810
1183
1463
408
1233
915
1602
1296
1734
733
855
186
1377
484
740
344
401
897
751
48
1399
1035
1470
1009
454
1445
419
689
525
346
595
766
1494
565
114
1571
1546
720
1232
1743
1491
495
1059
1661
1397
1579
1057
326
886
954
696
776
478
1828
812
1346
123
432
399
657
716
1071
1560
116
1047
1360
775
277
418
1276
1827
1166
382
857
134
1249
1249
1736
1762
447
753
863
1644
295
103
192
701
272
178
427
659
1079
1497
1502
254
1605
960
777
15
492
1344
115
143
604
1474
1475 14
7614
7714
8214
84
726
956
1032
1454
1486
1488
1490
1492
1495
1498
LR6
614
1242
1244
1025
1478
1298
1306
1308
1462
1458
996
999
1001
1081
1456
1367
1206
898
1056
1435
658
980
1343
97097
1
496
570
1186
1586
1668
1830
1527
1689
1684
1764
1534
1386
1872
172915
6615
68
1312
1649
1320
1815
1107
1176
72 209
42928
4
957
1184
1502
1757
1758
1748
1823
1590
1592
516
251
319
774
707
567
Unk
now
n
Con
serv
ed h
ypot
hetic
al
DN
A M
etab
olis
m
Pur
ines
, pyr
imid
ines
, nuc
leos
ides
and
nuc
leot
ides
Fat
ty a
cid/
Pho
spho
lipid
met
abol
ism
Ene
rgy
met
abol
ism
Reg
ulat
ory
func
tions
Tra
nspo
rt/b
indi
ng p
rote
ins
Tra
nsla
tion
Tra
nscr
iptio
n
Oth
er c
ateg
orie
s
Am
ino
acid
bio
synt
hesi
s
Bio
synt
hesi
s of
cof
acto
rs, p
rost
hetic
gro
ups,
car
riers
Cel
l env
elop
e
Cel
lula
r pr
oces
ses
Cen
tral
inte
rmed
iary
met
abol
ism
DA
KO
TA
JAM
AIC
A
Bar
b
KIN
GST
ON
278
?
61
6
98
59
104
52
44
32
152
aa
25
160
110
174
54 7
19
16
3613
16
64
ant?5
98
74
118
198
?5
19
8375
6 6178
192
bcaabcaabcaa7
bcaa8
11 80
61
7
153
13200
179
6 38
129
189
37
5
27
187
7
78
64
34
102
39
79
?
130
93
179
64
6
67
6
6 g3Pi42
93
46
16
57
8
12
11
11
158
27
165
8
5
140
9
64
102
45
46
77
119
35
40
164
16
67
45
148
6
34
11
34
108
57
46
143
33
119
14
46
46
16
5?
187
185
64
22
71 kb
Authentic Frame Shift
Paralogous gene family
GES region
Repeat Region
Archaeal Islands
Transporter
LP Lipoprotein
Signal peptide
94
9
Fe+++
5s rRNA
16s rRNA
23s rRNA
4
5S
23S
3
6
103
178
LR7
164
1
2135
154 10
1
97 6
53
20
RPT5ALR1
100 101
1320
1
4 2
163
130
168
142
123
aaX aaX
170
83
aaX205
20
COH?
67
5?
rib
7642 42
6 7 4242
COH
67
4266 42
42
642
56
?
42
42
42
444444 5COH
193
141
48
160 25
67
11
27
108
206
17
5COH
27
117
191
145
16
189
34
5COH
41COH?
COH
144
83
67
46
24
46
41COH?
Zn
42
149
193
4
145199
63
208
15K+?
92
17
113
1
142
60COH
190
41COH?
COH6
88
186
1
111
31
17
31
s/p
46
72
45
46
1
41COH
66
5Pi
108
60
46
73
5s/p
162
71
68
71
22
164
180Pi?11
133
4
60rib
50 6116
36
61
25
74
16
132
4
45
106
23
67
COH9
150
27
1 3? 12
66
42COH
103
70
117
7?11
7?11
71
199
127
69
46 80 16
COH7 COH6
46
46
190
197
191
71
207
48
rib
164
99
30
60
3?10
114
24 50
51
rib
6
7
5Zn
164
63
67
806
6
COH
162
31
44
162
COH8
16
71
29
61
46
113
4
5rib
5
42COH
174
16
42COH9
206
COH
Pi
88
5
180Pi?9
COH
?
199
164
209
209
145
124
48
1
COH
184
6
1768
H+16
71
6
72
4
46
16
87?10
60
187
?
46
4649
s/p
169
200ura11
117
COH9
60
4
167
87
28
?11
9
43
67
42
4
74
12
43
61
61
60
5COH
50
41COH?
COH6
2016
50
4
177 9
108
41s/p
Zn8
60
209
49
127
23
71Pi
61
169
27
92
106
153
6
9
4
88
207
46
24
aaX
aaX
aaX aaX
136 136
aaX20
aaX aaX
544Fe3+?
20
20
5 20 6
70
7
20aaX
56620
140
5
44aaX aaX
5
42
20aaX
aaX
94
?
5aaX
42mal
192
1326
5aaX
5
7
42
aaX6
?
204
132
203
45
5aaX
5
141
109
aaX
67Fe2+7
64
76
166 125
20aaX 6
11
134 78
21aaX
16
21aaX?
5
42mal 8
10420aaX
?
6
21aaX?
174
66
21aaX
42
120
21aaX
48
5
14
64
91
6410202
204
82
aaX 6
Na+
122
109
202
21aaX
125
5
133
34
5aaX
205
159
20aaX
170
7
45
194
80
183
12
64
41
84
98
mal
55187
20
51
192
aaX 6
64
45
21aaX
21aaX
183
188
16 64
5
42
5?
90
194
16
20aaX
46
6
42
42
42
5aaX
16
107
5aaX
4
Na+
16
115
9
21aaX
69
6
16
157Fe3+?
29
42
85
42mal
42
20aaX 6
79
5aaX
76
5aaX
Fe3+ 9
42
1811
20
aaX6
42
11
42
112
116
6
6
16Fe2+
29
6
85
67
123
8
210
46
188
5Fe3+
154
208
20aaX 6
139
7
aaX6
65
49
8
5aaX
6
46
42mal 6
16
64
10
75
197
21aaX?
120
29
5
31
117
5
11
82
163
16
90
12
?
20aaX
aaX6
5
8
aaX 6
?
6
6
86
5aaX
46
102
5aaX
42
134
20aaX 6
58
1779
5aaX
13
34
64
21aaX
52
159
41mal
42
186
42
55
Fe3+?9
9
42
203
21
5
aaX
20
K+
aaX 6
71
5aaX
157Fe3+?
10
123
21aaX?
148148
6g3Pi
NH4+
64
16
210
710
89
25
12
glyc7
27
81
116
16
16
138 9
205
120
33 7
6
24
39
14
181
173
161
105
16
149
45
51
24
34
124
81
182
53
51
13
38
93
16
195
162
49
6
68
6
46
132
86
187
711
166
41g3Pi
16
34
158
14
146
158
16
34
37
115
126
1758
45
173
2 9
122
170
91
111
105
77
68
198
26
56
52
45
175 8
96
148
62
2
25
10
33 6
89
39
26
93
16
156
128
31
131
466
42
336
64
154
tRNA16
121
115
34
98
16S
155
16
152
30 7
27
7
29
46
131
5
94
1
101
107
64
13
9
16135
174
62 151
161
24
146
46
74
467
182
9
102
73
78
46
336
6
104
16
26
16
25
56
34
47
102
336 33
48
336
130
35
108
81
41
5?
48
110
40
111
25
44
129
124124
46
33ant?6
42
201
137
155
22
12634
162
11
144
166
67
2 10
52
8
109
65
109 6
147chro?
47
98
101
145
19
5aa
196
192
184
83
66
5?
67
5?
80+? 12
34
137
64
31
29
161
99
156
32
5?
16
16
46
176 8
64
181
143
46
102
81
100
25
2 9
112
84
185
6
161
10
8
1812196
111
bcaa
165
172
8
94
8
5? 6
128
1K+?
6
7 1211
5?
18 11 26
102
95
121
5?
9
COH?
5?
98
67
118
132
22
16
132
96
34
153
50
16
168
32
172
172
5?
166
42
22
11
171?
46?
166
42
5?
16
25
16
5?
126
6
Pi
83
97 8
6
4213810
42
5
59
95
195+
7
COH42
28
5?
144
7 139
45Mg2+?
185
8
44
42
?6
161
5
11
5?
50
141
74
25
46aa
151
73
141
15K+
83
+6
150
114
5?
155
147chro?
129
67
107
58
16
113
3613
16
171?
167
rnpB
TMrrnaA5STMrrnaA16S TMrrnaA23S
1
32
1
1
2112
111
1
1
15
1
1
12
LR4
LR3
LR8
RPT1A
RPT1B
RPT2B
RPT3A
RPT3B
RPT3C
RPT3D
RPT4A
RPT4B
RPT6A
RPT6B
RPT6C
RPT7A
RPT7B
RPT8A
RPT8B
LR2
68
1215
482
258
485
11
1239
474
282
704
1105
1709
201
308
1384
259
1441
1240
1618
1818
1318
20
1096
1732
1765
1619
903
702
872
1238
1632
119
600
317
1455
1088
1730
1269
1550
723
593
691645
1277
1564
771
1169
1328
1053
117
730 747
882
1139
820
10291007
412
1731
1846
1161
384
912893
274
1028
1472
245
538
1692
1792
310
248
1701 1719
695
549
1390
824
422
1714
1536 1584
643
155
288
379
1158
438
1638
1857
96
1777
236
1170
203
1131
785
683
1794
1834
1182
237222
579
1400
548
209
444
662
1686
228
1724
939
371
1124
25
1537 1557
1697
1650
1425
1281
364
916
22
1523
98
1633
580
483
232 262
724
194
1201
1772
1371
759
211
814
10761060
1658
1368
573
313
1216
43 55
1528
1369
1187
289
157
1725
1135
1612
563
1213
1038
1214
285
1703
298
1321
42
1017
558
233
700
395
1426
1327
831
1052
1781
459
799
1094
1178
77
1401
1674
111
1465
791728
1726
1825
1168
1683
1862
372
352
1446
1302
1237
1317
292
576
553
591
671
1624
1770
559
365
873
1162
1819
16161598
1851
257
902
833
1034
1177
1509
1787
1382
1466
234
1004
1756
1255
1796
807
515
752
1848
67
1257
673
1511
1874
367
792
1636
1404
1607
570
1717
729
904
349
1137
1014
1820
1610
224
88
922
1693
1468
39
1164
204
1759
1078
1629
1867
709
1100
435
1253
1589 1643
1760
66
386 402
1414
172
1089
1138
97
1192
1519
1256
1609
827
547
1396
41
1447
306
1708
147
281
1093
472
1859
1432
1231
226
156
803
803
227
1424
1631
1357
165
343
925
1319
291
359
461
983
1129
1868
1733
290
1391
802
1563
1181
1570
268
1452
556
1313
453
554
1861
1646
1741
513
582
129
534
836
1582
1449
924
462
1286 1310
1648
1715
1780
231
574
458
1451
1785
686
1878
765
1642
1840
496
389
608
247
142
466
1278
1442
1784
1165
322
398
555
705
1685
69
1087
21
1863
1212
1383
434
160
464
1084
1398
1247
1778
1513
205
1595
307
841
1002
825
1754
1042
1518
1295
1698
188
800
141
508
1324
36
1687
544
102
527
276
1663
38
763
1301
197 199
509
1744
175
926
552
470
920
1508
1068
1611
5
1771
1869
436
426
1417
287
1211
100
383
780
10
121
1062
1479
1349
336
250 264
1728
1608
108
183
592
511
416
13
1095
1136
1254
1394
1427
1195
198
225 266
417
1826
1558
1797
1847
793
345
703
12
557
624
1403
146
835
1782
801
353
660
968
1707
1054
1791
176
1193
1325
347
1283
16151613 1614
1294
988
1530
1144
195177
1118
1469
1167
164
809
273
1385
1721
796
594
1112
1112
1207
914
1110
1393
680
366
519
1664
1358
414
571
193
136
392
770
1205
865
239
11551114
1679
874
1816
1620
616
894
260
681
149
1437
81
756
350
1280
445
378
1018
864
1121
1542
1163
1640
221
493
524
804
1191
220
481
1142
1045
1768
1660
1576
997
1270
505
82
938
564
17351723
1036
1695 1710
990
240
663
561
1000
587
330
541
885
1012
122
1588
1459
1524
1133
1210
1117
772
1019
1464
1567
665
1593
526
335333
847
667
1140
866
1290
1141
1601
699
324
394
664
830
1753
1506
1040
896
369
1145
1752
1366
1083
486
388
848
70
1525
1049
1682
1788
293
879
1811
934
1282
106
409
675
305
1547
731
1540
632
1043
964
851
1522
1635
908
1322
1229
895
1418
1594
1860
1814
171
1751
655
263
744
270
1516
1265
1235
1761
1672
1104
652
380
551
1521
1326
1041
1303
986
207
1273
337
545
10751061
1217
1520
1876
888
397
1380
169
323
735
910
1051
1304
385
1699
1288
243
779
889
862
687
1173
286
217
1467
647
877
877
373
1678
849
1430
497
1852
223
698
909 927
852
982
1766
1125 1159
1773
144
442
569
1416
374
1127
919 930
480
1023
334
979
1179
216
844
890
11471126
316
719679
843
1738
1293
413358
584
790755
1024
1740
1532
1691
296
522
182
748
1776
1573
1421
1783
1720
61
618
1122
529
972
1639
1364
212
1799
196
521
1156
1188
1090
1627
1822
1551 1581
219
797
1016
1058
15451543
1606
1101
1113
1011 1015
1092
1443
1351
387
1070
1722
473
1119
94
1225
1823
1388
53
697
1585
1585
1877
424362
935
648
640
883
83
1190
1272
230
577
148
8
1287
1431
218
837
1157
370
1027
943
351
161
586
213
845
229
1515
540
962
110310911074
1533
758
891
1803
1048
132
1821
653
906
1120
850
506
166
130
312
656
1835
37
1098
1769
575
494487
813
376
876
381
1010
688
1553
252
674
868
1209
1046
1641
998
708
937
1798
265
936
1086
1332
907
1334
1106
678
410
952
363
978
651
829
404
1013
1116
159
1175
1436
423
214
734
1050
1865
1115
443
981
633
1559
1134
543
846
1410
1160
95
690
815
15411539
1227
407
244
1517
1314
1675
1128
932
488
1429
1180
87
1055
1123
437
151
712
1387
989987
1808
560
1690
692
881
1020
1531
966
200
1809
1031
1875
368
1022
1438
768
40
1807
320
1411
613
294
619
253
11111109
1174
1208
1289
1365
1448
140
56
1548
189
500
52
1665
1198
309
834
669
23
1549
190
639
1189
973
951
711
950
1204
1700
1223
1405
650
476
280
884
1395
806
181
1460
1064
1849
1739
542
682
974 1008
1617
805
489
854
24
992
504
782
391
448
89
1597
1065
993
1457
477
1152
739
1219
646
1330
1748
1630
773
60
1775
1562
1747
1810
1153
794
1353
396
202
838
44 45
185
1329
133
1315
1408
562
1750
1841
1361
9
1252
246
300
210
572
905
523
340
713
179
732
1600
361
84 91
1622
634623
818
1634
786
607
1291
1202
1805
1833
1767 1802
764
1151
354
599 610
405
933
532
1529
1415
985
26
1843
17041676
503
241
535
725
867
1716
1461
789
1681
78
947
360
28
1795
1556
1813
1348
101
941
331
440
566
630
1623
948
63
1407
1323
62
1381
581
900
479
152
743
1085
50
1299
390
49
301
1275
1873
1033
14
1402
256
1842
676
757
995
469
72
1236
737
1804
778
597
761
1274
918
79
1305
1196
85
450
977
1284
304
1800
1197
1801
1143
583
1844
672
1652
1837
1423
1333
975
463
338
168
512
798
636
80
1420
6
357
1864
528 531
589
880
499
75
1727
58
517
1705
736
242
1194
568
871
644
929
1838
154
626
2
90
1583
1866
1372
1222
899
34
510
13091266
1439
329
1146
54
501
760745 754
1599
491
913
536
961
931
31
1352
1746
1347
1645
206
923
627
1514
59
1082
1373
1307
585
375
781
615
180
1285
1839
622
944
817
684
1647
1337
415
602
738
1354
1241
661
628
1702
1656
502
1356
7
1621
741
191
693
1148
1220
530
33
1577
1199
1659
1742
1575
714
617
465
1097
1850
1535
238
128
1066
984
685
750
642
1279
35
9992
629
1793
631
749
377
606
1039
1362
620
783
875
609
1749
403 406
1150
1526
1812
507
429
945
311 332
976
649
71
460
1669
859
963
621
533
1745
550
1342 1406
1203 1226
1311
30
1292
1067
93
27
1132
355
355
706
321
1335
1510
641
641
17891779
1339
269
1392
1149
625
1712
271
1340
1554
1185
677
1230
29
832
746
1587
1871
314
170
991
946
601
1673
969
163
51
928
1409
1409
1806
303
1444
611
1786
840
967
858
1434
1824
1063
1671
518
892
1637
670
1829
471
767
74
921
1350
302
537
1
1688
328
887 942
1268
57
1836
1419
4
901
3
1625
1413
1538
784
498
1370
73
1221
1428
153
826
162
1331
46
315
612
339 341
635 637
1300 1316
994
13551338 1341
1412
1670 1680 1711
1790
1555
1005
1073
126
1500
449
816
1345
1677
1480
823
433
76
1440
1363
318
1496
769
411
1858
12641258
1072
1596
18
727
1774
64
1856
137125
451
1375
1561
1450
1069
113 139
235
135
208
856
441
811
520
1487
1628
1154
118
1855
297
428421
949
65
1489
1224
578
1080
393
762
1130
1591
490456
86
1251
1099
1544
158
47
267
1667
1755
1172
1037
342
173
1483
131
299
104
1262
150
819
722
184
1504
1378
1184
425
275
1336
787
605
822
654
468
1831
457
1260
1696
1359
1243
174
255
839
1234
1297
1713
878
1565
1694
1485
953
726
539
590
1245
431
917
1263
1493
430 467
1626
1578
718
596
1433
249
446
940
598
956
1003
1854
1507
348
1259
808 828
1817
1706
400
1102
1870
105
514
1246
1853
1030
861
742
1651
1228
1473
546
124
666
955
1481
215
788
325
284
1512
1603
638
1200
327
356
959
17
710
853
965
1662
1662
167
668
439
143
694
279
869860842
1453
1572
1248
1737
1499
1718
717
870
127
1032
1604
1250
1422
420
1580
109
1505
10211006
721
455
1389
1471
1832
1763
112
795
1503
283
452
911
715
1271
1026
1845
588
1653
1374
110
16 3219
958
1044
1666
1077
261
1501
138
1569
187
1654
1171
120
1376
1574
475
1267
1655
145
1218
1379
1261
1657
1552
821
603
810
1183
1463
408
1233
915
1602
1296
1734
733
855
186
1377
484
740
344
401
897
751
48
1399
1035
1470
1009
454
1445
419
689
525
346
595
766
1494
565
114
15711546
720
1232
1743
1491
495
1059
1661
1397
1579
1057
326
886 954
696
776
478
1828
812
1346
123
432
399
657 716
1071
1560
116
1047
1360
775
277
418
1276
1827
1166
382
857
134
1249
1249
1736
1762
447
753
863
1644
295
103
192
701
272
178
427
659
1079
1497 1502
254
1605
960
777
15
492
1344
115
143
604
14741475
14761477
1482 1484
726
956
1032
1454 1486 1488 1490 1492 1495 1498
LR6
614
1242 1244
1025
1478
1298 1306 1308
14621458
996 999 1001
1081
1456
1367
1206
898
1056
1435
658
980
1343
970971
496
570
1186
1586
1668
1830
1527
16891684
1764
1534
1386
1872
1729
1566 1568
1312
1649
1320
1815
1107
1176
72
209
429
284
957
1184
1502
175717581748
1823
1590 1592
516
251
319
774
707
567
Unknown
Conserved hypothetical
DNA Metabolism
Purines, pyrimidines, nucleosides and nucleotides
Fatty acid/Phospholipid metabolism
Energy metabolism
Regulatory functions
Transport/binding proteins
Translation
Transcription
Other categories
Amino acid biosynthesis
Biosynthesis of cofactors, prosthetic groups, carriers
Cell envelope
Cellular processes
Central intermediary metabolism
DAKOTA
JAMAICA
Barb
KINGSTON
278
?
61
6
98
59
104
52
44
32
152
aa
25
160
110
174
54 7
19
16
3613
16
64
ant?5
98
74
118
198
?5
19
8375
6 6178
192
bcaabcaabcaa7
bcaa8
11 80
61
7
153
13200
179
6 38
129
189
37
5
27
187
7
78
64
34
102
39
79
?
130
93
179
64
6
67
6
6 g3Pi42
93
46
16
57
8
12
11
11
158
27
165
8
5
140
9
64
102
45
46
77
119
35
40
164
16
67
45
148
6
34
11
34
108
57
46
143
33
119
14
46
46
16
5?
187
185
64
22
71 kb
Authentic Frame Shift
Paralogous gene family
GES region
Repeat Region
Archaeal Islands
Transporter
LP Lipoprotein
Signal peptide
94
9
Fe+++
5s rRNA
16s rRNA
23s rRNA
4
5S
23S
3
6
103
178
LR7
164
1
2135
154 10
1
97 6
53
20
RPT5ALR1
100 101
1320
1
4 2
163
130
168
142
123
aaX aaX
170
83
aaX205
20
COH?
67
5?
rib
7642 42
6 7 4242
COH
67
4266 42
42
642
56
?
42
42
42
444444 5COH
193
141
48
160 25
67
11
27
108
206
17
5COH
27
117
191
145
16
189
34
5COH
41COH?
COH
144
83
67
46
24
46
41COH?
Zn
42
149
193
4
145199
63
208
15K+?
92
17
113
1
142
60COH
190
41COH?
COH6
88
186
1
111
31
17
31
s/p
46
72
45
46
1
41COH
66
5Pi
108
60
46
73
5s/p
162
71
68
71
22
164
180Pi?11
133
4
60rib
50 6116
36
61
25
74
16
132
4
45
106
23
67
COH9
150
27
1 3? 12
66
42COH
103
70
117
7?11
7?11
71
199
127
69
46 80 16
COH7 COH6
46
46
190
197
191
71
207
48
rib
164
99
30
60
3?10
114
24 50
51
rib
6
7
5Zn
164
63
67
806
6
COH
162
31
44
162
COH8
16
71
29
61
46
113
4
5rib
5
42COH
174
16
42COH9
206
COH
Pi
88
5
180Pi?9
COH
?
199
164
209
209
145
124
48
1
COH
184
6
1768
H+16
71
6
72
4
46
16
87?10
60
187
?
46
4649
s/p
169
200ura11
117
COH9
60
4
167
87
28
?11
9
43
67
42
4
74
12
43
61
61
60
5COH
50
41COH?
COH6
2016
50
4
177 9
108
41s/p
Zn8
60
209
49
127
23
71Pi
61
169
27
92
106
153
6
9
4
88
207
46
24
aaX
aaX
aaX aaX
136 136
aaX20
aaX aaX
544Fe3+?
20
20
5 20 6
70
7
20aaX
56620
140
5
44aaX aaX
5
42
20aaX
aaX
94
?
5aaX
42mal
192
1326
5aaX
5
7
42
aaX6
?
204
132
203
45
5aaX
5
141
109
aaX
67Fe2+7
64
76
166 125
20aaX 6
11
134 78
21aaX
16
21aaX?
5
42mal 8
10420aaX
?
6
21aaX?
174
66
21aaX
42
120
21aaX
48
5
14
64
91
6410202
204
82
aaX 6
Na+
122
109
202
21aaX
125
5
133
34
5aaX
205
159
20aaX
170
7
45
194
80
183
12
64
41
84
98
mal
55187
20
51
192
aaX 6
64
45
21aaX
21aaX
183
188
16 64
5
42
5?
90
194
16
20aaX
46
6
42
42
42
5aaX
16
107
5aaX
4
Na+
16
115
9
21aaX
69
6
16
157Fe3+?
29
42
85
42mal
42
20aaX 6
79
5aaX
76
5aaX
Fe3+ 9
42
1811
20
aaX6
42
11
42
112
116
6
6
16Fe2+
29
6
85
67
123
8
210
46
188
5Fe3+
154
208
20aaX 6
139
7
aaX6
65
49
8
5aaX
6
46
42mal 6
16
64
10
75
197
21aaX?
120
29
5
31
117
5
11
82
163
16
90
12
?
20aaX
aaX6
5
8
aaX 6
?
6
6
86
5aaX
46
102
5aaX
42
134
20aaX 6
58
1779
5aaX
13
34
64
21aaX
52
159
41mal
42
186
42
55
Fe3+?9
9
42
203
21
5
aaX
20
K+
aaX 6
71
5aaX
157Fe3+?
10
123
21aaX?
148148
6g3Pi
NH4+
64
16
210
710
89
25
12
glyc7
27
81
116
16
16
138 9
205
120
33 7
6
24
39
14
181
173
161
105
16
149
45
51
24
34
124
81
182
53
51
13
38
93
16
195
162
49
6
68
6
46
132
86
187
711
166
41g3Pi
16
34
158
14
146
158
16
34
37
115
126
1758
45
173
2 9
122
170
91
111
105
77
68
198
26
56
52
45
175 8
96
148
62
2
25
10
33 6
89
39
26
93
16
156
128
31
131
466
42
336
64
154
tRNA16
121
115
34
98
16S
155
16
152
30 7
27
7
29
46
131
5
94
1
101
107
64
13
9
16135
174
62 151
161
24
146
46
74
467
182
9
102
73
78
46
336
6
104
16
26
16
25
56
34
47
102
336 33
48
336
130
35
108
81
41
5?
48
110
40
111
25
44
129
124124
46
33ant?6
42
201
137
155
22
12634
162
11
144
166
67
2 10
52
8
109
65
109 6
147chro?
47
98
101
145
19
5aa
196
192
184
83
66
5?
67
5?
80+? 12
34
137
64
31
29
161
99
156
32
5?
16
16
46
176 8
64
181
143
46
102
81
100
25
2 9
112
84
185
6
161
10
8
1812196
111
bcaa
165
172
8
94
8
5? 6
128
1K+?
6
7 1211
5?
18 11 26
102
95
121
5?
9
COH?
5?
98
67
118
132
22
16
132
96
34
153
50
16
168
32
172
172
5?
166
42
22
11
171?
46?
166
42
5?
16
25
16
5?
126
6
Pi
83
97 8
6
4213810
42
5
59
95
195+
7
COH42
28
5?
144
7 139
45Mg2+?
185
8
44
42
?6
161
5
11
5?
50
141
74
25
46aa
151
73
141
15K+
83
+6
150
114
5?
155
147chro?
129
67
107
58
16
113
3613
16
171?
167
rnpB
TMrrnaA5STMrrnaA16S TMrrnaA23S
1
32
1
1
2112
111
1
1
15
1
1
12
LR4
LR3
LR8
RPT1A
RPT1B
RPT2B
RPT3A
RPT3B
RPT3C
RPT3D
RPT4A
RPT4B
RPT6A
RPT6B
RPT6C
RPT7A
RPT7B
RPT8A
RPT8B
LR2
68
1215
482
258
485
11
1239
474
282
704
1105
1709
201
308
1384
259
1441
1240
1618
1818
1318
20
1096
1732
1765
1619
903
702
872
1238
1632
119
600
317
1455
1088
1730
1269
1550
723
593
691645
1277
1564
771
1169
1328
1053
117
730 747
882
1139
820
10291007
412
1731
1846
1161
384
912893
274
1028
1472
245
538
1692
1792
310
248
1701 1719
695
549
1390
824
422
1714
1536 1584
643
155
288
379
1158
438
1638
1857
96
1777
236
1170
203
1131
785
683
1794
1834
1182
237222
579
1400
548
209
444
662
1686
228
1724
939
371
1124
25
1537 1557
1697
1650
1425
1281
364
916
22
1523
98
1633
580
483
232 262
724
194
1201
1772
1371
759
211
814
10761060
1658
1368
573
313
1216
43 55
1528
1369
1187
289
157
1725
1135
1612
563
1213
1038
1214
285
1703
298
1321
42
1017
558
233
700
395
1426
1327
831
1052
1781
459
799
1094
1178
77
1401
1674
111
1465
791728
1726
1825
1168
1683
1862
372
352
1446
1302
1237
1317
292
576
553
591
671
1624
1770
559
365
873
1162
1819
16161598
1851
257
902
833
1034
1177
1509
1787
1382
1466
234
1004
1756
1255
1796
807
515
752
1848
67
1257
673
1511
1874
367
792
1636
1404
1607
570
1717
729
904
349
1137
1014
1820
1610
224
88
922
1693
1468
39
1164
204
1759
1078
1629
1867
709
1100
435
1253
1589 1643
1760
66
386 402
1414
172
1089
1138
97
1192
1519
1256
1609
827
547
1396
41
1447
306
1708
147
281
1093
472
1859
1432
1231
226
156
803
803
227
1424
1631
1357
165
343
925
1319
291
359
461
983
1129
1868
1733
290
1391
802
1563
1181
1570
268
1452
556
1313
453
554
1861
1646
1741
513
582
129
534
836
1582
1449
924
462
1286 1310
1648
1715
1780
231
574
458
1451
1785
686
1878
765
1642
1840
496
389
608
247
142
466
1278
1442
1784
1165
322
398
555
705
1685
69
1087
21
1863
1212
1383
434
160
464
1084
1398
1247
1778
1513
205
1595
307
841
1002
825
1754
1042
1518
1295
1698
188
800
141
508
1324
36
1687
544
102
527
276
1663
38
763
1301
197 199
509
1744
175
926
552
470
920
1508
1068
1611
5
1771
1869
436
426
1417
287
1211
100
383
780
10
121
1062
1479
1349
336
250 264
1728
1608
108
183
592
511
416
13
1095
1136
1254
1394
1427
1195
198
225 266
417
1826
1558
1797
1847
793
345
703
12
557
624
1403
146
835
1782
801
353
660
968
1707
1054
1791
176
1193
1325
347
1283
16151613 1614
1294
988
1530
1144
195177
1118
1469
1167
164
809
273
1385
1721
796
594
1112
1112
1207
914
1110
1393
680
366
519
1664
1358
414
571
193
136
392
770
1205
865
239
11551114
1679
874
1816
1620
616
894
260
681
149
1437
81
756
350
1280
445
378
1018
864
1121
1542
1163
1640
221
493
524
804
1191
220
481
1142
1045
1768
1660
1576
997
1270
505
82
938
564
17351723
1036
1695 1710
990
240
663
561
1000
587
330
541
885
1012
122
1588
1459
1524
1133
1210
1117
772
1019
1464
1567
665
1593
526
335333
847
667
1140
866
1290
1141
1601
699
324
394
664
830
1753
1506
1040
896
369
1145
1752
1366
1083
486
388
848
70
1525
1049
1682
1788
293
879
1811
934
1282
106
409
675
305
1547
731
1540
632
1043
964
851
1522
1635
908
1322
1229
895
1418
1594
1860
1814
171
1751
655
263
744
270
1516
1265
1235
1761
1672
1104
652
380
551
1521
1326
1041
1303
986
207
1273
337
545
10751061
1217
1520
1876
888
397
1380
169
323
735
910
1051
1304
385
1699
1288
243
779
889
862
687
1173
286
217
1467
647
877
877
373
1678
849
1430
497
1852
223
698
909 927
852
982
1766
1125 1159
1773
144
442
569
1416
374
1127
919 930
480
1023
334
979
1179
216
844
890
11471126
316
719679
843
1738
1293
413358
584
790755
1024
1740
1532
1691
296
522
182
748
1776
1573
1421
1783
1720
61
618
1122
529
972
1639
1364
212
1799
196
521
1156
1188
1090
1627
1822
1551 1581
219
797
1016
1058
15451543
1606
1101
1113
1011 1015
1092
1443
1351
387
1070
1722
473
1119
94
1225
1823
1388
53
697
1585
1585
1877
424362
935
648
640
883
83
1190
1272
230
577
148
8
1287
1431
218
837
1157
370
1027
943
351
161
586
213
845
229
1515
540
962
110310911074
1533
758
891
1803
1048
132
1821
653
906
1120
850
506
166
130
312
656
1835
37
1098
1769
575
494487
813
376
876
381
1010
688
1553
252
674
868
1209
1046
1641
998
708
937
1798
265
936
1086
1332
907
1334
1106
678
410
952
363
978
651
829
404
1013
1116
159
1175
1436
423
214
734
1050
1865
1115
443
981
633
1559
1134
543
846
1410
1160
95
690
815
15411539
1227
407
244
1517
1314
1675
1128
932
488
1429
1180
87
1055
1123
437
151
712
1387
989987
1808
560
1690
692
881
1020
1531
966
200
1809
1031
1875
368
1022
1438
768
40
1807
320
1411
613
294
619
253
11111109
1174
1208
1289
1365
1448
140
56
1548
189
500
52
1665
1198
309
834
669
23
1549
190
639
1189
973
951
711
950
1204
1700
1223
1405
650
476
280
884
1395
806
181
1460
1064
1849
1739
542
682
974 1008
1617
805
489
854
24
992
504
782
391
448
89
1597
1065
993
1457
477
1152
739
1219
646
1330
1748
1630
773
60
1775
1562
1747
1810
1153
794
1353
396
202
838
44 45
185
1329
133
1315
1408
562
1750
1841
1361
9
1252
246
300
210
572
905
523
340
713
179
732
1600
361
84 91
1622
634623
818
1634
786
607
1291
1202
1805
1833
1767 1802
764
1151
354
599 610
405
933
532
1529
1415
985
26
1843
17041676
503
241
535
725
867
1716
1461
789
1681
78
947
360
28
1795
1556
1813
1348
101
941
331
440
566
630
1623
948
63
1407
1323
62
1381
581
900
479
152
743
1085
50
1299
390
49
301
1275
1873
1033
14
1402
256
1842
676
757
995
469
72
1236
737
1804
778
597
761
1274
918
79
1305
1196
85
450
977
1284
304
1800
1197
1801
1143
583
1844
672
1652
1837
1423
1333
975
463
338
168
512
798
636
80
1420
6
357
1864
528 531
589
880
499
75
1727
58
517
1705
736
242
1194
568
871
644
929
1838
154
626
2
90
1583
1866
1372
1222
899
34
510
13091266
1439
329
1146
54
501
760745 754
1599
491
913
536
961
931
31
1352
1746
1347
1645
206
923
627
1514
59
1082
1373
1307
585
375
781
615
180
1285
1839
622
944
817
684
1647
1337
415
602
738
1354
1241
661
628
1702
1656
502
1356
7
1621
741
191
693
1148
1220
530
33
1577
1199
1659
1742
1575
714
617
465
1097
1850
1535
238
128
1066
984
685
750
642
1279
35
9992
629
1793
631
749
377
606
1039
1362
620
783
875
609
1749
403 406
1150
1526
1812
507
429
945
311 332
976
649
71
460
1669
859
963
621
533
1745
550
1342 1406
1203 1226
1311
30
1292
1067
93
27
1132
355
355
706
321
1335
1510
641
641
17891779
1339
269
1392
1149
625
1712
271
1340
1554
1185
677
1230
29
832
746
1587
1871
314
170
991
946
601
1673
969
163
51
928
1409
1409
1806
303
1444
611
1786
840
967
858
1434
1824
1063
1671
518
892
1637
670
1829
471
767
74
921
1350
302
537
1
1688
328
887 942
1268
57
1836
1419
4
901
3
1625
1413
1538
784
498
1370
73
1221
1428
153
826
162
1331
46
315
612
339 341
635 637
1300 1316
994
13551338 1341
1412
1670 1680 1711
1790
1555
1005
1073
126
1500
449
816
1345
1677
1480
823
433
76
1440
1363
318
1496
769
411
1858
12641258
1072
1596
18
727
1774
64
1856
137125
451
1375
1561
1450
1069
113 139
235
135
208
856
441
811
520
1487
1628
1154
118
1855
297
428421
949
65
1489
1224
578
1080
393
762
1130
1591
490456
86
1251
1099
1544
158
47
267
1667
1755
1172
1037
342
173
1483
131
299
104
1262
150
819
722
184
1504
1378
1184
425
275
1336
787
605
822
654
468
1831
457
1260
1696
1359
1243
174
255
839
1234
1297
1713
878
1565
1694
1485
953
726
539
590
1245
431
917
1263
1493
430 467
1626
1578
718
596
1433
249
446
940
598
956
1003
1854
1507
348
1259
808 828
1817
1706
400
1102
1870
105
514
1246
1853
1030
861
742
1651
1228
1473
546
124
666
955
1481
215
788
325
284
1512
1603
638
1200
327
356
959
17
710
853
965
1662
1662
167
668
439
143
694
279
869860842
1453
1572
1248
1737
1499
1718
717
870
127
1032
1604
1250
1422
420
1580
109
1505
10211006
721
455
1389
1471
1832
1763
112
795
1503
283
452
911
715
1271
1026
1845
588
1653
1374
110
16 3219
958
1044
1666
1077
261
1501
138
1569
187
1654
1171
120
1376
1574
475
1267
1655
145
1218
1379
1261
1657
1552
821
603
810
1183
1463
408
1233
915
1602
1296
1734
733
855
186
1377
484
740
344
401
897
751
48
1399
1035
1470
1009
454
1445
419
689
525
346
595
766
1494
565
114
15711546
720
1232
1743
1491
495
1059
1661
1397
1579
1057
326
886 954
696
776
478
1828
812
1346
123
432
399
657 716
1071
1560
116
1047
1360
775
277
418
1276
1827
1166
382
857
134
1249
1249
1736
1762
447
753
863
1644
295
103
192
701
272
178
427
659
1079
1497 1502
254
1605
960
777
15
492
1344
115
143
604
14741475
14761477
1482 1484
726
956
1032
1454 1486 1488 1490 1492 1495 1498
LR6
614
1242 1244
1025
1478
1298 1306 1308
14621458
996 999 1001
1081
1456
1367
1206
898
1056
1435
658
980
1343
970971
496
570
1186
1586
1668
1830
1527
16891684
1764
1534
1386
1872
1729
1566 1568
1312
1649
1320
1815
1107
1176
72
209
429
284
957
1184
1502
175717581748
1823
1590 1592
516
251
319
774
707
567
Unknown
Conserved hypothetical
DNA Metabolism
Purines, pyrimidines, nucleosides and nucleotides
Fatty acid/Phospholipid metabolism
Energy metabolism
Regulatory functions
Transport/binding proteins
Translation
Transcription
Other categories
Amino acid biosynthesis
Biosynthesis of cofactors, prosthetic groups, carriers
Cell envelope
Cellular processes
Central intermediary metabolism
DAKOTA
JAMAICA
Barb
KINGSTON
278
?
61
6
98
59
104
52
44
32
152
aa
25
160
110
174
54 7
19
16
3613
16
64
ant?5
98
74
118
198
?5
19
8375
6 6178
192
bcaabcaabcaa7
bcaa8
11 80
61
7
153
13200
179
6 38
129
189
37
5
27
187
7
78
64
34
102
39
79
?
130
93
179
64
6
67
6
6 g3Pi42
93
46
16
57
8
12
11
11
158
27
165
8
5
140
9
64
102
45
46
77
119
35
40
164
16
67
45
148
6
34
11
34
108
57
46
143
33
119
14
46
46
16
5?
187
185
64
22
71 kb
Authentic Frame Shift
Paralogous gene family
GES region
Repeat Region
Archaeal Islands
Transporter
LP Lipoprotein
Signal peptide
94
9
Fe+++
5s rRNA
16s rRNA
23s rRNA
4
5S
23S
3
6
103
178
LR7
164
1
2135
154 10
1
97 6
53
20
RPT5ALR1
100 101
1320
1
4 2
163
130
168
142
123
aaX aaX
170
83
aaX205
20
COH?
67
5?
rib
7642 42
6 7 4242
COH
67
4266 42
42
642
56
?
42
42
42
444444 5COH
193
141
48
160 25
67
11
27
108
206
17
5COH
27
117
191
145
16
189
34
5COH
41COH?
COH
144
83
67
46
24
46
41COH?
Zn
42
149
193
4
145199
63
208
15K+?
92
17
113
1
142
60COH
190
41COH?
COH6
88
186
1
111
31
17
31
s/p
46
72
45
46
1
41COH
66
5Pi
108
60
46
73
5s/p
162
71
68
71
22
164
180Pi?11
133
4
60rib
50 6116
36
61
25
74
16
132
4
45
106
23
67
COH9
150
27
1 3? 12
66
42COH
103
70
117
7?11
7?11
71
199
127
69
46 80 16
COH7 COH6
46
46
190
197
191
71
207
48
rib
164
99
30
60
3?10
114
24 50
51
rib
6
7
5Zn
164
63
67
806
6
COH
162
31
44
162
COH8
16
71
29
61
46
113
4
5rib
5
42COH
174
16
42COH9
206
COH
Pi
88
5
180Pi?9
COH
?
199
164
209
209
145
124
48
1
COH
184
6
1768
H+16
71
6
72
4
46
16
87?10
60
187
?
46
4649
s/p
169
200ura11
117
COH9
60
4
167
87
28
?11
9
43
67
42
4
74
12
43
61
61
60
5COH
50
41COH?
COH6
2016
50
4
177 9
108
41s/p
Zn8
60
209
49
127
23
71Pi
61
169
27
92
106
153
6
9
4
88
207
46
24
aaX
aaX
aaX aaX
136 136
aaX20
aaX aaX
544Fe3+?
20
20
5 20 6
70
7
20aaX
56620
140
5
44aaX aaX
5
42
20aaX
aaX
94
?
5aaX
42mal
192
1326
5aaX
5
7
42
aaX6
?
204
132
203
45
5aaX
5
141
109
aaX
67Fe2+7
64
76
166 125
20aaX 6
11
134 78
21aaX
16
21aaX?
5
42mal 8
10420aaX
?
6
21aaX?
174
66
21aaX
42
120
21aaX
48
5
14
64
91
6410202
204
82
aaX 6
Na+
122
109
202
21aaX
125
5
133
34
5aaX
205
159
20aaX
170
7
45
194
80
183
12
64
41
84
98
mal
55187
20
51
192
aaX 6
64
45
21aaX
21aaX
183
188
16 64
5
42
5?
90
194
16
20aaX
46
6
42
42
42
5aaX
16
107
5aaX
4
Na+
16
115
9
21aaX
69
6
16
157Fe3+?
29
42
85
42mal
42
20aaX 6
79
5aaX
76
5aaX
Fe3+ 9
42
1811
20
aaX6
42
11
42
112
116
6
6
16Fe2+
29
6
85
67
123
8
210
46
188
5Fe3+
154
208
20aaX 6
139
7
aaX6
65
49
8
5aaX
6
46
42mal 6
16
64
10
75
197
21aaX?
120
29
5
31
117
5
11
82
163
16
90
12
?
20aaX
aaX6
5
8
aaX 6
?
6
6
86
5aaX
46
102
5aaX
42
134
20aaX 6
58
1779
5aaX
13
34
64
21aaX
52
159
41mal
42
186
42
55
Fe3+?9
9
42
203
21
5
aaX
20
K+
aaX 6
71
5aaX
157Fe3+?
10
123
21aaX?
148148
6g3Pi
NH4+
64
16
210
710
89
25
12
glyc7
27
81
116
16
16
138 9
205
120
33 7
6
24
39
14
181
173
161
105
16
149
45
51
24
34
124
81
182
53
51
13
38
93
16
195
162
49
6
68
6
46
132
86
187
711
166
41g3Pi
16
34
158
14
146
158
16
34
37
115
126
1758
45
173
2 9
122
170
91
111
105
77
68
198
26
56
52
45
175 8
96
148
62
2
25
10
33 6
89
39
26
93
16
156
128
31
131
466
42
336
64
154
tRNA16
121
115
34
98
16S
155
16
152
30 7
27
7
29
46
131
5
94
1
101
107
64
13
9
16135
174
62 151
161
24
146
46
74
467
182
9
102
73
78
46
336
6
104
16
26
16
25
56
34
47
102
336 33
48
336
130
35
108
81
41
5?
48
110
40
111
25
44
129
124124
46
33ant?6
42
201
137
155
22
12634
162
11
144
166
67
2 10
52
8
109
65
109 6
147chro?
47
98
101
145
19
5aa
196
192
184
83
66
5?
67
5?
80+? 12
34
137
64
31
29
161
99
156
32
5?
16
16
46
176 8
64
181
143
46
102
81
100
25
2 9
112
84
185
6
161
10
8
1812196
111
bcaa
165
172
8
94
8
5? 6
128
1K+?
6
7 1211
5?
18 11 26
102
95
121
5?
9
COH?
5?
98
67
118
132
22
16
132
96
34
153
50
16
168
32
172
172
5?
166
42
22
11
171?
46?
166
42
5?
16
25
16
5?
126
6
Pi
83
97 8
6
4213810
42
5
59
95
195+
7
COH42
28
5?
144
7 139
45Mg2+?
185
8
44
42
?6
161
5
11
5?
50
141
74
25
46aa
151
73
141
15K+
83
+6
150
114
5?
155
147chro?
129
67
107
58
16
113
3613
16
171?
167
rnpB
TMrrnaA5STMrrnaA16S TMrrnaA23S
1
32
1
1
2112
111
1
1
15
1
1
12
LR4
LR3
LR8
RPT1A
RPT1B
RPT2B
RPT3A
RPT3B
RPT3C
RPT3D
RPT4A
RPT4B
RPT6A
RPT6B
RPT6C
RPT7A
RPT7B
RPT8A
RPT8B
LR2
68
1215
482
258
485
11
1239
474
282
704
1105
1709
201
308
1384
259
1441
1240
1618
1818
1318
20
1096
1732
1765
1619
903
702
872
1238
1632
119
600
317
1455
1088
1730
1269
1550
723
593
691645
1277
1564
771
1169
1328
1053
117
730 747
882
1139
820
10291007
412
1731
1846
1161
384
912893
274
1028
1472
245
538
1692
1792
310
248
1701 1719
695
549
1390
824
422
1714
1536 1584
643
155
288
379
1158
438
1638
1857
96
1777
236
1170
203
1131
785
683
1794
1834
1182
237222
579
1400
548
209
444
662
1686
228
1724
939
371
1124
25
1537 1557
1697
1650
1425
1281
364
916
22
1523
98
1633
580
483
232 262
724
194
1201
1772
1371
759
211
814
10761060
1658
1368
573
313
1216
43 55
1528
1369
1187
289
157
1725
1135
1612
563
1213
1038
1214
285
1703
298
1321
42
1017
558
233
700
395
1426
1327
831
1052
1781
459
799
1094
1178
77
1401
1674
111
1465
791728
1726
1825
1168
1683
1862
372
352
1446
1302
1237
1317
292
576
553
591
671
1624
1770
559
365
873
1162
1819
16161598
1851
257
902
833
1034
1177
1509
1787
1382
1466
234
1004
1756
1255
1796
807
515
752
1848
67
1257
673
1511
1874
367
792
1636
1404
1607
570
1717
729
904
349
1137
1014
1820
1610
224
88
922
1693
1468
39
1164
204
1759
1078
1629
1867
709
1100
435
1253
1589 1643
1760
66
386 402
1414
172
1089
1138
97
1192
1519
1256
1609
827
547
1396
41
1447
306
1708
147
281
1093
472
1859
1432
1231
226
156
803
803
227
1424
1631
1357
165
343
925
1319
291
359
461
983
1129
1868
1733
290
1391
802
1563
1181
1570
268
1452
556
1313
453
554
1861
1646
1741
513
582
129
534
836
1582
1449
924
462
1286 1310
1648
1715
1780
231
574
458
1451
1785
686
1878
765
1642
1840
496
389
608
247
142
466
1278
1442
1784
1165
322
398
555
705
1685
69
1087
21
1863
1212
1383
434
160
464
1084
1398
1247
1778
1513
205
1595
307
841
1002
825
1754
1042
1518
1295
1698
188
800
141
508
1324
36
1687
544
102
527
276
1663
38
763
1301
197 199
509
1744
175
926
552
470
920
1508
1068
1611
5
1771
1869
436
426
1417
287
1211
100
383
780
10
121
1062
1479
1349
336
250 264
1728
1608
108
183
592
511
416
13
1095
1136
1254
1394
1427
1195
198
225 266
417
1826
1558
1797
1847
793
345
703
12
557
624
1403
146
835
1782
801
353
660
968
1707
1054
1791
176
1193
1325
347
1283
16151613 1614
1294
988
1530
1144
195177
1118
1469
1167
164
809
273
1385
1721
796
594
1112
1112
1207
914
1110
1393
680
366
519
1664
1358
414
571
193
136
392
770
1205
865
239
11551114
1679
874
1816
1620
616
894
260
681
149
1437
81
756
350
1280
445
378
1018
864
1121
1542
1163
1640
221
493
524
804
1191
220
481
1142
1045
1768
1660
1576
997
1270
505
82
938
564
17351723
1036
1695 1710
990
240
663
561
1000
587
330
541
885
1012
122
1588
1459
1524
1133
1210
1117
772
1019
1464
1567
665
1593
526
335333
847
667
1140
866
1290
1141
1601
699
324
394
664
830
1753
1506
1040
896
369
1145
1752
1366
1083
486
388
848
70
1525
1049
1682
1788
293
879
1811
934
1282
106
409
675
305
1547
731
1540
632
1043
964
851
1522
1635
908
1322
1229
895
1418
1594
1860
1814
171
1751
655
263
744
270
1516
1265
1235
1761
1672
1104
652
380
551
1521
1326
1041
1303
986
207
1273
337
545
10751061
1217
1520
1876
888
397
1380
169
323
735
910
1051
1304
385
1699
1288
243
779
889
862
687
1173
286
217
1467
647
877
877
373
1678
849
1430
497
1852
223
698
909 927
852
982
1766
1125 1159
1773
144
442
569
1416
374
1127
919 930
480
1023
334
979
1179
216
844
890
11471126
316
719679
843
1738
1293
413358
584
790755
1024
1740
1532
1691
296
522
182
748
1776
1573
1421
1783
1720
61
618
1122
529
972
1639
1364
212
1799
196
521
1156
1188
1090
1627
1822
1551 1581
219
797
1016
1058
15451543
1606
1101
1113
1011 1015
1092
1443
1351
387
1070
1722
473
1119
94
1225
1823
1388
53
697
1585
1585
1877
424362
935
648
640
883
83
1190
1272
230
577
148
8
1287
1431
218
837
1157
370
1027
943
351
161
586
213
845
229
1515
540
962
110310911074
1533
758
891
1803
1048
132
1821
653
906
1120
850
506
166
130
312
656
1835
37
1098
1769
575
494487
813
376
876
381
1010
688
1553
252
674
868
1209
1046
1641
998
708
937
1798
265
936
1086
1332
907
1334
1106
678
410
952
363
978
651
829
404
1013
1116
159
1175
1436
423
214
734
1050
1865
1115
443
981
633
1559
1134
543
846
1410
1160
95
690
815
15411539
1227
407
244
1517
1314
1675
1128
932
488
1429
1180
87
1055
1123
437
151
712
1387
989987
1808
560
1690
692
881
1020
1531
966
200
1809
1031
1875
368
1022
1438
768
40
1807
320
1411
613
294
619
253
11111109
1174
1208
1289
1365
1448
140
56
1548
189
500
52
1665
1198
309
834
669
23
1549
190
639
1189
973
951
711
950
1204
1700
1223
1405
650
476
280
884
1395
806
181
1460
1064
1849
1739
542
682
974 1008
1617
805
489
854
24
992
504
782
391
448
89
1597
1065
993
1457
477
1152
739
1219
646
1330
1748
1630
773
60
1775
1562
1747
1810
1153
794
1353
396
202
838
44 45
185
1329
133
1315
1408
562
1750
1841
1361
9
1252
246
300
210
572
905
523
340
713
179
732
1600
361
84 91
1622
634623
818
1634
786
607
1291
1202
1805
1833
1767 1802
764
1151
354
599 610
405
933
532
1529
1415
985
26
1843
17041676
503
241
535
725
867
1716
1461
789
1681
78
947
360
28
1795
1556
1813
1348
101
941
331
440
566
630
1623
948
63
1407
1323
62
1381
581
900
479
152
743
1085
50
1299
390
49
301
1275
1873
1033
14
1402
256
1842
676
757
995
469
72
1236
737
1804
778
597
761
1274
918
79
1305
1196
85
450
977
1284
304
1800
1197
1801
1143
583
1844
672
1652
1837
1423
1333
975
463
338
168
512
798
636
80
1420
6
357
1864
528 531
589
880
499
75
1727
58
517
1705
736
242
1194
568
871
644
929
1838
154
626
2
90
1583
1866
1372
1222
899
34
510
13091266
1439
329
1146
54
501
760745 754
1599
491
913
536
961
931
31
1352
1746
1347
1645
206
923
627
1514
59
1082
1373
1307
585
375
781
615
180
1285
1839
622
944
817
684
1647
1337
415
602
738
1354
1241
661
628
1702
1656
502
1356
7
1621
741
191
693
1148
1220
530
33
1577
1199
1659
1742
1575
714
617
465
1097
1850
1535
238
128
1066
984
685
750
642
1279
35
9992
629
1793
631
749
377
606
1039
1362
620
783
875
609
1749
403 406
1150
1526
1812
507
429
945
311 332
976
649
71
460
1669
859
963
621
533
1745
550
1342 1406
1203 1226
1311
30
1292
1067
93
27
1132
355
355
706
321
1335
1510
641
641
17891779
1339
269
1392
1149
625
1712
271
1340
1554
1185
677
1230
29
832
746
1587
1871
314
170
991
946
601
1673
969
163
51
928
1409
1409
1806
303
1444
611
1786
840
967
858
1434
1824
1063
1671
518
892
1637
670
1829
471
767
74
921
1350
302
537
1
1688
328
887 942
1268
57
1836
1419
4
901
3
1625
1413
1538
784
498
1370
73
1221
1428
153
826
162
1331
46
315
612
339 341
635 637
1300 1316
994
13551338 1341
1412
1670 1680 1711
1790
1555
1005
1073
126
1500
449
816
1345
1677
1480
823
433
76
1440
1363
318
1496
769
411
1858
12641258
1072
1596
18
727
1774
64
1856
137125
451
1375
1561
1450
1069
113 139
235
135
208
856
441
811
520
1487
1628
1154
118
1855
297
428421
949
65
1489
1224
578
1080
393
762
1130
1591
490456
86
1251
1099
1544
158
47
267
1667
1755
1172
1037
342
173
1483
131
299
104
1262
150
819
722
184
1504
1378
1184
425
275
1336
787
605
822
654
468
1831
457
1260
1696
1359
1243
174
255
839
1234
1297
1713
878
1565
1694
1485
953
726
539
590
1245
431
917
1263
1493
430 467
1626
1578
718
596
1433
249
446
940
598
956
1003
1854
1507
348
1259
808 828
1817
1706
400
1102
1870
105
514
1246
1853
1030
861
742
1651
1228
1473
546
124
666
955
1481
215
788
325
284
1512
1603
638
1200
327
356
959
17
710
853
965
1662
1662
167
668
439
143
694
279
869860842
1453
1572
1248
1737
1499
1718
717
870
127
1032
1604
1250
1422
420
1580
109
1505
10211006
721
455
1389
1471
1832
1763
112
795
1503
283
452
911
715
1271
1026
1845
588
1653
1374
110
16 3219
958
1044
1666
1077
261
1501
138
1569
187
1654
1171
120
1376
1574
475
1267
1655
145
1218
1379
1261
1657
1552
821
603
810
1183
1463
408
1233
915
1602
1296
1734
733
855
186
1377
484
740
344
401
897
751
48
1399
1035
1470
1009
454
1445
419
689
525
346
595
766
1494
565
114
15711546
720
1232
1743
1491
495
1059
1661
1397
1579
1057
326
886 954
696
776
478
1828
812
1346
123
432
399
657 716
1071
1560
116
1047
1360
775
277
418
1276
1827
1166
382
857
134
1249
1249
1736
1762
447
753
863
1644
295
103
192
701
272
178
427
659
1079
1497 1502
254
1605
960
777
15
492
1344
115
143
604
14741475
14761477
1482 1484
726
956
1032
1454 1486 1488 1490 1492 1495 1498
LR6
614
1242 1244
1025
1478
1298 1306 1308
14621458
996 999 1001
1081
1456
1367
1206
898
1056
1435
658
980
1343
970971
496
570
1186
1586
1668
1830
1527
16891684
1764
1534
1386
1872
1729
1566 1568
1312
1649
1320
1815
1107
1176
72
209
429
284
957
1184
1502
175717581748
1823
1590 1592
516
251
319
774
707
567
Unknown
Conserved hypothetical
DNA Metabolism
Purines, pyrimidines, nucleosides and nucleotides
Fatty acid/Phospholipid metabolism
Energy metabolism
Regulatory functions
Transport/binding proteins
Translation
Transcription
Other categories
Amino acid biosynthesis
Biosynthesis of cofactors, prosthetic groups, carriers
Cell envelope
Cellular processes
Central intermediary metabolism
DAKOTA
JAMAICA
Barb
KINGSTON
278
?
61
6
98
59
104
52
44
32
152
aa
25
160
110
174
54 7
19
16
3613
16
64
ant?5
98
74
118
198
?5
19
8375
6 6178
192
bcaabcaabcaa7
bcaa8
11 80
61
7
153
13200
179
6 38
129
189
37
5
27
187
7
78
64
34
102
39
79
?
130
93
179
64
6
67
6
6 g3Pi42
93
46
16
57
8
12
11
11
158
27
165
8
5
140
9
64
102
45
46
77
119
35
40
164
16
67
45
148
6
34
11
34
108
57
46
143
33
119
14
46
46
16
5?
187
185
64
22
71 kb
Authentic Frame Shift
Paralogous gene family
GES region
Repeat Region
Archaeal Islands
Transporter
LP Lipoprotein
Signal peptide
94
9
Fe+++
5s rRNA
16s rRNA
23s rRNA
4
5S
23S
3
6
103
178
LR7
164
1
2135
154 10
1
97 6
53
20
RPT5ALR1
100 101
1320
1
4 2
163
130
168
142
123
aaX aaX
170
83
aaX205
20
COH?
67
5?
rib
7642 42
6 7 4242
COH
67
4266 42
42
642
56
?
42
42
42
444444 5COH
193
141
48
160 25
67
11
27
108
206
17
5COH
27
117
191
145
16
189
34
5COH
41COH?
COH
144
83
67
46
24
46
41COH?
Zn
42
149
193
4
145199
63
208
15K+?
92
17
113
1
142
60COH
190
41COH?
COH6
88
186
1
111
31
17
31
s/p
46
72
45
46
1
41COH
66
5Pi
108
60
46
73
5s/p
162
71
68
71
22
164
180Pi?11
133
4
60rib
50 6116
36
61
25
74
16
132
4
45
106
23
67
COH9
150
27
1 3? 12
66
42COH
103
70
117
7?11
7?11
71
199
127
69
46 80 16
COH7 COH6
46
46
190
197
191
71
207
48
rib
164
99
30
60
3?10
114
24 50
51
rib
6
7
5Zn
164
63
67
806
6
COH
162
31
44
162
COH8
16
71
29
61
46
113
4
5rib
5
42COH
174
16
42COH9
206
COH
Pi
88
5
180Pi?9
COH
?
199
164
209
209
145
124
48
1
COH
184
6
1768
H+16
71
6
72
4
46
16
87?10
60
187
?
46
4649
s/p
169
200ura11
117
COH9
60
4
167
87
28
?11
9
43
67
42
4
74
12
43
61
61
60
5COH
50
41COH?
COH6
2016
50
4
177 9
108
41s/p
Zn8
60
209
49
127
23
71Pi
61
169
27
92
106
153
6
9
4
88
207
46
24
aaX
aaX
aaX aaX
136 136
aaX20
aaX aaX
544Fe3+?
20
20
5 20 6
70
7
20aaX
56620
140
5
44aaX aaX
5
42
20aaX
aaX
94
?
5aaX
42mal
192
1326
5aaX
5
7
42
aaX6
?
204
132
203
45
5aaX
5
141
109
aaX
67Fe2+7
64
76
166 125
20aaX 6
11
134 78
21aaX
16
21aaX?
5
42mal 8
10420aaX
?
6
21aaX?
174
66
21aaX
42
120
21aaX
48
5
14
64
91
6410202
204
82
aaX 6
Na+
122
109
202
21aaX
125
5
133
34
5aaX
205
159
20aaX
170
7
45
194
80
183
12
64
41
84
98
mal
55187
20
51
192
aaX 6
64
45
21aaX
21aaX
183
188
16 64
5
42
5?
90
194
16
20aaX
46
6
42
42
42
5aaX
16
107
5aaX
4
Na+
16
115
9
21aaX
69
6
16
157Fe3+?
29
42
85
42mal
42
20aaX 6
79
5aaX
76
5aaX
Fe3+ 9
42
1811
20
aaX6
42
11
42
112
116
6
6
16Fe2+
29
6
85
67
123
8
210
46
188
5Fe3+
154
208
20aaX 6
139
7
aaX6
65
49
8
5aaX
6
46
42mal 6
16
64
10
75
197
21aaX?
120
29
5
31
117
5
11
82
163
16
90
12
?
20aaX
aaX6
5
8
aaX 6
?
6
6
86
5aaX
46
102
5aaX
42
134
20aaX 6
58
1779
5aaX
13
34
64
21aaX
52
159
41mal
42
186
42
55
Fe3+?9
9
42
203
21
5
aaX
20
K+
aaX 6
71
5aaX
157Fe3+?
10
123
21aaX?
148148
6g3Pi
NH4+
64
16
210
710
89
25
12
glyc7
27
81
116
16
16
138 9
205
120
33 7
6
24
39
14
181
173
161
105
16
149
45
51
24
34
124
81
182
53
51
13
38
93
16
195
162
49
6
68
6
46
132
86
187
711
166
41g3Pi
16
34
158
14
146
158
16
34
37
115
126
1758
45
173
2 9
122
170
91
111
105
77
68
198
26
56
52
45
175 8
96
148
62
2
25
10
33 6
89
39
26
93
16
156
128
31
131
466
42
336
64
154
tRNA16
121
115
34
98
16S
155
16
152
30 7
27
7
29
46
131
5
94
1
101
107
64
13
9
16135
174
62 151
161
24
146
46
74
467
182
9
102
73
78
46
336
6
104
16
26
16
25
56
34
47
102
336 33
48
336
130
35
108
81
41
5?
48
110
40
111
25
44
129
124124
46
33ant?6
42
201
137
155
22
12634
162
11
144
166
67
2 10
52
8
109
65
109 6
147chro?
47
98
101
145
19
5aa
196
192
184
83
66
5?
67
5?
80+? 12
34
137
64
31
29
161
99
156
32
5?
16
16
46
176 8
64
181
143
46
102
81
100
25
2 9
112
84
185
6
161
10
8
1812196
111
bcaa
165
172
8
94
8
5? 6
128
1K+?
6
7 1211
5?
18 11 26
102
95
121
5?
9
COH?
5?
98
67
118
132
22
16
132
96
34
153
50
16
168
32
172
172
5?
166
42
22
11
171?
46?
166
42
5?
16
25
16
5?
126
6
Pi
83
97 8
6
4213810
42
5
59
95
195+
7
COH42
28
5?
144
7 139
45Mg2+?
185
8
44
42
?6
161
5
11
5?
50
141
74
25
46aa
151
73
141
15K+
83
+6
150
114
5?
155
147chro?
129
67
107
58
16
113
3613
16
171?
167
rnpB
TMrrnaA5STMrrnaA16S TMrrnaA23S
1
32
1
1
2112
111
1
1
15
1
1
12
LR4
LR3
LR8
RPT1A
RPT1B
RPT2B
RPT3A
RPT3B
RPT3C
RPT3D
RPT4A
RPT4B
RPT6A
RPT6B
RPT6C
RPT7A
RPT7B
RPT8A
RPT8B
LR2
68
1215
482
258
485
11
1239
474
282
704
1105
1709
201
308
1384
259
1441
1240
1618
1818
1318
20
1096
1732
1765
1619
903
702
872
1238
1632
119
600
317
1455
1088
1730
1269
1550
723
593
691645
1277
1564
771
1169
1328
1053
117
730 747
882
1139
820
10291007
412
1731
1846
1161
384
912893
274
1028
1472
245
538
1692
1792
310
248
1701 1719
695
549
1390
824
422
1714
1536 1584
643
155
288
379
1158
438
1638
1857
96
1777
236
1170
203
1131
785
683
1794
1834
1182
237222
579
1400
548
209
444
662
1686
228
1724
939
371
1124
25
1537 1557
1697
1650
1425
1281
364
916
22
1523
98
1633
580
483
232 262
724
194
1201
1772
1371
759
211
814
10761060
1658
1368
573
313
1216
43 55
1528
1369
1187
289
157
1725
1135
1612
563
1213
1038
1214
285
1703
298
1321
42
1017
558
233
700
395
1426
1327
831
1052
1781
459
799
1094
1178
77
1401
1674
111
1465
791728
1726
1825
1168
1683
1862
372
352
1446
1302
1237
1317
292
576
553
591
671
1624
1770
559
365
873
1162
1819
16161598
1851
257
902
833
1034
1177
1509
1787
1382
1466
234
1004
1756
1255
1796
807
515
752
1848
67
1257
673
1511
1874
367
792
1636
1404
1607
570
1717
729
904
349
1137
1014
1820
1610
224
88
922
1693
1468
39
1164
204
1759
1078
1629
1867
709
1100
435
1253
1589 1643
1760
66
386 402
1414
172
1089
1138
97
1192
1519
1256
1609
827
547
1396
41
1447
306
1708
147
281
1093
472
1859
1432
1231
226
156
803
803
227
1424
1631
1357
165
343
925
1319
291
359
461
983
1129
1868
1733
290
1391
802
1563
1181
1570
268
1452
556
1313
453
554
1861
1646
1741
513
582
129
534
836
1582
1449
924
462
1286 1310
1648
1715
1780
231
574
458
1451
1785
686
1878
765
1642
1840
496
389
608
247
142
466
1278
1442
1784
1165
322
398
555
705
1685
69
1087
21
1863
1212
1383
434
160
464
1084
1398
1247
1778
1513
205
1595
307
841
1002
825
1754
1042
1518
1295
1698
188
800
141
508
1324
36
1687
544
102
527
276
1663
38
763
1301
197 199
509
1744
175
926
552
470
920
1508
1068
1611
5
1771
1869
436
426
1417
287
1211
100
383
780
10
121
1062
1479
1349
336
250 264
1728
1608
108
183
592
511
416
13
1095
1136
1254
1394
1427
1195
198
225 266
417
1826
1558
1797
1847
793
345
703
12
557
624
1403
146
835
1782
801
353
660
968
1707
1054
1791
176
1193
1325
347
1283
16151613 1614
1294
988
1530
1144
195177
1118
1469
1167
164
809
273
1385
1721
796
594
1112
1112
1207
914
1110
1393
680
366
519
1664
1358
414
571
193
136
392
770
1205
865
239
11551114
1679
874
1816
1620
616
894
260
681
149
1437
81
756
350
1280
445
378
1018
864
1121
1542
1163
1640
221
493
524
804
1191
220
481
1142
1045
1768
1660
1576
997
1270
505
82
938
564
17351723
1036
1695 1710
990
240
663
561
1000
587
330
541
885
1012
122
1588
1459
1524
1133
1210
1117
772
1019
1464
1567
665
1593
526
335333
847
667
1140
866
1290
1141
1601
699
324
394
664
830
1753
1506
1040
896
369
1145
1752
1366
1083
486
388
848
70
1525
1049
1682
1788
293
879
1811
934
1282
106
409
675
305
1547
731
1540
632
1043
964
851
1522
1635
908
1322
1229
895
1418
1594
1860
1814
171
1751
655
263
744
270
1516
1265
1235
1761
1672
1104
652
380
551
1521
1326
1041
1303
986
207
1273
337
545
10751061
1217
1520
1876
888
397
1380
169
323
735
910
1051
1304
385
1699
1288
243
779
889
862
687
1173
286
217
1467
647
877
877
373
1678
849
1430
497
1852
223
698
909 927
852
982
1766
1125 1159
1773
144
442
569
1416
374
1127
919 930
480
1023
334
979
1179
216
844
890
11471126
316
719679
843
1738
1293
413358
584
790755
1024
1740
1532
1691
296
522
182
748
1776
1573
1421
1783
1720
61
618
1122
529
972
1639
1364
212
1799
196
521
1156
1188
1090
1627
1822
1551 1581
219
797
1016
1058
15451543
1606
1101
1113
1011 1015
1092
1443
1351
387
1070
1722
473
1119
94
1225
1823
1388
53
697
1585
1585
1877
424362
935
648
640
883
83
1190
1272
230
577
148
8
1287
1431
218
837
1157
370
1027
943
351
161
586
213
845
229
1515
540
962
110310911074
1533
758
891
1803
1048
132
1821
653
906
1120
850
506
166
130
312
656
1835
37
1098
1769
575
494487
813
376
876
381
1010
688
1553
252
674
868
1209
1046
1641
998
708
937
1798
265
936
1086
1332
907
1334
1106
678
410
952
363
978
651
829
404
1013
1116
159
1175
1436
423
214
734
1050
1865
1115
443
981
633
1559
1134
543
846
1410
1160
95
690
815
15411539
1227
407
244
1517
1314
1675
1128
932
488
1429
1180
87
1055
1123
437
151
712
1387
989987
1808
560
1690
692
881
1020
1531
966
200
1809
1031
1875
368
1022
1438
768
40
1807
320
1411
613
294
619
253
11111109
1174
1208
1289
1365
1448
140
56
1548
189
500
52
1665
1198
309
834
669
23
1549
190
639
1189
973
951
711
950
1204
1700
1223
1405
650
476
280
884
1395
806
181
1460
1064
1849
1739
542
682
974 1008
1617
805
489
854
24
992
504
782
391
448
89
1597
1065
993
1457
477
1152
739
1219
646
1330
1748
1630
773
60
1775
1562
1747
1810
1153
794
1353
396
202
838
44 45
185
1329
133
1315
1408
562
1750
1841
1361
9
1252
246
300
210
572
905
523
340
713
179
732
1600
361
84 91
1622
634623
818
1634
786
607
1291
1202
1805
1833
1767 1802
764
1151
354
599 610
405
933
532
1529
1415
985
26
1843
17041676
503
241
535
725
867
1716
1461
789
1681
78
947
360
28
1795
1556
1813
1348
101
941
331
440
566
630
1623
948
63
1407
1323
62
1381
581
900
479
152
743
1085
50
1299
390
49
301
1275
1873
1033
14
1402
256
1842
676
757
995
469
72
1236
737
1804
778
597
761
1274
918
79
1305
1196
85
450
977
1284
304
1800
1197
1801
1143
583
1844
672
1652
1837
1423
1333
975
463
338
168
512
798
636
80
1420
6
357
1864
528 531
589
880
499
75
1727
58
517
1705
736
242
1194
568
871
644
929
1838
154
626
2
90
1583
1866
1372
1222
899
34
510
13091266
1439
329
1146
54
501
760745 754
1599
491
913
536
961
931
31
1352
1746
1347
1645
206
923
627
1514
59
1082
1373
1307
585
375
781
615
180
1285
1839
622
944
817
684
1647
1337
415
602
738
1354
1241
661
628
1702
1656
502
1356
7
1621
741
191
693
1148
1220
530
33
1577
1199
1659
1742
1575
714
617
465
1097
1850
1535
238
128
1066
984
685
750
642
1279
35
9992
629
1793
631
749
377
606
1039
1362
620
783
875
609
1749
403 406
1150
1526
1812
507
429
945
311 332
976
649
71
460
1669
859
963
621
533
1745
550
1342 1406
1203 1226
1311
30
1292
1067
93
27
1132
355
355
706
321
1335
1510
641
641
17891779
1339
269
1392
1149
625
1712
271
1340
1554
1185
677
1230
29
832
746
1587
1871
314
170
991
946
601
1673
969
163
51
928
1409
1409
1806
303
1444
611
1786
840
967
858
1434
1824
1063
1671
518
892
1637
670
1829
471
767
74
921
1350
302
537
1
1688
328
887 942
1268
57
1836
1419
4
901
3
1625
1413
1538
784
498
1370
73
1221
1428
153
826
162
1331
46
315
612
339 341
635 637
1300 1316
994
13551338 1341
1412
1670 1680 1711
1790
1555
1005
1073
126
1500
449
816
1345
1677
1480
823
433
76
1440
1363
318
1496
769
411
1858
12641258
1072
1596
18
727
1774
64
1856
137125
451
1375
1561
1450
1069
113 139
235
135
208
856
441
811
520
1487
1628
1154
118
1855
297
428421
949
65
1489
1224
578
1080
393
762
1130
1591
490456
86
1251
1099
1544
158
47
267
1667
1755
1172
1037
342
173
1483
131
299
104
1262
150
819
722
184
1504
1378
1184
425
275
1336
787
605
822
654
468
1831
457
1260
1696
1359
1243
174
255
839
1234
1297
1713
878
1565
1694
1485
953
726
539
590
1245
431
917
1263
1493
430 467
1626
1578
718
596
1433
249
446
940
598
956
1003
1854
1507
348
1259
808 828
1817
1706
400
1102
1870
105
514
1246
1853
1030
861
742
1651
1228
1473
546
124
666
955
1481
215
788
325
284
1512
1603
638
1200
327
356
959
17
710
853
965
1662
1662
167
668
439
143
694
279
869860842
1453
1572
1248
1737
1499
1718
717
870
127
1032
1604
1250
1422
420
1580
109
1505
10211006
721
455
1389
1471
1832
1763
112
795
1503
283
452
911
715
1271
1026
1845
588
1653
1374
110
16 3219
958
1044
1666
1077
261
1501
138
1569
187
1654
1171
120
1376
1574
475
1267
1655
145
1218
1379
1261
1657
1552
821
603
810
1183
1463
408
1233
915
1602
1296
1734
733
855
186
1377
484
740
344
401
897
751
48
1399
1035
1470
1009
454
1445
419
689
525
346
595
766
1494
565
114
15711546
720
1232
1743
1491
495
1059
1661
1397
1579
1057
326
886 954
696
776
478
1828
812
1346
123
432
399
657 716
1071
1560
116
1047
1360
775
277
418
1276
1827
1166
382
857
134
1249
1249
1736
1762
447
753
863
1644
295
103
192
701
272
178
427
659
1079
1497 1502
254
1605
960
777
15
492
1344
115
143
604
14741475
14761477
1482 1484
726
956
1032
1454 1486 1488 1490 1492 1495 1498
LR6
614
1242 1244
1025
1478
1298 1306 1308
14621458
996 999 1001
1081
1456
1367
1206
898
1056
1435
658
980
1343
970971
496
570
1186
1586
1668
1830
1527
16891684
1764
1534
1386
1872
1729
1566 1568
1312
1649
1320
1815
1107
1176
72
209
429
284
957
1184
1502
175717581748
1823
1590 1592
516
251
319
774
707
567
Unknown
Conserved hypothetical
DNA Metabolism
Purines, pyrimidines, nucleosides and nucleotides
Fatty acid/Phospholipid metabolism
Energy metabolism
Regulatory functions
Transport/binding proteins
Translation
Transcription
Other categories
Amino acid biosynthesis
Biosynthesis of cofactors, prosthetic groups, carriers
Cell envelope
Cellular processes
Central intermediary metabolism
DAKOTA
JAMAICA
Barb
KINGSTON
278
© 1999 Macmillan Magazines Ltd
© 1999 Macmillan Magazines Ltd
© 1999 Macmillan Magazines Ltd
© 1999 Macmillan Magazines Ltd