Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Supplementary materials and methods
Bacterial strains and culture conditions
We used 18 T. forsythia strains that were isolated from periodontitis patients at Tokyo
Medical and Dental University by Dr. Umeda during 1990–2010 (Supplementary Table S18).
We tried to use the culture media described previously (Takemoto et al., 1997), but they did
not work. Then, we modified that method by adding fetal bovine serum; this approach is
effective for T. denticola culture (Van and Smibert, 1983). The isolates were maintained
anaerobically (10% CO2, 10% H2, 80% N2 [Mitsubishi Gas Chemical, Tokyo, Japan]) in 6%
Gifu anaerobic medium (GAM, [Nissui, Tokyo, Japan]), supplemented with yeast extract (1
mg/mL [Nacalai Tesque, Kyoto, Japan]), hemin (5 µg/mL [Sigma-Aldrich, St. Louis,
U.S.A.]), menadione (0.5 µg/mL [Nacalai Tesque]), fetal bovine serum (35 µl/mL [HyClone,
Tauranga, N.Z.]) and 98% (w/v) N-acetylmuramic acid (NAM, 15 µg/mL [Sigma-Aldrich,
Deisenhofen, Germany]), or on GAM blood agar plates (6% GAM, 5% sheep blood [Nippon
Biotest Laboratories, Tokyo, Japan], 1.5% agarose [Nacalai Tesque], yeast extract [1 µg/mL],
hemin [5 µg/mL] and menadione [0.5 µg/mL]) with 98% (w/v) NAM disk (20 g/L, antibiotic
test filter paper thin, [Advantec, Tokyo, Japan]). After incubation for 7–14 days at 37ºC,
colonies were injected into 10 mL GAM liquid medium and cultured for 14 days at 37ºC.
Genomic DNA was extracted from each isolate using the methods described previously
(Watanabe et al., 2013).
Determination of T. forsythia complete genome sequences
Complete genome sequences of 2 T. forsythia strains (KS16 and 3313), which differed in
clinical parameters, were determined using a combination of GS Junior Titanium instrument
(454 Life sciences a Roche Company, CT, U.S.A.) for shotgun/paired-end sequencing and
Genome Analyzer IIx (GAIIx; Illumina, CA, U.S.A.) for 101-bp paired-end and traditional
Sanger sequencing (ABI 3730; Applied Biosystems, CA, U.S.A.). The Newbler v2.6 (454
Life sciences a Roche Company) and Trimmomatic v0.20 software packages were used for
5
10
15
20
25
trimming and filtering of the sequenced reads of GS Junior Titanium and GAIIx, respectively.
After the trimming/filtering, the total length of all the reads was as the followings: GS Junior
Titanium: KS16 47-Mbp and 3313 40-Mbp; GAIIx: KS16 999.8-Mbp and 3313 695.2-Mbp.
Then, these reads were assembled in Newbler v2.6 to obtain scaffolds. Sequence gaps
between the scaffolds were closed by PCR amplification and Sanger sequencing. SNPs and
homopolymers were corrected by mapping of Sanger and GAIIx reads, respectively.
In the complete genome sequences, protein-coding regions (CDSs) were predicted by the
Rapid Annotation Transfer Tool (RATT; Otto et al., 2011) using strain ATCC 43037 as a
reference. After that, MetaGeneAnnotator (Noguchi et al., 2006), Glimmer v3.02 (Salzberg et
al., 1998; Delcher et al., 1999; Delcher et al., 2007) and IMC-GE v3.0.30 (in silico Molecular
Cloning Genomics Edition; In Silico Biology, Yokohama, Japan) were used to improve
accuracy of CDS prediction. Then the predicted CDSs were annotated by BLASTP search
against the NCBI GenBank non-redundant (nr) database (last accessed Aug 27, 2012; Benson
et al., 2013) under the thresholds of query coverage (≥60%) and sequence identity (≥60%).
Ribosomal RNA (rRNA), transfer RNA (tRNA), and non-coding RNA were identified using
RNAmmer v1.2 (Lagesen et al., 2007), tRNAscan-SE v1.3.1 (Schattner et al., 2005), and
Rfam v11.0 software packages (Griffiths-Jones et al., 2005), respectively.
Insertion sequences (ISs) were identified using ISSaga v1.0 (Varani et al., 2011) and a
BLASTN search against the ISFinder database (last accessed Dec 20, 2001; Kichenaradja et
al., 2010) under the thresholds of sequence identity (≥95%) and e value (≤1e-5). Miniature
inverted-repeat transposable elements (MITEs) were identified using MUST v1.0 (Chen Y et
al., 2009) under the threshold of e value (≤1e-5) and the Markov Cluster Algorithm (MCL)
(van Dongen and Abreu-Goodger, 2012). Conjugative transposons (CTns) were identified
using Mauve v2.3.1 (Darling et al., 2004) and GenomeMatcher v1.75 (Ohtsubo et al., 2008),
if a region fulfilled the following two conditions: i) high nucleotide similarity to both CTns of
the reference strains (T. forsythia ATCC 43037, T. denticola ATCC 35405, and P. gingivalis
2
30
35
40
45
50
ATCC 33277) and ii) att sites at both ends.
Determination of T. forsythia draft genome sequences
Draft genome sequences of 16 T. forsythia strains (sit3, 15, 20, 2612, MH6, L7, 273, KM4,
3322, 291, 3114, TR1, 2444, 1224, 2442 and K8) were determined using GAIIx sequencing
(101-bp or 250-bp paired-end). Trimmomatic v0.20 and Velvet v1.2.03 (Zerbino and Birney,
2008) were used for trimming/filtering and assembly, respectively (Supplementary Table S2).
Then the draft genomes were annotated using Rapid Annotation using Subsystem Technology
server v4.0 (RAST; http://rast.nmpdr.org/; Aziz et al., 2008). Uncharacterized hypothetical
genes were characterized using BLASTP against the nr database (last accessed Aug 27, 2012;
Benson et al., 2013) under the threshold of e value (≤1e-5) to find any homologs. Within the
hypothetical genes without any homologs, we further searched for presence of any protein
motifs by BLASTP against NCBI’s Conserved Domain Database (CDD; last accessed Mar
21, 2013; Marchler-Bauer et al., 2011) under the thresholds of bit score (≥50) and e value
(≤0.001). Ribosomal RNA, transfer RNA, noncoding RNA, and the MGEs were identified as
described above. The fastq data have been deposited in the DDBJ/EMBL/GenBank databases.
Estimation of pan-genome size
In addition to the genomic data which we determined, we obtained the draft genome
sequence data of 8 T. denticola (ASLM, ATCC 33520, ATCC 33521, F0402, MYR-T, SP33,
SP37 and US-Trep) and 2 P. gingivalis (W50 and JCVI SC001) strains from the
DDBJ/EMBL/GenBank databases. All the sequence data were reannotated by using RAST to
obtain annotated data under the same prediction/annotation conditions. In this study, we call
the genes that were common for all the strains in each species or among the 3 species the
“core genes.” The core genes of each species or among all 3 species were further clustered
using the PGAP v1.11 under the thresholds of coverage (≥50), bit score (≥50), and e value
3
55
60
65
70
75
5
(≤1e-8) to identify the core genes. Regression analysis for new genes and the pan-genome was
conducted as described previously (Tettelin et al., 2008). The power laws are formulated as n
(new) = κ∙N-α and n (pan) = ε∙Nγ. The n (new) and n (pan) are the average of new gene and
pan-genome distributions, respectively. N is the number of genomes; κ, α, ε, and γ are free
parameters. Values of α less than 1.0 and γ greater than 0 indicate an open pan-genome
(Tettelin et al., 2008).
Multiple dot plot analysis
We used the 2 T. forsythia genomes described above for multiple dot plot analysis in which
all three species of the red complex were included. In addition, we obtained the complete
genome sequence of 1 T. forsythia (ATCC 43037), 6 T. denticola (ATCC 35404, ATCC 35405,
AL2, H1-T, H22 and OTK), and 3 P. gingivalis (ATCC 33277, TDC60 and W83) strains from
the DDBJ/EMBL/GenBank databases. These sequences were aligned using nucmer in the
MUMmer v3.23 (Delcher et al., 2003) and GenomeMatcher. In the dot plot, each dot
represents similarity of a 20-bp sequence, and rearrangement breakpoints were identified by
the ends of lines that were drawn by continuous plot of the dots. The rearrangement
breakpoints were characterized by the features located at both ends, as described previously
(Watanabe et al., 2013).
For each bacterial strain, the number of each MGE was divided by the number of CDS to
calculate the MGE/CDS ratio (IS/CDS, CTn/CDS, and MITE/CDS). In each MGE, the mean
and standard deviation were calculated from the ratios of all the strains for each species, and
were statistically tested for significant differences between the strains.
Construction of phylogenetic tree and SNP analysis
In each species in the red complex, the core genes that were single copy in all the strains
were tested to see whether rearrangement in a gene was significant, by using the Phi test
4
80
85
90
95
100
105
(Huson and Bryant, 2006) under the threshold p value (≤5.00e-02). An amino acid
substitution model was selected using ModelGenerator v0.85 (Keane et al., 2006). MEGA
v5.2 (Tamura et al., 2011) and RAxML v7.0.4 (Stamatakis, 2006) with the
PROTGAMMAIJTTF model were used to construct a neighbor-joining and minimum-
evolution tree, and a maximum likelihood tree, respectively, with 500 bootstrap replicates.
For all the gene clusters in each species where a gene was located in more than 2 genomes,
SNP loci were identified by comparing nucleotide sequences after alignment, and a dN/dS
ratio was calculated by PAML v4.6 (Yang, 1997). In each gene cluster, the dN/dS ratio was
calculated for every gene pair; the median value was calculated from all the ratios in the
cluster.
CRISPR analysis
CRISPR sequences and CRISPR-associated (cas) genes were identified by using
CRISPRFinder (last accessed Sep 18, 2013; Grissa et al., 2007). The cas gene candidates
were searched using BLASTP in the CDD database under the threshold of e value (≤0.001).
On the basis of amino acid sequence similarity and a protein motif of the cas genes, CRISPR
types and structures in T. forsythia and T. denticola were determined according to CRISPR
layouts in Makarova et al., (2011). In each CRISPR type, consensus sequences of all repeat
sequences were determined using WebLogo v3.3 (Crooks et al., 2004). In T. forsythia, the
repeat sequences were further analyzed using CRISPRmap v3.1.5 (Lange et al., 2013) to
characterize these repeats. The phage-like structures in the CRISPRs were identified using
both nucleotide sequence similarity with the known phage sequences in the NCBI nucleotide
database and gene annotations; PHAST (last accessed Jun 7, 2013; Zhou Y et al., 2011)
yielded no phage regions in any T. forsythia strains. For each CRISPR type, a non-redundant
unique spacer list was obtained by all-to-all BLASTN search, with the following criterion;
two spacers were considered to have high nucleotide similarity to each other if the bit score of
5
110
115
120
125
130
BLASTN was more than 50 (Watanabe et al., 2013; Pride et al., 2012). We used the BLASTN
search under following conditions: word size 7 and dust filter off. A heatmap and dendrogram
were drawn, as described previously (Watanabe et al., 2013).
To characterize a target of each spacer sequence, the spacer list was subjected to a BLASTN
search with a word size 7 and the dust filter “off” against 7 databases, as described previously
(Pride et al., 2012; Watanabe et al., 2013). Hits were considered significant when a bit score
was over 50. The subject sequences were annotated by BLASTX search in the NCBI
GenBank nr database (last accessed Aug 27, 2012; Benson et al., 2013) and were annotated
using the predicted protein of query sequences under the thresholds of the highest bit score
and e value (≤1e-03). For uncharacterized hypothetical genes, we further searched the
subjects for presence of any protein motifs as described above. Protospacer-adjacent motifs
(PAMs) were predicted from alignments of the 30-bp sequences at both ends of the subjects,
using WebLogo.
Metabolism analysis
In each species, mapping analysis of metabolic pathways was carried out; the core genes were
mapped on metabolic pathways using KAAS (http://www.genome.jp/kaas-bin/; Moriya et al.,
2007) according to the bidirectional best hit method and partial genome mode, and were
visualized using the iPath v2 (Yamada et al., 2011). The metabolic pathways included 5 main
pathway groups (carbohydrate, energy, lipid, nucleotide, and amino acid), which were further
subdivided into 29 small pathway groups. Each of the 29 pathway groups consisted of a
collection of stand-alone pathways that determined the changes of substances controlled by
bacterial genes. The complementation ratio (the proportion of the mapped genes in all the
genes required in each stand-alone pathway) was calculated for each species. The ratio was
also obtained under the condition where the core genes of either 2 or 3 species were analyzed.
The mean and standard deviation were calculated from all the ratios of each stand-alone
6
135
140
145
150
155
10
pathway, except for the case where all 3 species were included in the above mapping analysis.
The proportions of the complementation ratio were calculated between the conditions where
different numbers of the species were analyzed. Every small pathway group was analyzed to
find any stand-alone pathways, where the genes were mapped to encode all the genes required
in each pathway completely.
Virulence gene
To test for the presence of the known virulence genes in all the genomes of strains in each
species, BLASTP was used against the virulence genes described in the previous reports (Holt
et al., 1999; Sharma, 2010; Dashper et al., 2011), under the thresholds of query coverage
(≥80%) and sequence identity (≥80%). In addition, we tried to find any novel virulence gene
candidates using BLASTP against the virulence factor database (VFDB; last accessed Nov 25,
2013; Chen L et al., 2012) and the microbial database of protein toxins, virulence factors, and
antibiotic resistance genes for biodefense applications (MvirDB; last accessed Mar 21, 2007;
Zhou CE et al., 2007) under the threshold of e value (≤1e-7). The presence of both the known
virulence genes and the novel virulence gene candidates was visualized by drawing a
heatmap, using the R software package (http://cran.r-project.org/).
DATA ACCESS
The T. forsythia genome sequences have been deposited in the DDBJ/EMBL/GenBank
databases under the following accession numbers: complete genomes of AP013045 (KS16)
and AP013044 (3313), draft genomes of AP013045 (KS16), AP013044 (3313), DRS013040
(sit3), DRS013026 (15), DRS013027 (20), DRS013030 (2612), DRS013038 (MH6),
DRS013037 (L7), DRS013031 (273), DRS013036 (KM4), DRS013034 (3322), DRS013032
(291), DRS013033 (3114), DRS013039 (TR1), DRS013029 (2444), DRS013025 (1224),
DRS013028 (2442) and DRS013035 (K8), respectively.
7
160
165
170
175
180
Supplementary Results
Genomic characteristics of T. forsythia
Genomic information reflects evolutionary changes within a bacterial species; identification
of core genes that are conserved among all the strains and strain-specific genes that give
unique characteristics to each strain is important for the understanding of bacterial evolution.
Nonetheless, there was only one genome sequence of T. forsythia that is publicly available:
the laboratory strain ATCC 43037. In this study, for the purpose of characterization of T.
forsythia species based on genomic information, we determined genome sequences of 18 T.
forsythia strains, which had been isolated from different patients with several clinical types of
periodontitis (Supplementary Table S2 and S18). Two complete genome sequences
(Supplementary Fig. S15) and 16 draft genome sequences were included.
The 19 genomes, which included the laboratory strain ATCC 43037 and the 18 sequenced
strains, had 2,912 ± 86 (mean ± SD) protein-coding regions (CDSs; Supplementary Table S2).
The pan-genome, a sum of core and dispensable genomes, contained 5,418 CDSs, of which
1,733 CDSs (32.0%) were the core genes (genes in common among all the 19 strains;
Supplementary Fig. S16, Supplementary Table S4).
All the 19 strains lacked the genes for the synthesis of N-acetylmuramic acid (NAM); this
result was consistent with the fact that T. forsythia requires NAM for in vitro (laboratory)
bacterial growth (Braham and Moncla, 1992). Furthermore, we constructed phylogenetic trees
from the core genes (Supplementary Fig. S5). We found that when we analyzed the 2 clinical
types of periodontitis, T. forsythia strains of the same type were distributed in different
clusters. These data suggested that there were no obvious associations between a periodontitis
type and phylogenetic relationships. Strains TR1, 2444, 2442, and 1224 were clustered
together, but there were differences in the parameters between these strains; probing depth
(mm), clinical attachment level, and bleeding on probing differed among these strains. This
8
185
190
195
200
205
15
result is suggestive of applicability of CRISPR typing to T. forsythia species as a method for
clustering strains with differing clinical parameters.
As comparative analysis of the complete genome sequences, whole-genome dot plot analysis
among strains KS16, 3313, and ATCC 43037 showed that the genome organization was
highly conserved. Genomic rearrangements were observed only in the comparison of
KS16/3313 and KS16/ATCC 43037 (Supplementary Fig. S17), and one of the rearrangement
breakpoints contained an rRNA operon.
These results suggested that the genome structure was likely stable, and essential properties
such as bacterial growth were highly conserved within T. forsythia strains regardless of the
periodontitis type.
Supplementary Figure legends
Supplementary Figure S1 Comparison of gene contents of T. forsythia, T. denticola, and P.
gingivalis
Commonality and uniqueness of genes are shown among (A) 19 T. forsythia, (B) 6 T.
denticola, (C) 3 P. gingivalis strains, and (D) the red complex species.
Each circle represents the totality of all genes in each strain. The circles are overlapping to
show common genes among the strains. The central area represents the core genes of each
strain. The areas that consist of only one circle show the strain-specific genes in each strain,
with the numbers of the genes.
Supplementary Figure S2 An accumulation curve for the total number of genes, core genes,
and new genes as a function of the number of genomes analyzed for the red complex species
Accumulation curves for the pan-genome (squares) and the numbers of the core genes
(circles) and new genes (triangles), are drawn for each species: T. forsythia (gray), T.
denticola (blue), and P. gingivalis (red). The number of the plots is different among the 3
species because of the difference in the number of the genomes analyzed. The formula that
estimates the pan-genome size and the number of the new genes is shown near the curves.
9
210
215
220
225
230
235
Supplementary Figure S3 Comparative analysis of the rate of MGEs (CTn, IS, and MITE)
in the red complex species
The MGE/CDS ratio is shown as follows: (A) IS/CDS, (B) CTn/CDS, and (C) MITE/CDS.
Asterisks indicate statistical significance of differences between the ratio of 2 species: *P <
0.01 and **P < 0.05.
Supplementary Figure S4 Genetic layout of CRISPR/Cas loci of 19 strains of T. forsythia
The CRISPR/Cas arrays are shown as colored genes as in the margin note. (A) The arrays of
T. forsythia type I-B/rp30 and (B) T. forsythia type ll/rp47. The phage-like structures are
indicated by the boxes.
Supplementary Figure S5 Phylogenetic analysis based on the core genes of T. forsythia
Phylogenetic trees are shown with the bootstrap values at each node. (A) A maximum
likelihood tree, (B) a neighbor-joining tree, and (C) a minimum-evolution tree, respectively.
In the tree in panel A, clinical types of periodontitis are indicated on the right side of the strain
names: the aggressive type (“A” in blue) and the chronic type (“C” in red). Phylogenetic
distances are indicated by a scale bar beneath the trees.
Supplementary Figure S6 Genetic layout of the CRISPR/Cas locus of 6 strains of T.
denticola
The CRISPR/Cas locus (type II-A/rp27) of T. denticola is shown for 6 T. denticola strains, as
in Supplementary Figure S4.
Supplementary Figure S7 Polymorphism of repeat lengths and nucleotides in T. forsythia
and T. denticola
The arrays of the repeat lengths and nucleotides are indicated by colored boxes for each
isolate: (A) T. forsythia type I-B/rp30 (ATCC 43037, KS16, and 3313), (B) T. forsythia type
II/rp47 (KS16), (C) T. denticola type II-A/rp27 (AL2 and OTK), (D) T. denticola type
II-A/rp23 (ATCC 35404), and (E) T. denticola type II-A/rp36 (ATCC 35405). Polymorphism
of repeat lengths is indicated by the lengths of the colored boxes, with the blank boxes
showing the absence of repeat sequences. Nucleotide polymorphisms are indicated by the
10
240
245
250
255
260
colors as follows: no polymorphism (yellow), 1-bp polymorphism (red), 2-bp (purple), 3-bp
(green), 4-bp (pink), and a 5-bp polymorphism (blue).
Supplementary Figure S8 Polymorphism of CRISPR repeat sequences in T. forsythia and T.
denticola
Polymorphism of the CRISPR repeat sequences is shown as the visualization by means of the
WebLogo software. (A) Repeats of the type rp30 (30-bp), rp47 (47-bp), and rp28 (28-bp)
CRISPRs in 19 strains of T. forsythia. (B) Repeats of the type rp27 (27-bp), rp23 (23-bp), and
rp36 (36-bp) CRISPRs in 5 strains of T. denticola. The horizontal axis indicates the position
of repeats and the vertical axis indicates the number of bits (log24 = 2 bits). The vertical line
on the upper side of each nucleotide character is the error bar.
Supplementary Figure S9 Classification of T. forsythia repeat sequences based on the
CRISPRmap database
(A) The repeat sequences of the T. forsythia CRISPRs are taxonomically classified and shown
in a hierarchical tree constructed using CRISPRmap: (i) repeats of type rp30 (30-bp), (ii) rp47
(47-bp), and (iii) rp28 (28-bp). The bacterial and archaeal domains are indicated by dark
brown and blue-green lines in the tree, respectively. The circles outside the tree indicate the
following: superclass (the outermost), sequence family (middle), and structure motif (the
innermost). (B) Polymorphisms in the CRISPR repeat sequences of the type rp30 against the
CRISPRmap database, as visualized using WebLogo. (C) A minimum free energy (MFE)
structure is shown for the types rp47 and rp28.
Supplementary Figure S10 Alignment of the sequences adjacent to the target regions of the
CRISPR spacers
In each CRISPR type, WebLogo figures are shown with the alignment of the sequences
adjacent to the target regions of the CRISPR spacers: (A) T. forsythia type I-B/rp30, (B) T.
forsythia type II/rp47 (KS16), and (C) T. forsythia no type/rp28. Each figure and alignment
are divided into the left and right part to indicate the alignments of the upstream and
11
265
270
275
280
285
20
downstream sequences of the targets, respectively.
Supplementary Figure S11 Clustering by spacer content in T. forsythia
The presence of each unique CRISPR spacer is shown by means of a heatmap in each T.
forsythia strain. The dendrogram is constructed from Euclidian distances. In the heatmap, 2
colors are used according to the bit score: red, ≥ 50; yellow, < 50.
Supplementary Figure S12 Similarity searches using CRISPR spacers of T. denticola
For the T. denticola CRISPR spacers, the presence of targets is shown as in Figure 3.
Supplementary Figure S13 A complementation ratio of the metabolic pathways and its fold
increase
(A) For the 29 main pathways, the complementation ratio among the 3 numbers of the species
(1, 2, and 3 species) is shown. The bars of either 1 or 2 species are the mean ratio of possible
combinations (1 species: T. forsythia, T. denticola, P. gingivalis; 2 species: T. forsythia and T.
denticola, T. denticola and P. gingivalis, P. gingivalis and T. forsythia). Additionally, the ratio
of the butyric acid pathway is shown (rightmost). (B) The proportions of the complementation
ratio are shown between the conditions where different numbers of the species are analyzed
(1: 2 species, 2: 3 species, 1: 3 species).
Supplementary Figure S14 Clustering by presence of the virulence genes
Presence of the known virulence genes and the novel virulence gene candidates is visualized
by drawing a heatmap, where the red and yellow boxes indicate the presence and absence of
the genes, respectively. The phylogenetic tree in T. forsythia to the left of the heatmap is the
same as that of the maximum likelihood method in Supplementary Figure S5.
Supplementary Figure S15 Complete genome sequences of T. forsythia KS16 and 3313
genomes
(A) KS16 and (B) 3313 genome sequences. Each circle indicates the following: nucleotide
positions (the outermost), forward and reverse CDSs (second and third level, respectively),
rRNA (fourth level), tRNA (fifth level), G+C% content (sixth level), GC skew (seventh level),
ISs (eighth level), MITEs (ninth level), CTns (tenth level), and CRISPRs (the innermost).
12
290
295
300
305
310
315
Supplementary Figure S16 Comparison of gene contents of 19 T. forsythia strains
Each circle represents the whole of the genes in each strain. The circles are overlapping to
show commonality and uniqueness of genes among the strains. The central area represents the
1,733 core genes of 19 T. forsythia strains. The areas that consist of only one circle show the
strain-specific genes in each strain, with the numbers of the genes.
Supplementary Figure S17 Genome structures of T. forsythia and the positions of MITE, IS,
CTn, CRISPR, and rRNA
Genome structures of 3 T. forsythia strains (ATCC 43037, KS16 and 3313) are shown as in
Figure 1. The positions of the mobile genetic elements (IS, MITE, and CTn), CRISPR, and
rRNA are shown along the 2 axes. The positions of rRNA operons in the rearrangement
breakpoints are indicated by the yellow dotted lines.
References
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA et al (2008). The RAST Server:
rapid annotations using subsystems technology. BMC Genomics 9: 75.
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J et al (2013). GenBank.
Nucleic Acids Res 41: D36-42.
Braham PH, Moncla BJ (1992). Rapid presumptive identification and further characterization of
Bacteroides forsythus. J Clin Microbiol 30: 649-654.
Chen Y, Zhou F, Li G, Xu Y (2009). MUST: a system for identification of miniature inverted-repeat
transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi. Gene
436: 1-7.
Chen L, Xiong Z, Sun L, Yang J, Jin Q (2012). VFDB 2012 update: toward the genetic diversity and
molecular evolution of bacterial virulence factors. Nucleic Acids Res 40: D641-645.
Crooks GE, Hon G, Chandonia JM, Brenner SE (2004). WebLogo: a sequence logo generator.
Genome Res 14: 1188-1190.
13
320
325
330
335
340
25
Darling AC, Mau B, Blattner FR, Perna NT (2004). Mauve: multiple alignment of conserved
genomic sequence with rearrangements. Genome Res 14: 1394-1403.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999). Improved microbial gene
identification with GLIMMER. Nucleic Acids Res 27: 4636-4641.
Delcher AL, Salzberg SL, Phillippy AM (2003). Using MUMmer to identify similar regions in large
sequence sets. Curr Protoc Bioinformatics Chapter 10: Unit 10.13.
Delcher AL, Bratke KA, Powers EC, Salzberg SL (2007). Identifying bacterial genes and
endosymbiont DNA with Glimmer. Bioinformatics 23: 673-679.
Dashper SG, Seers CA, Tan KH, Reynolds EC (2011a). Virulence factors of the oral spirochete
Treponema denticola. J Dent Res 90: 691-703.
Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy S, Bateman A (2005). Rfam: annotating
non-coding RNAs in complete genomes. Nucleic Acids Research 33: D121-D124.
Grissa I, Vergnaud G, Pourcel C (2007). CRISPRFinder: a web tool to identify clustered regularly
interspaced short palindromic repeats. Nucleic Acids Res 35: W52-57.
Holt SC, Kesavalu L, Walker S, Genco CA (1999). Virulence factors of Porphyromonas gingivalis.
Periodontol 2000 20: 168-238.
Huson DH, Bryant D (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol
Evol 23: 254-267.
Keane T, Creevey C, Pentony M, Naughton T, McInerney J (2006). Assessment of methods for amino
acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of
matrix are not justified. Bmc Evolutionary Biology 6.
Kichenaradja P, Siguier P, Pérochon J, Chandler M (2010). ISbrowser: an extension of ISfinder for
visualizing insertion sequences in prokaryotic genomes. Nucleic Acids Res 38: D62-68.
Lagesen K, Hallin P, Rodland E, Staerfeldt H, Rognes T, Ussery D (2007). RNAmmer: consistent and
rapid annotation of ribosomal RNA genes. Nucleic Acids Research 35: 3100-3108.
14
345
350
355
360
365
Lange SJ, Alkhnbashi OS, Rose D, Will S, Backofen R (2013). CRISPRmap: an automated
classification of repeat conservation in prokaryotic adaptive immune systems. Nucleic Acids Res 41:
8034-8044.
Makarova KS, Haft DH, Barrangou R, Brouns SJ, Charpentier E, Horvath P et al (2011). Evolution
and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9: 467-477.
Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C et al (2011).
CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39:
D225-229.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M (2007). KAAS: an automatic genome
annotation and pathway reconstruction server. Nucleic Acids Res 35: W182-185.
Noguchi H, Park J, Takagi T (2006). MetaGene: prokaryotic gene finding from environmental
genome shotgun sequences. Nucleic Acids Research 34: 5623-5630.
Ohtsubo Y, Ikeda-Ohtsubo W, Nagata Y, Tsuda M (2008). GenomeMatcher: a graphical user interface
for DNA sequence comparison. BMC Bioinformatics 9: 376.
Otto T, Dillon G, Degrave W, Berriman M (2011). RATT: Rapid Annotation Transfer Tool. Nucleic
Acids Research 39.
Pride DT, Salzman J, Haynes M, Rohwer F, Davis-Long C, White RA et al (2012). Evidence of a
robust resident bacteriophage population revealed through analysis of the human salivary virome.
ISME J 6: 915-926.
Sharma A (2010). Virulence mechanisms of Tannerella forsythia. Periodontol 2000 54: 106-116.
Salzberg SL, Delcher AL, Kasif S, White O (1998). Microbial gene identification using interpolated
Markov models. Nucleic Acids Res 26: 544-548.
Schattner P, Brooks A, Lowe T (2005). The tRNAscan-SE, snoscan and snoGPS web servers for the
detection of tRNAs and snoRNAs. Nucleic Acids Research 33: W686-W689.
Stamatakis A (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with
thousands of taxa and mixed models. Bioinformatics 22: 2688-2690.
15
370
375
380
385
390
Takemoto T, Kurihara H, Dahlen G (1997). Characterization of Bacteroides forsythus isolates. J Clin
Microbiol 35: 1378-1381.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011). MEGA5: Molecular
Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum
Parsimony Methods. Molecular Biology and Evolution 28: 2731-2739.
Tettelin H, Riley D, Cattuto C, Medini D (2008). Comparative genomics: the bacterial pan-genome.
Curr Opin Microbiol 11: 472-477.
van Dongen S, Abreu-Goodger C (2012). Using MCL to extract clusters from networks. Methods Mol
Biol 804: 281-295.
Van Horn KG, Smibert RM (1983). Albumin requirement of Treponema denticola and Treponema
vincentii. Can J Microbiol 29: 1141-1148.
Varani AM, Siguier P, Gourbeyre E, Charneau V, Chandler M (2011). ISsaga is an ensemble of web-
based methods for high throughput identification and semi-automatic annotation of insertion
sequences in prokaryotic genomes. Genome Biol 12: R30.
Watanabe T, Nozawa T, Aikawa C, Amano A, Maruyama F, Nakagawa I (2013). CRISPR Regulation
of Intraspecies Diversification by Limiting IS Transposition and Intercellular Recombination.
Genome Biol Evol 5: 1099-1114.
Yamada T, Letunic I, Okuda S, Kanehisa M, Bork P (2011). iPath2.0: interactive pathway explorer.
Nucleic Acids Res 39: W412-415.
Yang Z (1997). PAML: a program package for phylogenetic analysis by maximum likelihood.
Comput Appl Biosci 13: 555-556.
Zerbino DR, Birney E (2008). Velvet: algorithms for de novo short read assembly using de Bruijn
graphs. Genome Res 18: 821-829.
Zhou CE, Smith J, Lam M, Zemla A, Dyer MD, Slezak T (2007). MvirDB--a microbial database of
protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Nucleic
Acids Res 35: D391-394.
Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS (2011). PHAST: a fast phage search tool.
16
395
400
405
410
415
420
30
Nucleic Acids Res 39: W347-352.
17