Corrected Genome Annotations Reveal Gene Loss and Antibiotic Resistance as
Drivers in the Fitness Evolution of Salmonella Typhimurium

Sandip Paul*, Evgeni V. Sokurenko, Sujay Chattopadhyay#

Department of Microbiology, University of Washington, Seattle, Washington, USA

Running Head: Fitness evolution of Salmonella Typhimurium genomes

#Address correspondence to Sujay Chattopadhyay, [email protected].

*Present address: Structural Biology and Bioinformatics Division, CSIR-Indian Institute
of Chemical Biology, Kolkata 700032, India

JB Accepted Manuscript Posted Online 12 September 2016. J. Bacteriol. doi:10.1128/JB.00545-16
Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Downloaded from http://jb.asm.org/ on April 3, 2020 by guest

ABSTRACT
Horizontal acquisition of novel chromosomal genes is considered to be a key process in
the evolution of bacterial pathogens. However, identification of gene presence or
absence can be hindered by inconsistencies in bacterial genome annotations.
Here, we perform a cross-annotation of omnipresent core and mosaic accessory genes
in the chromosome of Salmonella enterica serovar Typhimurium across a total of 20
fully assembled genomes deposited in GenBank. Cross-annotation resulted in a 32%
increase in the number of core genes and a 3-fold drop in the number of genes identified
as mosaic (i.e., present in some strains only) by the original annotation. Of the remaining
non-core genes, the vast majority were of prophage nature, and 255 of the non-phage genes
were actually of core origin but were lost in some strains after the emergence of the
S. Typhimurium serovar, suggesting that the chromosomal portion of the S. Typhimurium
genome acquired a very limited number of novel genes other than prophages. The only
horizontally acquired non-phage genes related to bacterial fitness or virulence were found
in four recently sequenced isolates, all located on three different genomic islands that
harbor multi-drug resistance determinants. Thus, extensive use of antimicrobials could be
the main selective force behind the acquisition of new fitness genes and the emergence of
novel Salmonella pathotypes.

IMPORTANCE
Significant discrepancies in the annotations of bacterial genomes can lead to misleading
conclusions about the evolutionary origin of chromosomal genes, as we demonstrate here
via cross-annotation-based analysis of Salmonella Typhimurium genomes from
GenBank. We conclude that, although S. Typhimurium can infect a broad range of vertebrate
hosts, the genomic diversity of its strains is almost exclusively limited to
gene loss and transfer of prophage DNA. The only non-phage chromosomal genes acquired
after the emergence of the serovar are linked to genomic islands harboring multi-drug
resistance factors. Since the fitness factors could lead to increased virulence, this
poses an important research question: Could overuse or misuse of antimicrobials act as
a selective force for the emergence of more pathogenic strains of Salmonella?

INTRODUCTION
Horizontal gene transfer in bacteria is a major force in adaptation to novel
environments, genome diversification and, in particular, the evolution of bacterial
virulence (1-4). Horizontally transferred genes create a mosaic structure in the species
pan-genome, where the so-called accessory genes are present only in some strains, in
contrast to the core genes that are typically present in all strains (5-7). The
chromosomal accessory genes can be subdivided into genes of phage origin and
non-phage genes. The latter are usually acquired in the form of genomic islands,
episomes or transposons and are often related to bacterial physiology, antigenic
diversification and, thus, fitness (4, 7, 8). In contrast, the phage genes are directly
related to phage physiology, although many prophages are also known to contribute
virulence and fitness factors to bacteria (9, 10). Distinguishing the accessory genes
from the core genes and defining their nature is of critical importance for understanding
the evolutionary dynamics and adaptive mechanisms of bacterial species.

Salmonella enterica subspecies enterica is one of the most important
and widely distributed bacterial pathogens of both humans and domesticated animals
(11-14). Salmonella serovar Typhimurium has a broad host spectrum and is one of
the serovars most commonly isolated from humans, retail meats of diverse origins and
the environment. While Typhimurium is primarily known to cause self-limiting
gastroenteritis, it can also be systemically invasive and exhibit a multidrug resistance
(MDR) phenotype (15-17). Because Typhimurium is a complex serovar
comprising multiple eco-/patho-vars, horizontal acquisition of genetic material might play
a pivotal role in the emergence and evolution of different Typhimurium strains. However,
systematic analysis of the extent and nature of accessory genes in the Typhimurium
pan-genome has not yet been performed, despite the availability of many fully
assembled, high-quality genomes representing this serovar.

A major hindrance to gene content analysis is inconsistency in the existing
annotations of even fully assembled, high-quality genomes, with genes that
are annotated in some strains but unannotated or misannotated in others. Such
inconsistent and incomplete gene annotations can severely limit genome-wide
comparison and profiling of core and accessory fractions based on annotated genes,
proteins or functional clusters. Therefore, a lack of completeness and uniformity in
annotations considerably limits the usefulness of available genome sequences
for understanding the mechanisms of bacterial evolution, especially those related to
pathogenesis.

Here, we compare gene content across the chromosomal portions of the genomes of
eight Typhimurium strains that are fully assembled and currently available in GenBank.
We find significant discrepancies in the annotation of, on average,
hundreds of genes per strain, leading to highly misleading conclusions about the extent
of the Typhimurium genome's openness to gene transfer. Based on cross-annotation
of the existing annotated genomes, we determine that the diversification of
genomes in the Typhimurium strains is driven almost exclusively by phages and by
core gene loss. It appears that, since the emergence of the serovar, with the exception of
one strain, none of the Typhimurium strains has acquired any novel genes that are not
directly related to phage biology. This study indicates a very limited role for the
acquisition of non-phage genes in the fitness diversification of Typhimurium strains and
highlights a critical need for the cross-annotation of existing bacterial genomes.


MATERIALS AND METHODS
Bacterial genomes and phylogeny.
Eight fully assembled genomes of Salmonella enterica Typhimurium (strains D23580,
798, ST4/74, T000240, UK-1, SL1344, LT2 and 14028S) were downloaded from NCBI
GenBank. For comparison, complete sequences of fully assembled genomes from
twelve other Typhimurium strains and fifteen non-Typhimurium strains were selected
and downloaded. Among the non-Typhimurium strains, except for two strains each from
the Paratyphi A and Typhi serovars, the strains were clonally distinct according to MLST
analysis (http://mlst.ucc.ie) (see Fig. S1 in the supplemental material).

Pan-genomic analysis and re-annotation of genomes.
We used our recently developed tool PanCoreGen (18), which, among several other
features, generates pan-genomic profiles of chromosomal genes by re-annotating
each annotated genome using the rest of the annotated genomes as references. Based on
user-defined threshold values of nucleotide sequence identity and length coverage, this
tool classifies each gene in the analyzed set of genomes as ‘core’ (i.e., present in all
genomes of the analyzed dataset), ‘mosaic’ (i.e., present in multiple but not all
genomes), or ‘strain-specific’ (i.e., present in only one of the annotated genomes). We
applied the PanCoreGen tool to 8 annotated chromosomes of S. enterica Typhimurium to
create the pan-genomic profile of serovar Typhimurium. For the BLAST (blastn) search for
orthologs, we used 95% nucleotide sequence identity and 95% gene-length coverage as the
lower limits. All analyses were restricted to chromosomal genes; plasmids were
not considered. We found a pan-genome size of 5936 genes, of which 5399 genes were
core. The gene distribution for each genome resulting from the pan-genomic profile was
used for re-annotation. We re-annotated each genome using the following rigorous
steps:
a) Each gene found by PanCoreGen for a genome was checked to determine whether it was
already annotated in the existing gene annotations for that genome. We used BLAST
with thresholds of 100% sequence identity and 50% length coverage to decide whether a
gene was newly annotated. A newly annotated gene might be either completely unannotated
previously or partially annotated, with the previously annotated gene length being less
than half of the length in the new annotation.
b) Newly annotated genes were included only if they contained no premature stop codons.
Otherwise, the genes were discarded to avoid inclusion of
pseudogenes.
c) We checked all the newly annotated genes by running BLAST (blastn) searches against all
annotated pseudogenes in the 8 Typhimurium strains with thresholds of 95% sequence
identity and 20% length coverage. This avoids inclusion of small fragments of any
pseudogene annotated as an open reading frame (ORF) in some genomes, along with
pseudogenes that have not accumulated any premature stop codons (e.g., disabled genes
or unitary pseudogenes).
d) All the partially annotated genes were detected with a further orthology search (100%
sequence identity and 10% length coverage) among previously annotated genes.
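
The filters in steps a) to d) can be illustrated with a minimal sketch. This is not PanCoreGen's implementation: the function names are hypothetical, coverage is assumed to be alignment length divided by gene length, and the three standard stop codons are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed)


def passes_thresholds(pident, aln_len, gene_len, min_identity, min_coverage):
    """Return True if a BLAST hit meets identity and length-coverage cut-offs.

    Coverage is taken here as alignment length / gene length (an assumption).
    Both thresholds are percentages, e.g. 95.0.
    """
    coverage = 100.0 * aln_len / gene_len
    return pident >= min_identity and coverage >= min_coverage


def has_premature_stop(cds):
    """Return True if an in-frame stop codon occurs before the final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def keep_new_gene(cds, pseudogene_hits):
    """Steps b) and c): discard candidates that look like pseudogenes.

    pseudogene_hits: iterable of (pident, aln_len) blastn hits of the
    candidate against annotated pseudogenes (hypothetical input format).
    """
    if has_premature_stop(cds):
        return False
    return not any(
        passes_thresholds(pid, aln, len(cds), 95.0, 20.0)
        for pid, aln in pseudogene_hits
    )
```

For example, `passes_thresholds(96.0, 940, 1000, 95.0, 95.0)` is false because the hit covers only 94% of the gene, even though its identity exceeds 95%.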

Pan-genomic profiling.
We created the pan-genomic profile of the 8 Typhimurium strains via serial inclusion of n
genomes (where n goes from 1 through 7) using 8 random combinations for n=2,3,…,7.
This profile was generated for three sets: genomes with the existing GenBank annotations,
genomes after re-annotation, and re-annotated genomes without prophage regions.
Using Prism software, we performed least-squares curve fitting of the power law
$n = \kappa N^{\gamma}$ to the medians. An exponent γ≈0 indicates a closed pan-genome (19).
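
As a rough illustration of how the exponent γ can be estimated, the sketch below fits the power law by ordinary least squares on log-transformed values. This is a stand-in for the least-squares fit done in Prism, not a replica of it. In the formula, n is taken to be the pan-genome size and N the number of genomes included, which is an assumption consistent with how γ is used in the Results. The input data are synthetic.

```python
import math


def fit_power_law(genome_counts, pan_sizes):
    """Fit n = kappa * N**gamma by linear least squares on log-log values.

    Returns (kappa, gamma). Log-log regression is used here for simplicity;
    it is not necessarily the procedure used by the authors.
    """
    xs = [math.log(N) for N in genome_counts]
    ys = [math.log(n) for n in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic median pan-genome sizes generated from kappa=5000, gamma=0.05
counts = [1, 2, 3, 4, 5, 6, 7, 8]
sizes = [5000 * N ** 0.05 for N in counts]
kappa, gamma = fit_power_law(counts, sizes)
```

A fitted γ near zero means the pan-genome size barely grows as genomes are added, which is the sense in which the pan-genome is called closed.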

Phage region identification.
In each of the 8 Typhimurium strains, we identified prophage sequences with the PHAST
(phage search tool) web server, available at http://phast.wishartlab.com/ (20), by
uploading the re-annotated GenBank-formatted files. We considered all the regions
identified as “intact”, “incomplete” or “questionable” by PHAST to be probable
prophage regions in all the strains under study. We also considered the phage genes
annotated in each of the existing GenBank files. The orthologous sequences of these
phage genes were extracted across all the Typhimurium strains under study.

Functional enrichment analysis.
We used DAVID software (21) to cluster the genes of the DT12
genomic island of strain T000240 based on protein function. The classification
stringency was set to ‘medium’ for the analysis.


RESULTS
Large fraction of un-annotated genes in genomes of Typhimurium strains.
We analyzed gene presence/absence in the chromosomal portion of the fully
assembled genomes of eight archetypal Typhimurium strains: 14028S, 798, D23580,
LT2, SL1344, ST4/74, T000240 and UK-1. Based on the seven housekeeping loci that are
used for multi-locus sequence typing, these strains formed a single tight clade within
Salmonella enterica subspecies I (see Fig. S1 in the supplemental material). The
genome size of the Typhimurium strains ranged from 4.82 Mb (UK-1) to 4.95
Mb (T000240), with only a 0.81±0.14% average pairwise difference between the strains.
In contrast, based on the GenBank annotation of protein-coding genes, the number of
genes in the Typhimurium genomes varied between 4,326 (strain 798) and 5,323
(14028S), with a 7.14±1.36% average pairwise difference between the strains (Fig. 1,
grey bars). The number of annotated genes in individual genomes did not correlate
(R²=0.03) with genome size (see Fig. S2 in the supplemental material).

We re-annotated protein-coding genes in each Typhimurium strain by cross-annotating
their genomes using the recently developed PanCoreGen software (18). After
the cross-annotation, the pairwise difference in gene content between the strains
went down 5-fold, to 1.43±0.26% (P<0.0001). The average gene content per genome
increased from 4,600±112 genes in the GenBank-annotated genomes to 5,430±26 genes after
the cross-annotation, which is higher than the number of originally annotated genes in the
genome of strain 14028S, the strain with the highest number of genes according to GenBank
(Fig. 1, black bars). The median length of the ORFs missed by the original annotations was
relatively small and ranged from 132 to 147 bp (see Table S1 and Table S2 in the
supplemental material). However, each re-annotated genome had on average 34 newly
annotated genes that were ≥300 bp long. The longest such gene was ftsK (4,086 bp),
encoding DNA translocase, which was missed by the original annotation in strain UK-1.
Importantly, after the cross-annotation, the number of genes per genome was well
correlated (R²=0.86) with the overall size of the corresponding genomes (see Fig. S2 in the
supplemental material).
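
For readers who want to reproduce this kind of check on their own data, the squared correlation between genome size and gene count can be computed as sketched below. The per-strain values shown are hypothetical placeholders, not the data behind Fig. S2, and R² is assumed here to be the squared Pearson correlation coefficient.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical genome sizes (Mb) and gene counts, for illustration only
sizes_mb = [4.82, 4.86, 4.88, 4.90, 4.95]
gene_counts = [5390, 5410, 5425, 5440, 5470]
print(round(r_squared(sizes_mb, gene_counts), 2))
```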
Thus, there were substantial discrepancies in the GenBank annotations of the fully
assembled genomes of eight archetypal Typhimurium strains. While the discrepancies
mostly involved small ORFs, they resulted in underestimation of the gene number in
every genome as well as overestimation of the differences in gene content between the
strains.

Significant reduction in the number of non-core genes after cross-annotation.
We next assessed the number of ‘core genes’ present in all eight strains, ‘mosaic
genes’ present in multiple but not all strains, and ‘strain-specific genes’ present in only
one of the strains.
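
The three categories follow directly from a gene presence/absence table. A minimal sketch of the classification rule, with a hypothetical function name and toy gene names rather than PanCoreGen's actual interface:

```python
def classify_genes(presence, strains):
    """Classify genes as core, mosaic or strain-specific.

    presence: dict mapping gene name -> set of strains carrying the gene.
    strains:  set of all strains in the analysis.
    """
    classes = {}
    for gene, found_in in presence.items():
        if found_in == strains:
            classes[gene] = "core"
        elif len(found_in) == 1:
            classes[gene] = "strain-specific"
        else:
            classes[gene] = "mosaic"
    return classes


# Toy example with three strains and placeholder gene names
strains = {"LT2", "SL1344", "UK-1"}
presence = {
    "geneA": {"LT2", "SL1344", "UK-1"},
    "geneB": {"LT2", "UK-1"},
    "geneC": {"LT2"},
}
classes = classify_genes(presence, strains)
```

Because the rule depends only on presence calls, any gene missed by the annotation of one strain is pushed from the core into the mosaic or strain-specific category, which is why annotation gaps inflate the apparent accessory genome.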
Based on the GenBank annotations, there were 4,056 core, 753 mosaic and
1,185 strain-specific genes (Fig. 2A), with the chicken-derived strain 14028S showing
the highest number of strain-specific genes (718 genes). Thus, the core genes
represented only 67.7% of all chromosomal genes (the pan-genome) and, on average,
comprised 88.5% of individual genomes. Furthermore, according to the original
annotations, the Typhimurium serovar appeared to have an ‘open genome’, where the
pan-genome size increased notably with the number of genomes
analyzed (Fig. 3A). The curve fitting yielded an exponent γ of 0.16±0.02, which was
significantly above zero (γ=0 is indicative of a completely ‘closed genome’; see Methods
for details).

After the cross-annotation by PanCoreGen, the number of Typhimurium core
genes increased 1.3-fold, to 5,348 genes, while the numbers of mosaic and
strain-specific genes dropped about 3-fold, to 292 and 343 genes, respectively (Fig. 2B). The
relative distribution of strain-specific genes also changed significantly. Only 25 genes
were unique to strain 14028S, while the highest number of strain-specific genes was
in the multi-drug resistant strain T000240 (190 genes), followed by the systemically
invasive strain D23580 (77 genes). At the other extreme, the calf salmonellosis strain
ST4/74 and the model gastroenteritis strain SL1344 (an auxotrophic derivative of ST4/74)
did not have any genes uniquely present in them. In contrast to the original
annotation, on average 98.5% of the content of individual genomes was of core origin
after cross-annotation. Overall, the core genes comprised a significantly larger
portion of the pan-genome (89.4%) than before the cross-annotation. As a result, the
‘openness’ of the Typhimurium serovar’s pan-genome became significantly less obvious
(Fig. 3B), with the γ value decreasing more than three-fold (γ=0.05±0.01, P<0.0001).
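
The core shares of the pan-genome quoted in this section can be checked from the reported gene counts, taking the pan-genome as the sum of the core, mosaic and strain-specific categories. A short sketch (the function name is illustrative):

```python
def core_share(core, mosaic, strain_specific):
    """Percentage of the pan-genome made up of core genes."""
    pan = core + mosaic + strain_specific
    return 100.0 * core / pan


before = core_share(4056, 753, 1185)  # GenBank annotations
after = core_share(5348, 292, 343)    # after cross-annotation
print(f"{before:.1f}% -> {after:.1f}%")  # 67.7% -> 89.4%
```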

Thus, cross-annotation of the Typhimurium strains’ genomes significantly
increased the proportion of core genes in both the serovar’s pan-genome and individual
genomes. Although the openness of the Typhimurium pan-genome became much less obvious
after the cross-annotation, the pan-genome is not completely closed, indicating a certain
level of horizontal gene movement within the serovar. The remainder of the study was done on
the re-annotated Typhimurium genomes.

Non-core predominance and active transfer of prophages in the serovar.
We determined the genes of prophage origin based on existing annotations of phage
genes and also by using the PHAST program (20) to predict prophage clusters.
Among the genes that were newly annotated by PanCoreGen in at least one genome,
about 10% were of prophage origin (not shown). Out of the 922 prophage genes identified
in the cross-annotated pan-genome, 456 were core, 238 mosaic and 228
unique (Fig. 4A). Thus, the prophage genes comprised only 8.5% of the core genes but
81.5% of the mosaic and 66.5% of the strain-specific genes. Also, without the prophage genes,
the core genes comprised 96.5% of the pan-genome and 99% of individual genomes on
average. After exclusion of the prophage genes from the analysis of genome
openness, the γ value dropped further, by more than 2-fold (γ=0.011±0.003, P<0.0001 in
Fig. 3C), being only marginally above zero.
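
The prophage shares per category follow from the counts reported in this section (5,348 core, 292 mosaic and 343 strain-specific genes after cross-annotation), written out here as a worked check:

```latex
\begin{align*}
456 + 238 + 228 &= 922 \\
\text{core:}\quad 456 / 5348 &\approx 8.5\% \\
\text{mosaic:}\quad 238 / 292 &\approx 81.5\% \\
\text{strain-specific:}\quad 228 / 343 &\approx 66.5\%
\end{align*}
```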

Ninety-eight percent of the prophage genes were located on the chromosome in
19 clusters, each incorporating at least 2 genes, with a median size of 28 genes (see
Table S3 in the supplemental material). Ten of the clusters were designated by PHAST
as intact prophages and contained 68% of all phage genes. Another 6 clusters, with 157
genes, were designated as incomplete prophage regions. The remaining 3 clusters, with
118 genes, were designated by PHAST as questionable.

In order to understand the evolutionary origin and dynamics of phage genes, we
analyzed in detail their nature, strain distribution and, in particular, presence in the
S. Heidelberg strain SL476. This Heidelberg strain was chosen as the closest relative
based on the evolutionary distance of the MLST sequences across serovars (see Fig. S1 in
the supplemental material).

Out of the 10 intact prophages, only φGifsy1 and φGifsy2 were found in all
Typhimurium strains (Fig. 5A), suggesting their ancestral nature in the serovar.
Interestingly, these intact core prophages were either absent (φGifsy1) or only partially
present (φGifsy2) in the Heidelberg strain. In contrast, 5 intact prophages were each found
in only one strain: φGifsy3 (strain 14028S), φFELS1 (LT2), φST104 (T000240), and φBTP1
and φBTP5 (D23580). Detailed examination of the chromosomal regions corresponding
to the insertion sites of the strain-specific prophages showed no remnant scars in the
Typhimurium strains that lack these phages or in S. Heidelberg, strongly suggesting that
they were acquired after the serovar had emerged.

The three remaining intact prophages (φST64B, φFELS-2 and φFELS-2-like) had a
mosaic distribution, i.e., were found in multiple but not all strains. None of these phages
or their remnants was present in the Heidelberg strain. φST64B was present in all but
the LT2 and T000240 strains. However, close examination detected remnant sequences
of φST64B in the corresponding position in these two strains, suggesting that φST64B
was originally present in LT2 and T000240 too but was lost. Another intact mosaic
prophage, φFELS-2, was present only in LT2 and T000240. Here again, remnant scars
were present in 14028S, UK-1 and D23580, suggesting loss of φFELS-2 from these
strains. Interestingly, in strains SL1344, ST4/74 and 798 there was another prophage in
the position of the φFELS-2 prophage. This phage had 44% of genes homologous to φFELS-2
(based on 95% sequence identity and length coverage between SL1344 and LT2) and,
thus, was designated previously as a φFELS-2-like prophage. Therefore, both φST64B
and φFELS-2 appeared to be originally core phages that underwent partial loss
or replacement, while φFELS-2-like was likely the result of an insertion event later during
the evolution of the serovar.

Analysis of the phage clusters designated as incomplete indicated that 4 of them
were clearly of core nature (not shown). Interestingly, this determination could be made
only after cross-annotation of the genomes, as the GenBank annotation missed them in
multiple strains, again causing some misleading conclusions. The other 18 phage genes
were not clustered, and all of them were detected as core, being present in the same
locations across the genomes.

Thus, prophage genes comprise the largest portion of non-core genes and,
without them, the Typhimurium pan-genome appears almost completely closed.
Altogether, the comparative map of clusters of phage origin suggests continuous
acquisition and loss of phage material across the Typhimurium strains.

Evolutionary origin of the non-phage accessory genes.
Upon removal of the prophage genes and genes from the repeat regions, 51 genes
remained in the mosaic category and 115 genes in the strain-specific category, with all but
1 of the latter found in strain T000240 (Fig. 4B). To determine the origin of the non-phage
genes, we analyzed their nature in detail, again using S. Heidelberg strain SL476 as the
closest relative of the Typhimurium clade in the MLST phylogeny (see Fig. S1 in the
supplemental material).

Among the 51 mosaic genes of non-phage origin, 34 genes were located in 7
clusters of 2 or more genes (see Table S4 in the supplemental material). All of these
clusters, however, were designated as mosaic because they were absent in only 1 or 2 strains
(Fig. 5B): del_1, del_2 and del_3 were absent only in T000240; del_4, del_5 and del_6
were absent only in D23580; and del_7 was absent in UK-1 and 14028S. Moreover, in
the Heidelberg strain all of the mosaic gene clusters were present in full and at 99%
nucleotide sequence identity, except del_7, of which only 59% of the cluster was
matched in the Heidelberg strain at an identity level of 98% or above. Close examination of
the remaining 17 mosaic genes that were not in clusters showed that they were
actually present in all strains (i.e., were of core nature), but in certain strains their copies
were truncated by more than 5% due to either insertional disruption or partial
deletion (see Table S4 in the supplemental material). This led PanCoreGen to miss their
actual presence in some strains during the re-annotation using 95% cut-offs for both
nucleotide sequence identity and gene-length coverage. Also, 16 of these genes were
present in full length in the Heidelberg strain as well, suggesting that their presence is
ancestral to Typhimurium. Thus, it appears that non-phage genes that were designated
as mosaic were so designated because of either their loss in clusters or partial truncation in
some Typhimurium strains rather than de novo acquisition.

As mentioned above, only two strains were found to have strain-specific genes of
non-phage origin: 114 such genes in strain T000240 and only 1 gene in strain 798.
The strain 798-specific gene was the 2097 bp long rnfC, encoding an electron transport
complex protein. However, further analysis revealed that rnfC was essentially a core
gene present in all Typhimurium strains, with the rest of the strains carrying a longer,
2208 bp version of the gene. Due to the length difference, PanCoreGen mistakenly
designated the shorter version as specific to strain 798 and the longer one as mosaic in
the other strains. Interestingly, however, the Heidelberg strain had the longer gene, while
the shorter version was found in several other serovars such as Enteritidis, Gallinarum and
Pullorum (not shown).
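
The rnfC case shows how narrowly a 95% length-coverage cut-off can separate two versions of the same gene. The sketch below assumes that coverage is measured relative to the longer version and that the two versions align over the full length of the shorter one; neither detail is stated in the text.

```python
def length_coverage(shorter_bp, longer_bp):
    """Percent of the longer gene covered by a full-length match to the shorter one."""
    return 100.0 * shorter_bp / longer_bp


coverage = length_coverage(2097, 2208)
print(f"{coverage:.2f}%")  # 94.97%, just under a 95% cut-off
```

Under these assumptions the 111 bp difference leaves the shorter version marginally below the threshold, so the two versions would not be grouped as one ortholog.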

Detailed examination of the 114 strain-specific genes in T000240 showed that
100 genes were clustered in a single 82 kb chromosomal island, GI-DT12, identified
previously (17). The insertion site of GI-DT12 was in the middle of a putative regulatory
gene, STM14_4564 (as annotated in strain 14028S). This gene remained intact, without
any ‘scars’, in the rest of the Typhimurium strains as well as in the Heidelberg strain SL476.
This strongly suggested that the island was acquired by strain T000240 rather than lost
by the others.

Of the remaining 14 T000240-specific genes, 12 genes were actually
6 identical copies of two overlapping genes (504 and 276 bp long)
encoding IS1 transposases that propagated across the genome, disrupting different
gene regions and, thus, resulting in some genes being mosaic in nature. Two other
T000240-specific genes, the 309 bp long STMDT12_C29860 and the 387 bp long
STMDT12_C26860, each encoded a hypothetical protein. The former replaced a 117 bp
long hypothetical gene present in the rest of the strains except LT2, while the latter was
detected immediately upstream of the T000240-specific ‘questionable’ phage cluster Q3
(see Table S3 in the supplemental material). Thus, together with the island genes, the 14
other T000240-specific genes also appear to have been acquired horizontally by the
T000240 strain.

Altogether, it is evident that only one of the Typhimurium strains analyzed, T000240,
had undergone genomic acquisition events that did not involve prophages, primarily the
82 kb genomic island. The rest of the non-phage genes that were originally defined here
as mosaic or unique in the Typhimurium strains were actually core genes either
completely or partially lost in some strains after the serovar had emerged.

Identification of complete genomic core of the Typhimurium serovar.
Based on the analysis of the eight cross-annotated genomes presented above, we could
define that, upon exclusion of prophage genes, the ancestral (‘total’) chromosomal core
of the Typhimurium serovar comprises 4,944 genes, including 4,892 omnipresent
genes (‘stable core’) and 52 partially lost genes (‘unstable core’).

To assess the completeness of the genomic core identified, we added to the
analysis twelve more completely sequenced Typhimurium genomes that became
available in GenBank in the course of this study (see Table S5 in the supplemental
material). Upon cross-annotation of all twenty genomes together, 4694 genes remained
omnipresent, while 198 genes of the original stable core were found to be completely or
partially lost from some of the 12 additional genomes. Of those genes, 84% were in a total
of 8 clusters (of 2 or more consecutive genes), with the largest deletion event occurring in
strain VNP20009, a modified derivative of 14028S (22), involving 102 genes across a
108 kb region. In this new set of genomes, we also found 2 of the 7 deletion
events noted previously for the set of 52 mosaic genes. The event del_7 appeared to
have happened in the common ancestor of UK-1, 14028S and VNP20009. Interestingly, del_4,
which was earlier detected in the MDR and invasive strain D23580, was also present in two
MDR strains, DT104 and 138736.

Upon the cross-annotation of the twenty Typhimurium genomes, 29 new genes
were added to the total core. All of them were missed in the original set of 8 genomes
because they were absent from the annotations rather than from the genomes per se. Of
these newly annotated genes, 24 genes were omnipresent (stable core) and the remaining 5
genes showed deletion in some of the genomes.

Thus, the originally defined set of core genes proved to be highly representative
of the Typhimurium serovar. Upon the cross-annotation of twenty fully assembled
genomes deposited in GenBank, the non-phage total core of Typhimurium stands at
4973 genes, with the omnipresent stable core comprising 4718 genes (95%) and the
partially lost unstable core comprising 255 genes (5%), with the latter present in 15 to 19
strains.
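
The final core sizes can be traced from the eight-genome figures using only the counts reported in this section. A short bookkeeping sketch (variable names are illustrative):

```python
# Counts reported for the eight-genome analysis (non-phage genes)
stable_8, unstable_8 = 4892, 52

# Changes reported after adding twelve more genomes
lost_from_stable = 198             # stable-core genes lost in some new genomes
new_stable, new_unstable = 24, 5   # the 29 newly annotated core genes

stable_20 = stable_8 - lost_from_stable + new_stable
unstable_20 = unstable_8 + lost_from_stable + new_unstable
total_20 = stable_20 + unstable_20

print(stable_20, unstable_20, total_20)  # 4718 255 4973
```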

Insertion of genomic islands – limited, but all with MDR and fitness genes.
As described above, in the originally analyzed set of eight genomes, a horizontally
acquired set of genes was found in only one strain, the MDR strain T000240, with the
100-gene genomic island DT12. Functional enrichment analysis (see Table S6 in the
supplemental material) of the DT12 island revealed genes coding for resistance to
chloramphenicol, sulfonamides, tetracycline, etc. (i.e., bla(oxa-30), aadA1, qacEΔ1,
sul1, cat, and tetA) as well as fitness- and, potentially, virulence-associated genes
such as the aerobactin iron-acquisition siderophore system (iutA, iucABC) and an iron
transporter (sitABCD).

Analysis of the twelve additional Typhimurium genomes revealed 2 additional
genomic islands in 3 strains, both of which were reported previously (23, 24). One of
them was GI-VII-6, a 125 kb island, which was incorporated in the MDR strain L-3553,
isolated from cattle in Japan, and was found to carry several antibiotic resistance
genes (aadA, strA, strB, sul1, sul2, tetA, floR, dfrA12) along with the blaCMY-2 gene (for
extended-spectrum cephalosporin resistance). In addition, genes of this island encoded a
number of transcriptional regulators (STL3553_RS05185, STL3553_RS05225,
STL3553_RS05270, STL3553_RS05280) as well as siderophore transporter proteins
(STL3553_RS05320) that likely affect the overall fitness and, potentially, virulence of
the strains. Identical copies of another genomic island, SGI-1 (43 kb), were found in two
closely related strains, DT104 and 138736. This island again carried genes responsible
for the MDR phenotype of both these strains (resistance to ampicillin, chloramphenicol,
streptomycin, sulfonamides, tetracycline, fluoroquinolones, etc.), which was attributed to
a 13 kb region of the island carrying MDR genes (e.g., aadA2, floR, tetG, pse-1 and sul1).
Here also we found some potential fitness-related genes encoding transcriptional
regulators (e.g., tetR, orf1) and recombination proteins (e.g., tnpA, tnpR, int1, orf2).
The only other set of non-core, non-phage genes identified in the newly analyzed genomes was a 24.6 kb genomic insertion located between the genes CFSAN001921_RS03395 and CFSAN001921_RS03550 of another MDR strain, CFSAN001921 (25). In the insertion, 13 of 27 genes were annotated as hypothetical, and none of the remaining genes was related to known resistance or fitness factors; instead, they were primarily related to genomic stability cassettes. Two of the genes were annotated as encoding phage functions, and the insertion itself was located at the insertion site of the Fels-2 prophage of LT2. Thus, a phage origin of the 24.6 kb genomic insertion could not be excluded.
Altogether, the cross-annotation analysis of twenty genomes of Typhimurium strains revealed only three horizontally transferred genomic regions (in four strains) that were clearly of non-phage origin, all of them carrying multiple antibiotic resistance genes as well as fitness-related genes.

DISCUSSION
In this study, we eliminated discrepancies in the annotation of genes in twenty fully assembled genomes of Salmonella Typhimurium strains deposited in GenBank. We performed cross-annotation using the PanCoreGen software package to add to each genome, on average, hundreds of genes that were missed by the original annotation and to identify genes present in all, some, or only one of the Typhimurium strains. We determined that the chromosomal portion of the Typhimurium genome is highly conserved, with limited influx of novel fitness-related genes via horizontal transfer, except for the horizontal movement of phage clusters. Since the serovar emerged, only a few non-phage horizontal transfer events could be recorded, all of them related to the MDR phenotype. However, the acquisition of antibiotic resistance brings along genes that might change the fitness and, potentially, the virulence of the strains. In addition, fitness evolution in Typhimurium could also be driven in certain strains by complete or partial loss of some of the original core genes.
The high diversity of gene content in Salmonella enterica subsp. enterica genomes reflects a considerable extent of horizontal gene transfer. Previous work has demonstrated the importance of gene acquisition for virulence, antibiotic resistance, novel metabolic pathways, and other adaptive traits. Gene acquisition in the form of islands or as part of phages has been suggested to play a key role in the emergence of different Salmonella serovars. For example, genes harbored by an array of genomic islands and prophages (SPI-6, Gifsy-1, Gifsy-2, etc.) are important for intracellular replication of Typhimurium serovar strains (26). SPI-12, which encodes a remnant phage, includes in Typhimurium at least four genes involved in transcriptional activation/regulation and fitness, thereby facilitating bacterial survival in the host (27). Both SPI-1 and SPI-2 encode a type III secretion system (T3SS) involved in translocation of virulence proteins into host cells, while multiple T3SS effectors are part of SPI-5. In addition, SPI-3 and SPI-4 promote intestinal colonization via MisL (a member of the autotransporter protein family) and SiiE (a non-fimbrial adhesin that is part of a type I secretion system), respectively. All these insertion events are known to be ancestral to the Typhimurium serovar. However, whether gene transfer plays a significant role in genome and fitness diversification after the serovars emerged has not been studied in detail.
Here, we investigated the intra-serovar evolution of Salmonella Typhimurium, which arguably exhibits the broadest host and pathogenicity range among all serovars of S. enterica subsp. enterica. This serovar includes variants able to infect both livestock and human hosts (e.g., pathovar DT104) as well as variants exhibiting a restricted host range (e.g., pathovars DT2 and DT99) (28). The importance and diversity of the Typhimurium serovar are reflected in the number of strains with high-quality assembled genome sequences deposited in GenBank. These include one of the first Salmonella strains sequenced, 15 years ago, the model strain LT2 (12), as well as strains from invasive human infections (D23580, 33676, YU39) (15, 29, 30) and strains harboring the MDR phenotype and isolated from various animals (CFSAN001921, L-3553, etc.) (25, 31).
One could expect that the long time period over which the Typhimurium genomes were obtained and the diversity of research groups that contributed them to the public domain would result in differences among the genome annotations. First, algorithms for the recognition of ORFs have been evolving constantly, and different research groups could use different stringency criteria (such as a minimal sequence length) for defining an ORF as a functional gene. Second, certain genes could be absent from some genomes but not others due to either gene deletion or, alternatively, horizontal acquisition. Third, nonsense or frameshift mutations, small deletions, and insertional disruptions could cause certain genes to be missed by the annotation. Finally, the use of different reference genomes and/or annotation databases could lead to the same gene being given different names or a hypothetical function status. One way or another, all of this could lead to significant differences among the annotations of different genomes that are mere artifacts, with many genes missed or given various names. This problem is by no means limited to Salmonella Typhimurium and is likely to be a general problem in bacterial genomics.
A couple of recent pan-genomic profiling studies of Salmonella enterica were based on the construction of orthologous groups (32) or on de novo annotation of genes (14). Over the last decade, a number of powerful analysis tools have been developed for pan-genomic profiling. These tools can successfully identify homologous genes and their core, mosaic, or strain-specific nature in a given dataset; plot and visualize the profiles; perform both sequence- and function-based annotation and curation of genomes; reconstruct phylogenetic relationships of orthologous genes/families; etc. (33, 34). However, an important aspect yet to be integrated into the existing pan-genomic analysis approaches appears to be information on gene gain and gene loss during the evolution of specific lineages, i.e., differentiating the accessory genes in the pan-genome that were derived via horizontal gene acquisition from the ‘ancestrally core’ genes that have been lost from specific strains/lineages over the course of evolution and thus are part of the accessory fraction (which we designate here as unstable core genes). Such information on the assortment of genes would provide insights into possible functional adaptation via pan-genomic evolution. Here, we treated each annotated Typhimurium genome as a reference, relying on existing annotations to perform cross-annotation among annotated genomes and identify unannotated genes, and finally reconstructed the pan-genome based on uniformly re-annotated individual genomes. We identified a total of 4711 genes found in all strains (stable core) and 255 genes found in most isolates (unstable core) in the analyzed Typhimurium strains. The determination of the total core gene set that was originally present when the serovar emerged offers the research field an evolutionary ‘Eve’ of Typhimurium strains that could be used for annotation of newly sequenced genomes and a better understanding of their evolutionary dynamics.
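The split between stable core, unstable core, and horizontally acquired genes can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the PanCoreGen procedure: the function names, the toy strain and gene labels, and the `ancestral` flag (standing for an independent judgment that a gene was present when the serovar emerged) are all hypothetical. The last function reproduces the arithmetic behind the reported share of unstable core genes, 255 / (4711 + 255), or about 5%.

```python
def classify_gene(carriers, all_strains, ancestral):
    """Classify one gene from the set of strains that carry it.

    carriers    -- set of strain names in which the gene was found
    all_strains -- set of all strain names in the dataset
    ancestral   -- True if the gene is judged to have been present when the
                   serovar emerged (how this is decided is outside this sketch)
    """
    if carriers >= all_strains:
        return "stable core"        # present in every strain
    if ancestral:
        return "unstable core"      # ancestrally core, lost from some strains
    if len(carriers) == 1:
        return "strain-specific"    # acquired, found in a single strain
    return "mosaic"                 # acquired, found in several strains


def unstable_core_fraction(n_stable, n_unstable):
    """Share of the ancestral core that was lost from at least one strain."""
    return n_unstable / (n_stable + n_unstable)


# Toy data: strain names, gene labels, and carrier sets are all invented.
strains = {"s1", "s2", "s3"}
toy_genes = {
    "geneA": ({"s1", "s2", "s3"}, True),
    "geneB": ({"s1", "s3"}, True),
    "geneC": ({"s3"}, False),
}
labels = {name: classify_gene(carriers, strains, ancestral)
          for name, (carriers, ancestral) in toy_genes.items()}

# Gene counts reported in the text: 4711 stable core and 255 unstable core.
fraction = unstable_core_fraction(4711, 255)
```

In this toy run, `geneB` is labeled unstable core only because it was flagged as ancestral; without that flag, the same presence pattern would be indistinguishable from a mosaic gene gained by horizontal transfer, which is the distinction the paragraph above argues for.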
Upon cross-annotation of the originally analyzed eight and then the twelve additional genomes, it has become clear that the accessory fraction of the Typhimurium genomes is predominantly contributed by the continuous inflow and outflow of phage elements. In the absence of horizontal transfer of phage clusters, the Typhimurium serovar would be considered to have an almost closed pan-genome. Although we did not analyze here the plasmid diversity of the Typhimurium strains, and thus cannot say whether plasmids play a significant role in the adaptive evolution of Salmonella Typhimurium, it is astounding that the serovar’s genome appears to be so restricted in the acquisition of non-phage genomic islands by horizontal transfer. Also, there is always a possibility that a genomic island, instead of being the result of a true horizontal transfer event, could have shifted from an unanalyzed plasmid, which would imply an even more closed structure of the Typhimurium pan-genome. Keeping in mind that Typhimurium has the broadest host and pathogenicity range among all S. enterica serovars, one can suggest the possibility of similar scenarios for other S. enterica serovars as well. This is in great contrast, for example, to the continuous movement of genomic islands in E. coli strains that belong to the same serotypes and/or multi-locus sequence types (STs). Those closely related clonal groups might be considered equivalent to specific serovars of Salmonella from the perspectives of population and evolutionary genetics. For example, a single avian-pathogenic E. coli strain, APEC O1 from ST95, has been shown to harbor 43 genomic islands that differ greatly in content from strains representing the same serotype and/or ST (35). Also, E. coli strains of the multi-drug resistant clonal group ST131 vary considerably in the acquisition of several genomic islands that carry various pathogenicity-relevant (and no antibiotic resistance) genes (36). Another example is ST73, which combines strains with highly diverse genomic content and includes the model uropathogenic strain CFT073, the asymptomatic bacteriuria strain 83972, and the probiotic strain Nissle 1917 (37, 38). This might suggest strong physiological barriers to horizontal transfer in Salmonella compared to E. coli. Alternatively, the limited genomic diversity in Typhimurium could also be the result of a relatively recent emergence of this serovar that did not allow enough time for the frequency of horizontal transfer events to become comparable with that of some E. coli serotypes or STs of much older origin. Therefore, additional studies on accurate molecular clock estimation of the clonal groups’ ages are warranted to understand the basis of the difference in gene transfer rates between species.
The majority of the prophage gene clusters seemed to comprise genes related to phage movement, biogenesis, and structural components. However, this does not mean that phage acquisition or loss could not have an adaptive effect on the Typhimurium strains, including their virulence. On the one hand, disruption of the genomic region at the site of prophage insertion could affect the function of nearby genes. Interestingly, two different phage clusters, φST104 and φBTP1, are inserted into the same spot in the MDR strain T000240 and the systemically invasive strain D23580, respectively. In both strains, the insertion occurred downstream of the proA gene (encoding gamma-glutamyl phosphate reductase) and upstream of an IS3 transposase gene that is itself positioned immediately upstream of the stb fimbrial gene cluster. If the function of the surrounding genes is affected one way or another by the phage insertion, this could have significant effects on bacterial physiology. On the other hand, the presence of prophages in the bacterial chromosome could have a pleiotropic effect on the expression of chromosomal genes through, for example, transcriptional cross-regulation. Thus, these phages might have a similar effect on the corresponding host strains not only by being inserted into the same genomic site but also by having a similar in trans effect. It is noteworthy that, among the additional 12 Typhimurium strains analyzed, we detected a phage cluster at exactly the same location as φST104 and φBTP1 (in T000240 and D23580, respectively) in all of the MDR and/or invasive strains (i.e., U288, CFSAN001921, DT104, 138736, L-3553, 33676, and YU39) and in none of the others. In the remaining 11 non-MDR and/or non-invasive strains, this genomic location remained uninterrupted, with proA immediately followed by the IS3 transposase gene. Insertion of different phage clusters suggested a hotspot nature of this genomic region, allowing independent acquisition of phage genes by MDR/invasive strains of this serovar. However, detailed experimental and population studies are warranted to determine any direct or indirect role of prophage genes in Typhimurium physiology, virulence, and/or drug resistance phenotypes.
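The insertion-site comparison above amounts to asking which genes, if any, lie between proA and the IS3 transposase gene in each genome. A minimal Python sketch of that check follows. It assumes each chromosome has already been reduced to an ordered list of gene labels; the function name and every label other than proA (`IS3_tnp`, `phage_1`, `gene_x`, and so on) are hypothetical placeholders, not annotations from the analyzed genomes.

```python
def genes_between(gene_order, upstream, downstream):
    """Return the genes lying between two flanking genes.

    gene_order -- list of gene labels in chromosomal order
    Returns an empty list when the flanking genes are adjacent, and None
    when either gene is missing or `downstream` does not follow `upstream`.
    """
    try:
        start = gene_order.index(upstream)
        end = gene_order.index(downstream, start + 1)
    except ValueError:
        return None
    return gene_order[start + 1:end]


# Invented gene orders for an uninterrupted and an interrupted site.
uninterrupted = ["gene_x", "proA", "IS3_tnp", "stb_cluster"]
interrupted = ["gene_x", "proA", "phage_1", "phage_2", "IS3_tnp", "stb_cluster"]

empty_site = genes_between(uninterrupted, "proA", "IS3_tnp")
occupied_site = genes_between(interrupted, "proA", "IS3_tnp")
```

An empty list corresponds to the uninterrupted arrangement, whereas a non-empty list flags a candidate insertion that would still need to be inspected for phage content.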
Detailed analysis of gene presence/absence in the cross-annotated Typhimurium strains found that about 5% of the original core genes were completely or partially lost in some strains after the serovar emerged. Gene loss by deletion or formation of pseudogenes is thought to be driven by two contrasting evolutionary processes. On the one hand, genes can be lost as part of ‘use-or-lose’ evolutionary dynamics, i.e., if the function of those genes is no longer needed for a particular lifestyle of the bacterial strain. Such loss accumulates as a result of genetic drift, i.e., via random events not driven by positive selection. On the other hand, gene loss could be driven by a ‘die-or-lose’ evolutionary mechanism that removes genes whose functions reduce fitness in certain environments. In the course of clinical infection, for example, some genes might interfere with the expression or function of virulence-promoting factors or increase the vulnerability of the pathogen by expressing traits recognizable by the host defenses.
Complete or partial loss of core genes was noted primarily in the invasive strain D23580, the MDR strain T000240, and two closely related strains, UK-1 (called ‘universal killer’ for its high invasion and virulence properties) and 14028S (15, 17, 39, 40). This highlights a possible central role of gene inactivation in the evolution of fitness diversity of Salmonella Typhimurium strains. Interestingly, previous work has suggested adaptive convergence between the loss of the del_6 cluster from the systemically invasive Typhimurium strain D23580 and the same event in the systemically invasive serovar Typhi (15). However, we detected here that all non-Typhimurium serovars (invasive and non-invasive) show either partial or complete loss or truncation of genes in this cluster (see Fig. S3 in the supplemental material). Also, earlier work demonstrated that disruption of the stbC gene in the stb fimbrial operon could convert strain LT2 into a flagellated strain exhibiting constitutively mannose-sensitive agglutinating and multi-drug resistant phenotypes (41). Interestingly, we detected that, although the stbC gene has remained intact, other genes of the stb fimbrial operon, such as stbB and stbA, are lost in the MDR strain T000240 due to the absence of the del_1 cluster. It would be worth finding out whether the loss of these genes might have somehow contributed to the resistance phenotype of this strain. Our analysis of the additional 12 strains of serovar Typhimurium revealed the loss of a few other ancestrally core gene clusters in some of them (see Table S7 in the supplemental material). However, understanding the functional and adaptive significance of gene loss and inactivation in Typhimurium strains is beyond the scope of this study and will require expanded analysis of Typhimurium and non-Typhimurium strains as well as experimental studies.
In the original set of eight genomes, the only true transfer of a gene cluster of non-phage origin is the T000240-specific genomic island GI-DT12, which possesses determinants of antibiotic resistance, mercury resistance, iron acquisition, heavy metal tolerance, etc. Previous work indicated that this island has most likely helped the isolate to adapt to adverse environmental conditions such as extremely polluted sewage (17). In the additional twelve genomes, only two more genomic islands (in strains DT104, 138736, and L-3553) were identified, and these also carried both MDR and fitness genes. While we identified one more potential island (in another MDR strain, CFSAN001921) of obscure origin and function, our observations strongly indicate that, in the course of evolution of the Typhimurium serovar, antibiotic resistance might be the major selection factor behind the acquisition of genomic islands, which also bring some fitness-associated genes into the recipient genomes.

FUNDING INFORMATION
This work was supported by the National Institutes of Health grant R01 AI106007.

REFERENCES
1. Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304.
2. Koonin EV, Makarova KS, Aravind L. 2001. Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol 55:709-742.
3. Dutta C, Pan A. 2002. Horizontal gene transfer and bacterial diversity. J Biosci 27:27-33.
4. Wiedenbeck J, Cohan FM. 2011. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiol Rev 35:957-976.
5. Mira A, Martin-Cuadrado AB, D'Auria G, Rodriguez-Valera F. 2010. The bacterial pan-genome: a new paradigm in microbiology. Int Microbiol 13:45-57.
6. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, Khouri H, Radune D, Dimitrov G, Watkins K, O'Connor KJ, Smith S, Utterback TR, White O, Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA 102:13950-13955.
7. Soucy SM, Huang J, Gogarten JP. 2015. Horizontal gene transfer: building the web of life. Nat Rev Genet 16:472-482.
8. Juhas M, van der Meer JR, Gaillard M, Harding RM, Hood DW, Crook DW. 2009. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol Rev 33:376-393.
9. Brussow H, Canchaya C, Hardt WD. 2004. Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev 68:560-602.
10. Rodriguez-Valera F, Martin-Cuadrado AB, Rodriguez-Brito B, Pasic L, Thingstad TF, Rohwer F, Mira A. 2009. Explaining microbial population genomics through phage predation. Nat Rev Microbiol 7:828-836.
11. Centers for Disease Control and Prevention (CDC). 2013. National Salmonella Surveillance Annual Report, 2011. US Department of Health and Human Services, CDC, Atlanta, GA, USA.
12. McClelland M, Sanderson KE, Spieth J, Clifton SW, Latreille P, Courtney L, Porwollik S, Ali J, Dante M, Du F, Hou S, Layman D, Leonard S, Nguyen C, Scott K, Holmes A, Grewal N, Mulvaney E, Ryan E, Sun H, Florea L, Miller W, Stoneking T, Nhan M, Waterston R, Wilson RK. 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852-856.
13. Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, Churcher C, Quail MA, Stevens M, Jones MA, Watson M, Barron A, Layton A, Pickard D, Kingsley RA, Bignell A, Clark L, Harris B, Ormond D, Abdellah Z, Brooks K, Cherevach I, Chillingworth T, Woodward J, Norberczak H, Lord A, Arrowsmith C, Jagels K, Moule S, Mungall K, Sanders M, Whitehead S, Chabalgoity JA, Maskell D, Humphrey T, Roberts M, Barrow PA, Dougan G, Parkhill J. 2008. Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways. Genome Res 18:1624-1637.
14. Jacobsen A, Hendriksen RS, Aarestrup FM, Ussery DW, Friis C. 2011. The Salmonella enterica pan-genome. Microb Ecol 62:487-504.
15. Kingsley RA, Msefula CL, Thomson NR, Kariuki S, Holt KE, Gordon MA, Harris D, Clarke L, Whitehead S, Sangal V, Marsh K, Achtman M, Molyneux ME, Cormican M, Parkhill J, MacLennan CA, Heyderman RS, Dougan G. 2009. Epidemic multiple drug resistant Salmonella Typhimurium causing invasive disease in sub-Saharan Africa have a distinct genotype. Genome Res 19:2279-2287.
16. Zhang S, Kingsley RA, Santos RL, Andrews-Polymenis H, Raffatellu M, Figueiredo J, Nunes J, Tsolis RM, Adams LG, Baumler AJ. 2003. Molecular pathogenesis of Salmonella enterica serotype typhimurium-induced diarrhea. Infect Immun 71:1-12.
17. Izumiya H, Sekizuka T, Nakaya H, Taguchi M, Oguchi A, Ichikawa N, Nishiko R, Yamazaki S, Fujita N, Watanabe H, Ohnishi M, Kuroda M. 2011. Whole-genome analysis of Salmonella enterica serovar Typhimurium T000240 reveals the acquisition of a genomic island involved in multidrug resistance via IS1 derivatives on the chromosome. Antimicrob Agents Chemother 55:623-630.
18. Paul S, Bhardwaj A, Bag SK, Sokurenko EV, Chattopadhyay S. 2015. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes. Genomics 106:367-372.
19. Tettelin H, Riley D, Cattuto C, Medini D. 2008. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol 11:472-477.
20. Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. 2011. PHAST: a fast phage search tool. Nucleic Acids Res 39:W347-352.
21. Huang da W, Sherman BT, Lempicki RA. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4:44-57.
22. Broadway KM, Modise T, Jensen RV, Scharf BE. 2014. Complete genome sequence of Salmonella enterica serovar Typhimurium VNP20009, a strain engineered for tumor targeting. J Biotechnol 192:177-178.
23. Boyd D, Peters GA, Cloeckaert A, Boumedine KS, Chaslus-Dancla E, Imberechts H, Mulvey MR. 2001. Complete nucleotide sequence of a 43-kilobase genomic island associated with the multidrug resistance region of Salmonella enterica serovar Typhimurium DT104 and its identification in phage type DT120 and serovar Agona. J Bacteriol 183:5725-5732.
24. Lee K, Kusumoto M, Sekizuka T, Kuroda M, Uchida I, Iwata T, Okamoto S, Yabe K, Inaoka T, Akiba M. 2015. Extensive amplification of GI-VII-6, a multidrug resistance genomic island of Salmonella enterica serovar Typhimurium, increases resistance to extended-spectrum cephalosporins. Front Microbiol 6:78.
25. Hoffmann M, Muruvanda T, Allard MW, Korlach J, Roberts RJ, Timme R, Payne J, McDermott PF, Evans P, Meng J, Brown EW, Zhao S. 2013. Complete Genome Sequence of a Multidrug-Resistant Salmonella enterica Serovar Typhimurium var. 5- Strain Isolated from Chicken Breast. Genome Announc 1:e01068-13.
26. Klumpp J, Fuchs TM. 2007. Identification of novel genes in genomic islands that contribute to Salmonella Typhimurium replication in macrophages. Microbiology 153:1207-1220.
27. Tomljenovic-Berube AM, Henriksbo B, Porwollik S, Cooper CA, Tuinema BR, McClelland M, Coombes BK. 2013. Mapping and regulation of genes within Salmonella pathogenicity island 12 that contribute to in vivo fitness of Salmonella enterica Serovar Typhimurium. Infect Immun 81:2394-2404.
28. Rabsch W, Andrews HL, Kingsley RA, Prager R, Tschape H, Adams LG, Baumler AJ. 2002. Salmonella enterica serotype Typhimurium and its host-adapted variants. Infect Immun 70:2249-2255.
29. Calva E, Silva C, Zaidi MB, Sanchez-Flores A, Estrada K, Silva GG, Soto-Jimenez LM, Wiesner M, Fernandez-Mora M, Edwards RA, Vinuesa P. 2015. Complete genome sequencing of a multidrug-resistant and human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 genotype. Genome Announc 3:e00663-15.
30. Silva C, Calva E, Calva JJ, Wiesner M, Fernandez-Mora M, Puente JL, Vinuesa P. 2015. Complete genome sequence of a human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 harboring a multidrug resistance IncA/C plasmid and a blaCMY-2-Carrying IncF plasmid. Genome Announc 3:e01323-15.
31. Shahada F, Sekizuka T, Kuroda M, Kusumoto M, Ohishi D, Matsumoto A, Okazaki H, Tanaka K, Uchida I, Izumiya H, Watanabe H, Tamamura Y, Iwata T, Akiba M. 2011. Characterization of Salmonella enterica serovar Typhimurium isolates harboring a chromosomally encoded CMY-2 beta-lactamase gene located on a multidrug resistance genomic island. Antimicrob Agents Chemother 55:4114-4121.
32. Gordienko EN, Kazanov MD, Gelfand MS. 2013. Evolution of pan-genomes of Escherichia coli, Shigella spp., and Salmonella enterica. J Bacteriol 195:2786-2792.
33. Vernikos G, Medini D, Riley DR, Tettelin H. 2015. Ten years of pan-genome analyses. Curr Opin Microbiol 23:148-154.
34. Xiao J, Zhang Z, Wu J, Yu J. 2015. A brief review of software tools for pangenomics. Genomics Proteomics Bioinformatics 13:73-76.
35. Johnson TJ, Wannemuehler Y, Kariyawasam S, Johnson JR, Logue CM, Nolan LK. 2012. Prevalence of avian-pathogenic Escherichia coli strain O1 genomic islands among extraintestinal and commensal E. coli isolates. J Bacteriol 194:2846-2853.
36. Paul S, Linardopoulou EV, Billig M, Tchesnokova V, Price LB, Johnson JR, Chattopadhyay S, Sokurenko EV. 2013. Role of homologous recombination in adaptive diversification of extraintestinal Escherichia coli. J Bacteriol 195:231-242.
37. Hancock V, Vejborg RM, Klemm P. 2010. Functional genomics of probiotic Escherichia coli Nissle 1917 and 83972, and UPEC strain CFT073: comparison of transcriptomes, growth and biofilm formation. Mol Genet Genomics 284:437-454.
38. Vejborg RM, Friis C, Hancock V, Schembri MA, Klemm P. 2010. A virulent parent with probiotic progeny: comparative genomics of Escherichia coli strains CFT073, Nissle 1917 and ABU 83972. Mol Genet Genomics 283:469-484.
39. Jarvik T, Smillie C, Groisman EA, Ochman H. 2009. Short-term signatures of evolutionary change in the Salmonella enterica serovar typhimurium 14028 genome. J Bacteriol 192:560-567.
40. Luo Y, Kong Q, Yang J, Mitra A, Golden G, Wanda SY, Roland KL, Jensen RV, Ernst PB, Curtiss R, 3rd. 2012. Comparative genome analysis of the high pathogenicity Salmonella Typhimurium strain UK-1. PLoS One 7:e40645.
41. Wu KH, Wang KC, Lee LW, Huang YN, Yeh KS. 2012. A constitutively mannose-sensitive agglutinating Salmonella enterica subsp. enterica serovar Typhimurium strain, carrying a transposon in the fimbrial usher gene stbC, exhibits multidrug resistance and flagellated phenotypes. Scientific World Journal 2012:280264.

FIGURE LEGENDS
Figure 1. Comparison of the number of genes present in 8 Typhimurium strains. The left panel represents the number of protein-coding genes present in the existing annotation, and the right panel shows the number of genes after re-annotation.
Figure 2. Schematic representation of the pan-genomic profile for 8 Typhimurium strains. The numbers of core, mosaic, and strain-specific genes are shown based on (A) the existing annotation and (B) re-annotation, using 95% nucleotide sequence identity and gene-length coverage thresholds for orthologous gene identification.
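The threshold rule named in the Figure 2 legend can be written as a simple filter. The Python sketch below is an illustration only and does not reproduce PanCoreGen code: it assumes that both cutoffs are 95%, that coverage means the aligned length as a percentage of the gene length, and that the function and parameter names (all hypothetical) stand in for whatever the alignment step reports.

```python
def passes_thresholds(identity_pct, aligned_len, gene_len,
                      min_identity=95.0, min_coverage=95.0):
    """Return True if a hit meets both the identity and coverage cutoffs.

    identity_pct -- percent nucleotide identity of the alignment
    aligned_len  -- number of gene positions covered by the alignment
    gene_len     -- full length of the query gene
    """
    coverage_pct = 100.0 * aligned_len / gene_len
    return identity_pct >= min_identity and coverage_pct >= min_coverage


# Invented example hits against a 1000-nucleotide gene.
accepted = passes_thresholds(97.2, 980, 1000)   # 98% coverage
rejected = passes_thresholds(99.0, 900, 1000)   # only 90% coverage
```

Requiring both conditions means that a highly similar but truncated copy, such as the second example, is not counted as an intact ortholog.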
Figure 3. Pan-genome size distribution with increasing number of genomes in a set of 8 Typhimurium strains. The pan-genome size is depicted by the number of protein-coding genes based on (A) the existing annotation, (B) re-annotation, and (C) re-annotation excluding the prophage regions. The power law fit (n = k N^γ) was performed using median values (black dots).
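For readers who want to reproduce a fit of the form n = k N^γ, one common approach is ordinary least squares on the log-transformed values, since log n = log k + γ log N. The Python sketch below takes that route as an assumption; the legend does not state which fitting procedure was used, and a nonlinear fit would give slightly different estimates. The function name and the synthetic input values are hypothetical and are not data from the figure.

```python
import math


def fit_power_law(genome_counts, pan_sizes):
    """Fit n = k * N**gamma by least squares in log-log space.

    genome_counts -- numbers of genomes N (all > 0, at least two distinct)
    pan_sizes     -- matching pan-genome sizes n (all > 0)
    Returns the tuple (k, gamma).
    """
    xs = [math.log(N) for N in genome_counts]
    ys = [math.log(n) for n in pan_sizes]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                        # slope of the log-log line
    k = math.exp(mean_y - gamma * mean_x)    # intercept back-transformed
    return k, gamma


# Synthetic sizes generated from k = 4500 and gamma = 0.1 (not real data).
genome_counts = list(range(1, 9))
pan_sizes = [4500.0 * N ** 0.1 for N in genome_counts]
k_fit, gamma_fit = fit_power_law(genome_counts, pan_sizes)
```

A smaller γ means that the pan-genome grows more slowly as genomes are added, which is the sense in which panel (C), with prophage regions excluded, can be compared against panels (A) and (B).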
Figure 4. Schematic representation of the pan-genomic profile for different genomic fractions of Typhimurium strains after re-annotation. The numbers of core, mosaic, and strain-specific genes are shown for (A) only the prophage regions and (B) genomes excluding the prophage regions, using 95% nucleotide sequence identity and gene-length coverage thresholds for orthologous gene identification.
Figure 5. Map of accessory non-phage and ‘intact’ prophage clusters in 8 Typhimurium genomes. (A) The ‘intact’ prophage regions (as identified by PHAST) and (B) the deleted/inserted regions are marked in the multiple alignment. Red bars show the intact prophage regions present in the genomes, whereas purple and green bars designate the regions deleted from and inserted into the genomes, respectively.