Corrected Genome Annotations Reveal Gene Loss and Antibiotic Resistance as Drivers in the Fitness Evolution of Salmonella Typhimurium.

Sandip Paul*, Evgeni V. Sokurenko, Sujay Chattopadhyay#

Department of Microbiology, University of Washington, Seattle, Washington, USA

Running Head: Fitness evolution of Salmonella Typhimurium genomes

#Address correspondence to Sujay Chattopadhyay, [email protected].

*Present address: Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata 700032, India

JB Accepted Manuscript Posted Online 12 September 2016. J. Bacteriol. doi:10.1128/JB.00545-16. Copyright © 2016, American Society for Microbiology. All Rights Reserved. Downloaded from http://jb.asm.org/ on April 3, 2020 by guest.



ABSTRACT

Horizontal acquisition of novel chromosomal genes is considered to be a key process in the evolution of bacterial pathogens. However, identification of gene presence or absence could be hindered by inconsistencies in bacterial genome annotations. Here, we perform a cross-annotation of omnipresent core and mosaic accessory genes in the chromosome of Salmonella enterica serovar Typhimurium, across a total of 20 fully-assembled genomes deposited into GenBank. Cross-annotation resulted in a 32% increase in the number of core genes and a 3-fold drop in the number of genes identified as mosaic (i.e. present in some strains only) by the original annotation. Of the remaining non-core genes, the vast majority were of prophage nature, and 255 of the non-phage genes were actually of core origin but lost in some strains upon the emergence of the S. Typhimurium serovar, suggesting that the chromosomal portion of the S. Typhimurium genome acquired a very limited number of novel genes other than prophages. The only horizontally-acquired non-phage genes related to bacterial fitness or virulence were found in four recently sequenced isolates, all located on three different genomic islands that harbor multi-drug resistance determinants. Thus, extensive use of antimicrobials could be the main selection force behind the acquisition of new fitness genes and the emergence of novel Salmonella pathotypes.

IMPORTANCE

Significant discrepancies in the annotations of bacterial genomes could lead to misleading conclusions about the evolutionary origin of chromosomal genes, as we demonstrate here via cross-annotation-based analysis of Salmonella Typhimurium genomes from GenBank. We conclude that, although S. Typhimurium is able to infect a broad range of vertebrate hosts, the genomic diversity of its strains is almost exclusively limited to gene loss and transfer of prophage DNA. The only non-phage chromosomal genes acquired after the emergence of the serovar are linked to genomic islands harboring multi-drug resistance factors. Since fitness factors could lead to increased virulence, this poses an important research question: Could overuse or misuse of antimicrobials act as a selection force for the emergence of more pathogenic strains of Salmonella?

INTRODUCTION

Horizontal gene transfer in bacteria is a major force in the adaptation to novel environments, genome diversification and, in particular, in the evolution of bacterial virulence (1-4). Horizontally transferred genes create a mosaic structure of the species pan-genome, where the so-called accessory genes are present only in some strains, in contrast to the core genes that are typically present in all strains (5-7). The chromosomal accessory genes can be subdivided into genes of phage origin and non-phage genes. The latter are usually acquired in the form of genomic islands, episomes or transposons, and are often related to bacterial physiology, antigenic diversification and, thus, fitness (4, 7, 8). In contrast, the phage genes are directly related to phage physiology, although many prophages are also known to contribute virulence and fitness factors to bacteria (9, 10). Distinguishing the accessory genes from the core genes and defining their nature is of critical importance for understanding the evolutionary dynamics and adaptive mechanisms of bacterial species.

Salmonella enterica subspecies enterica represents one of the most important and widely distributed bacterial pathogens of both humans and domesticated animals (11-14). Salmonella serovar Typhimurium has a broad host spectrum and is one of the serovars most commonly isolated from humans, retail meats of diverse origins, and the environment. While Typhimurium is primarily known to cause self-limiting gastroenteritis, it can also be systemically invasive and exhibit a multidrug resistance (MDR) phenotype (15-17). Because Typhimurium is a complex serovar comprising multiple eco-/patho-vars, horizontal acquisition of genetic material might play a pivotal role in the emergence and evolution of different Typhimurium strains. However, systematic analysis of the extent and nature of accessory genes in the Typhimurium pan-genome has not yet been performed, despite the availability of many fully-assembled high-quality genomes representing this serovar.

A major hindrance to gene content analysis is inconsistency in the existing annotations of even fully assembled, high-quality sequenced genomes, with genes that are annotated in some strains but un-annotated or mis-annotated in other strains. Such inconsistent and incomplete gene annotations can severely limit genome-wide comparison and profiling of core and accessory fractions based on annotated genes, proteins or functional clusters. Therefore, a lack of completeness and uniformity in annotations considerably restricts the usefulness of available genome sequences for understanding the mechanisms of bacterial evolution, especially those related to pathogenesis.

Here, we compare gene content across the chromosomal portions of the genomes of eight Typhimurium strains that are fully assembled and currently available in GenBank. We find significant discrepancies in the annotation of, on average, hundreds of genes per strain, leading to highly misleading conclusions about the extent of the Typhimurium genome's openness to gene transfer. Based on cross-annotation of the existing annotated genomes, we determine that the diversification of genomes in the Typhimurium strains is driven almost exclusively by phages and by core gene loss. It appears that, since the emergence of the serovar, with the exception of one strain, none of the Typhimurium strains has acquired any novel genes that are not directly related to phage biology. This study indicates that the acquisition of non-phage genes plays very little role in the fitness diversification of Typhimurium strains, and highlights a critical need for the cross-annotation of existing bacterial genomes.

MATERIALS AND METHODS

Bacterial genomes and phylogeny.

Eight fully assembled genomes of Salmonella enterica Typhimurium (strains D23580, 798, ST4/74, T000240, UK-1, SL1344, LT2 and 14028S) were downloaded from NCBI GenBank. For comparison, complete sequences of fully assembled genomes from twelve other Typhimurium strains and fifteen non-Typhimurium strains were selected and downloaded. Among the non-Typhimurium strains, except for two strains each from the Paratyphi A and Typhi serovars, the strains were clonally distinct according to MLST analysis (http://mlst.ucc.ie) (see Fig. S1 in the supplemental material).

Pan-genomic analysis and re-annotation of genomes.

We used our recently developed tool PanCoreGen (18) that, apart from several other features, generates pan-genomic profiles of chromosomal genes via re-annotation of each annotated genome using the rest of the annotated genomes as references. Based on user-defined threshold values of nucleotide sequence identity and length coverage, this tool classifies each gene in the analyzed set of genomes as ‘core’ (i.e. present in all genomes of the analyzed dataset), ‘mosaic’ (i.e. present in multiple but not all genomes), or ‘strain-specific’ (i.e. present only in one of the annotated genomes). We applied PanCoreGen to the 8 annotated chromosomes of S. enterica Typhimurium to create the pan-genomic profile of serovar Typhimurium. For the BLAST (blastn) search of orthologs, we used 95% nucleotide sequence identity and 95% gene-length coverage as the lower limits. All analyses were restricted to chromosomal genes; plasmids were not considered. We found a pan-genome size of 5936 genes, of which 5399 genes were core. The gene distribution for each genome resulting from the pan-genomic profile was used for re-annotation. We re-annotated each genome using the following rigorous steps:

a) Each gene found by PanCoreGen for a genome was checked to determine whether it was already annotated in the existing gene annotations for that genome. Using BLAST, a gene was considered newly annotated if no existing annotation matched it at 100% sequence identity and at least 50% length coverage. A newly annotated gene might therefore have been either completely unannotated previously or only partially annotated, with the previously annotated length being less than half of the length in the new annotation.

b) Newly annotated genes were included only if they contained no premature stop codons. Otherwise, the genes were discarded to avoid inclusion of pseudogenes.

c) We checked all the newly annotated genes by BLAST (blastn) against all annotated pseudogenes in the 8 Typhimurium strains, with thresholds of 95% sequence identity and 20% length coverage. This avoids inclusion of smaller fractions of any pseudogene annotated as an open reading frame (ORF) in some genomes, along with pseudogenes that did not accumulate any premature stop codons (e.g., disabled genes, or unitary pseudogenes).

d) All the partially annotated genes were detected with a further orthology search (100% sequence identity and 10% length coverage) among previously annotated genes.
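Steps (b) and (c) lend themselves to a short illustration. The Python sketch below is only one possible reading of those two filters, not the PanCoreGen implementation: the function names, the tuple layout used for BLAST hits, and the assumption that each ORF is supplied in frame from its first base are all hypothetical. The stop codons are the standard bacterial set (TAA, TAG, TGA).

```python
# Hypothetical sketch of re-annotation filters (b) and (c); not PanCoreGen code.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial stop codons


def has_premature_stop(orf):
    """Return True if an in-frame stop codon occurs before the last codon.

    Assumes `orf` is a DNA string read in frame from its first base.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def matches_pseudogene(hits, min_identity=95.0, min_coverage=20.0):
    """Return True if any hit against annotated pseudogenes meets both
    thresholds of step (c). `hits` is a list of (identity %, coverage %)."""
    return any(identity >= min_identity and coverage >= min_coverage
               for identity, coverage in hits)


def keep_new_gene(orf, pseudogene_hits):
    """Keep a newly annotated gene only if it passes filters (b) and (c)."""
    return not has_premature_stop(orf) and not matches_pseudogene(pseudogene_hits)


# Toy sequences and hits, invented for illustration only.
intact = keep_new_gene("ATGAAACCCTAA", [])                 # no internal stop, no hit
interrupted = keep_new_gene("ATGTAACCCTAA", [])            # internal TAA codon
pseudo_like = keep_new_gene("ATGAAACCCTAA", [(97.0, 35.0)])  # resembles a pseudogene
print(intact, interrupted, pseudo_like)
```

In this toy run only the first sequence is retained; the second fails the stop-codon screen and the third fails the pseudogene screen.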

Pan-genomic profiling.

We created the pan-genomic profile of the 8 Typhimurium strains via serial inclusion of n genomes (where n goes from 1 through 7), using 8 random combinations for n = 2, 3, …, 7. This profile was generated for three sets: genomes with existing GenBank annotations, genomes after re-annotation, and re-annotated genomes without prophage regions. Using Prism software, we performed least-squares curve fitting of the power law n = κN^γ to the medians. An exponent γ ≈ 0 indicates a closed pan-genome (19).
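The serial-inclusion procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually run for this study: the gene sets are toy data, `pan_genome_medians` is a hypothetical name, and subsets are drawn independently, so the same combination may be sampled more than once.

```python
import random
from statistics import median


def pan_genome_medians(genomes, n_combinations=8, seed=0):
    """Median pan-genome size for each number of genomes included.

    `genomes` maps a strain name to the set of gene identifiers it carries.
    For every subset size, `n_combinations` random subsets are drawn and the
    size of the union of their gene sets is recorded.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strains = sorted(genomes)
    medians = {}
    for n in range(1, len(strains) + 1):
        sizes = []
        for _ in range(n_combinations):
            subset = rng.sample(strains, n)
            pan = set().union(*(genomes[s] for s in subset))
            sizes.append(len(pan))
        medians[n] = median(sizes)
    return medians


# Toy gene content for three hypothetical strains.
toy_genomes = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}
toy_medians = pan_genome_medians(toy_genomes)
print(toy_medians)
```

The medians returned for each subset size are the values to which a growth curve would then be fitted.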

Phage region identification.

In each of the 8 Typhimurium strains, we identified prophage sequences with the PHAST (phage search tool) web server (http://phast.wishartlab.com/) (20) by uploading the re-annotated GenBank-formatted files. We considered all regions identified as “intact”, “incomplete” or “questionable” by PHAST to be probable prophage regions in all the strains under study. We also considered the phage genes annotated in each of the existing GenBank files. The orthologous sequences of these phage genes were extracted across all the Typhimurium strains under study.

Functional enrichment analysis.

We used DAVID software (21) for clustering, based on protein functions, of the DT12 genomic island genes of strain T000240. The classification stringency was set to ‘medium’ for the analysis.

RESULTS

Large fraction of un-annotated genes in genomes of Typhimurium strains.

We analyzed gene presence/absence in the chromosomal portion of the fully-assembled genomes of eight archetypal Typhimurium strains: 14028S, 798, D23580, LT2, SL1344, ST4/74, T000240 and UK-1. Based on the seven housekeeping loci that are used for multi-locus sequence typing, these strains formed a single tight clade within Salmonella enterica subspecies I (see Fig. S1 in the supplemental material). The genome size of the Typhimurium strains ranged from 4.82 Mb (UK-1) to 4.95 Mb (T000240), with only 0.81±0.14% average pairwise difference between the strains. In contrast, based on the GenBank annotation of protein-coding genes, the number of genes in the Typhimurium genomes varied between 4,326 (strain 798) and 5,323 (14028S), with 7.14±1.36% average pairwise difference between the strains (Fig. 1, grey bars). The number of annotated genes in individual genomes did not correlate (R² = 0.03) with the size of the genomes (see Fig. S2 in the supplemental material).

We re-annotated protein-coding genes in each Typhimurium strain by cross-annotating their genomes using the recently developed PanCoreGen software (18). After the cross-annotation, the pairwise difference in gene content between the strains went down 5-fold, to 1.43±0.26% (P<0.0001). Average gene content per genome increased from 4,600±112 genes in GenBank-annotated genomes to 5,430±26 genes after the cross-annotation, which is higher than the number of originally annotated genes in the genome of strain 14028S, the strain with the highest number of genes according to GenBank (Fig. 1, black bars). The median length of ORFs missed by the original annotations was relatively small and ranged from 132 to 147 bp (see Table S1 and Table S2 in the supplemental material). However, each re-annotated genome had on average 34 newly annotated genes that were ≥ 300 bp long. The longest such gene was ftsK (4,086 bp), encoding DNA translocase, which was missed by the original annotation in strain UK-1. Importantly, after the cross-annotation, the number of genes per genome was well correlated (R² = 0.86) with the overall size of the corresponding genomes (see Fig. S2 in the supplemental material).

Thus, there were substantial discrepancies in the GenBank annotations of fully assembled genomes of eight archetypal Typhimurium strains. While the discrepancies mostly involved small ORFs, they resulted in underestimation of the gene number in every genome as well as overestimation of the differences in gene content between the strains.

Significant reduction in the number of non-core genes after cross-annotation.

We next assessed the number of ‘core genes’ present in all eight strains, ‘mosaic genes’ present in multiple but not all strains, and ‘strain-specific genes’ present in only one of the strains.
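These three categories follow directly from the number of strains in which a gene is detected. A minimal Python sketch, assuming a presence table has already been built from the ortholog search; the gene names and the `classify_gene` helper are hypothetical:

```python
# The eight analyzed strains, as listed in Materials and Methods.
STRAINS = {"14028S", "798", "D23580", "LT2", "SL1344", "ST4/74", "T000240", "UK-1"}


def classify_gene(strains_with_gene, all_strains=STRAINS):
    """Classify a gene by the number of analyzed strains that carry it."""
    found = set(strains_with_gene) & set(all_strains)
    if not found:
        raise ValueError("gene not detected in any analyzed strain")
    if len(found) == len(all_strains):
        return "core"
    if len(found) == 1:
        return "strain-specific"
    return "mosaic"


# Hypothetical presence table: gene -> strains in which it was detected.
presence = {
    "gene_1": STRAINS,
    "gene_2": {"LT2", "T000240"},
    "gene_3": {"T000240"},
}
categories = {gene: classify_gene(where) for gene, where in presence.items()}
print(categories)
```

Note that the outcome depends entirely on how presence is called, so the same gene can change category when the identity or coverage thresholds change.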

Based on the GenBank annotations, there were 4,056 core, 753 mosaic and 1,185 strain-specific genes (Fig. 2A), with the chicken-derived strain 14028S showing the highest number of strain-specific genes (718 genes). Thus, the core genes represented only 67.7% of all chromosomal genes (pan-genome) and, on average, comprised 88.5% of individual genomes. Furthermore, according to the original annotations, the Typhimurium serovar appeared to have an ‘open genome’, where the pan-genome size increased notably with the number of genomes analyzed (Fig. 3A). The curve-fitting yielded an exponent γ value of 0.16±0.02 that was significantly above zero (γ=0 is indicative of a completely ‘closed genome’; see Methods for details).
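The exponent can be estimated in several ways. The sketch below is not the Prism fit used in this study; as a simpler stand-in, it applies ordinary least squares to log-transformed values, which recovers κ and γ exactly for noise-free data but weights points differently from a nonlinear least-squares fit. It assumes that n in the power law denotes the pan-genome size and N the number of genomes included, and the input values are synthetic.

```python
import math


def fit_power_law(genome_counts, pan_sizes):
    """Fit n = kappa * N**gamma by linear least squares on log-log values.

    Returns (kappa, gamma). This log-log regression is a substitute for a
    nonlinear fit and is used here only for illustration.
    """
    xs = [math.log(N) for N in genome_counts]
    ys = [math.log(n) for n in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                       # slope in log-log space
    kappa = math.exp(mean_y - gamma * mean_x)  # intercept back-transformed
    return kappa, gamma


# Synthetic medians generated from kappa = 5000 and gamma = 0.05
# (illustrative values only, not data from the study).
counts = list(range(1, 8))
sizes = [5000 * N ** 0.05 for N in counts]
kappa_hat, gamma_hat = fit_power_law(counts, sizes)
print(kappa_hat, gamma_hat)
```

A fitted γ close to zero means that adding genomes barely enlarges the pan-genome, which is the sense in which the text calls a genome ‘closed’.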

After the cross-annotation by PanCoreGen, the number of Typhimurium core genes increased 1.3-fold, to 5,348 genes, while the number of mosaic and strain-specific genes dropped about 3-fold, to 292 and 343 genes, respectively (Fig. 2B). The relative distribution of strain-specific genes also changed significantly. Only 25 genes were unique to strain 14028S, while the highest number of strain-specific genes was in the multi-drug resistant strain T000240 (190 genes), followed by the systemically invasive strain D23580 (77 genes). At the other end, the calf salmonellosis strain ST4/74 and the model gastroenteritis strain SL1344 (an auxotrophic derivative of ST4/74) did not have any genes uniquely present in them. In contrast to the original annotation, after cross-annotation 98.5% of the gene content of individual genomes, on average, was of core origin. Overall, the core genes comprised a significantly larger portion of the pan-genome (89.4%) than before the cross-annotation. As a result, the ‘openness’ of the Typhimurium serovar’s pan-genome became significantly less pronounced (Fig. 3B), with the γ value decreasing more than three-fold (γ=0.05±0.01, P<0.0001).

Thus, cross-annotation of the Typhimurium strains’ genomes significantly increased the proportion of core genes in both the serovar’s pan-genome and individual genomes. While the openness of the Typhimurium genome became much less pronounced after the cross-annotation, the genome is not completely closed, indicating a certain level of horizontal gene movement within the serovar. The remainder of the study was performed on the re-annotated Typhimurium genomes.

Non-core predominance and active transfer of prophages in the serovar.

We identified the genes of prophage origin based on existing annotations of phage genes and also by using the PHAST program (20) to predict prophage clusters. Among the genes that were newly annotated by PanCoreGen in at least one genome, about 10% were of prophage origin (not shown). Out of 922 prophage genes identified in the cross-annotated pan-genome, 456 were core, 238 mosaic and 228 unique (Fig. 4A). Thus, the prophage genes comprised only 8.5% of the core genes, but 81.5% of mosaic and 66.5% of strain-specific genes. Also, without the prophage genes, the core genes comprised 96.5% of the pan-genome and 99% of individual genomes on average. After exclusion of the prophage genes from the analysis of genome openness, the γ value dropped further, by more than 2-fold (γ=0.011±0.003, P<0.0001 in Fig. 3C), being only marginally above zero.
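The prophage shares quoted above follow from the gene counts given in this section. As a quick arithmetic check in Python (the counts are those reported in the text; the variable names are illustrative):

```python
# Gene counts per category in the cross-annotated pan-genome (from the text).
all_genes = {"core": 5348, "mosaic": 292, "strain-specific": 343}
# Prophage genes per category (from the text).
prophage = {"core": 456, "mosaic": 238, "strain-specific": 228}

# Percentage of each category that is of prophage origin, to one decimal.
prophage_share = {
    category: round(100 * prophage[category] / all_genes[category], 1)
    for category in all_genes
}
total_prophage = sum(prophage.values())

print(total_prophage)   # 922
print(prophage_share)   # {'core': 8.5, 'mosaic': 81.5, 'strain-specific': 66.5}
```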

Ninety-eight percent of the prophage genes were located on the chromosome in 19 clusters, each incorporating at least 2 genes, with a median size of 28 genes (see Table S3 in the supplemental material). Ten of the clusters were designated by PHAST as intact prophages and contained 68% of all phage genes. Another 6 clusters, with 157 genes, were designated as incomplete prophage regions. The remaining 3 clusters, with 118 genes, were designated by PHAST as questionable.

In order to understand the evolutionary origin and dynamics of the phage genes, we analyzed in detail their nature, strain distribution and, in particular, presence in the S. Heidelberg strain SL476. This Heidelberg strain was chosen as the closest relative based on the evolutionary distance of the MLST sequences across serovars (see Fig. S1 in the supplemental material).

Out of 10 intact prophages, only φGifsy1 and φGifsy2 were found in all Typhimurium strains (Fig. 5A), suggesting their ancestral nature in the serovar. Interestingly, the intact core prophages were either absent (φGifsy1) or only partially present (φGifsy2) in the Heidelberg strain. In contrast, 5 intact prophages were each found in only one strain: φGifsy3 (strain 14028S), φFELS1 (LT2), φST104 (T000240), and φBTP1 and φBTP5 (D23580). Detailed examination of the chromosomal regions corresponding to the insertion sites of the strain-specific prophages showed no remnant scars in the Typhimurium strains that lack these phages or in S. Heidelberg, strongly suggesting that they were acquired after the serovar had emerged.

The three remaining intact prophages, φST64B, φFELS-2 and φFELS-2-like, had a mosaic distribution, i.e. were found in multiple but not all strains. None of these phages or their remnants was present in the Heidelberg strain. φST64B was present in all strains except LT2 and T000240. However, close examination detected remnant sequences of φST64B in the corresponding position in the latter strains, suggesting that φST64B was originally present in LT2 and T000240 too, but was lost. Another intact mosaic prophage, φFELS-2, was present only in LT2 and T000240. Here again, remnant scars were present in 14028S, UK-1 and D23580, suggesting loss of φFELS-2 from these strains. Interestingly, in strains SL1344, ST4/74 and 798 there was another prophage in the position of the φFELS-2 prophage. This phage had 44% of its genes homologous to φFELS-2 (based on 95% sequence identity and length coverage between SL1344 and LT2) and, thus, was designated previously as the φFELS-2-like prophage. Therefore, both φST64B and φFELS-2 appeared to be originally core phages that underwent partial loss or replacement, while φFELS-2-like was likely the result of an insertion event later during the evolution of the serovar.

Analysis of the phage clusters designated as incomplete indicated that 4 of them were clearly of core nature (not shown). Interestingly, this determination could be made only after cross-annotation of the genomes, as the GenBank annotation missed them in multiple strains, again leading to misleading conclusions. The other 18 phage genes were not clustered, and all of them were identified as core, being present in the same locations across the genomes.

Thus, prophage genes comprise the largest portion of non-core genes and, without them, the Typhimurium pan-genome appears almost completely closed. Altogether, the comparative map of clusters of phage origin suggests a continuous acquisition and loss of phage material across the Typhimurium strains.

Evolutionary origin of the non-phage accessory genes.

Upon removal of the prophage genes and genes from the repeat regions, 51 genes remained in the mosaic category and 115 genes in the strain-specific category, with all but 1 of the latter found in strain T000240 (Fig. 4B). To determine the origin of the non-phage genes, we analyzed their nature in detail, again using S. Heidelberg strain SL476 as the closest relative of the Typhimurium clade in the MLST phylogeny (see Fig. S1 in the supplemental material).

Among the 51 mosaic genes of non-phage origin, 34 genes were located in 7 clusters of 2 or more genes (see Table S4 in the supplemental material). All of these clusters, however, were designated as mosaic because they were absent in only 1 or 2 strains (Fig. 5B): del_1, del_2 and del_3 were absent only in T000240; del_4, del_5 and del_6 were absent only in D23580; and del_7 was absent in UK-1 and 14028S. Moreover, in the Heidelberg strain all of the mosaic gene clusters were present in full and at 99% nucleotide sequence identity, except del_7, of which only 59% of the cluster was identical in the Heidelberg strain at an identity level of 98% or above. Close examination of the remaining 17 mosaic genes that were not in clusters showed that they were actually present in all strains (i.e., were of core nature), but in certain strains their copies were truncated by more than 5% due to either insertional disruption or partial deletion (see Table S4 in the supplemental material). This led PanCoreGen to miss their actual presence in some strains during the re-annotation using 95% cut-offs for both nucleotide sequence identity and gene-length coverage. Also, 16 of these genes were present in full length in the Heidelberg strain as well, suggesting that their presence is ancestral to Typhimurium. Thus, it appears that the non-phage genes designated as mosaic were so due to either their loss in clusters or partial truncation in some Typhimurium strains, rather than de novo acquisition.

As mentioned above, only two strains were found to have strain-specific genes of non-phage origin: 114 such genes in strain T000240 and only 1 gene in strain 798. The strain 798-specific gene was the 2097 bp long rnfC, encoding an electron transport complex protein. However, further analysis showed that rnfC was essentially a core gene present in all Typhimurium strains, with the rest of the strains carrying a longer, 2208 bp version of the gene. Due to the length difference, PanCoreGen mistakenly designated the shorter version as specific to strain 798 and the longer one as mosaic in the other strains. Interestingly, however, the Heidelberg strain had the longer gene, while the shorter version was found in several other serovars such as Enteritidis, Gallinarum and Pullorum (not shown).
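The arithmetic behind this misclassification is easy to reproduce. The sketch below assumes that length coverage is computed as the aligned length divided by the length of the longer gene, and that the shorter rnfC aligns over its full length; both are assumptions made for illustration, not details confirmed for PanCoreGen.

```python
MIN_COVERAGE = 0.95  # gene-length coverage cut-off used for the ortholog search


def length_coverage(aligned_len, reference_len):
    """Fraction of the reference gene covered by the alignment (assumed definition)."""
    return aligned_len / reference_len


short_rnfc = 2097  # bp, version found in strain 798
long_rnfc = 2208   # bp, version found in the other Typhimurium strains

coverage = length_coverage(short_rnfc, long_rnfc)
called_ortholog = coverage >= MIN_COVERAGE
print(f"{coverage:.4f}", called_ortholog)  # 0.9497 False
```

Under these assumptions the two versions fall just short of the cut-off, since 95% of 2208 bp is 2097.6 bp.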

Detailed examination of the 114 strain-specific genes in T000240 showed that 100 genes were clustered in a single 82 kb chromosomal island, GI-DT12, identified previously (17). The insertion site of GI-DT12 was in the middle of a putative regulatory gene, STM14_4564 (as annotated in strain 14028S). This gene remained intact, without any ‘scars’, in the rest of the Typhimurium strains as well as in the Heidelberg strain SL476. This strongly suggested that the island was acquired by strain T000240 rather than lost by the others.

Of the remaining 14 T000240-specific genes, 12 genes were actually represented by 6 identical copies of two overlapping genes (504 and 276 bp long) encoding IS1 transposases that propagated across the genome, disrupting different gene regions and, thus, resulting in some genes being mosaic in nature. The two other T000240-specific genes, the 309 bp long STMDT12_C29860 and the 387 bp long STMDT12_C26860, encoded hypothetical proteins. The former replaced a 117 bp long hypothetical gene present in the rest of the strains except LT2, while the latter was detected immediately upstream of the T000240-specific ‘questionable’ phage cluster Q3 (see Table S3 in the supplemental material). Thus, together with the island genes, the 14 other T000240-specific genes also appear to have been acquired horizontally by the T000240 strain.

Altogether, it is evident that only one of the Typhimurium strains analyzed, T000240, had undergone genomic acquisition events that did not involve prophages, primarily the 82 kb genomic island. The rest of the non-phage genes that were originally defined here as mosaic or unique in the Typhimurium strains were actually core genes that were either completely or partially lost in some strains after the serovar had emerged.

Identification of the complete genomic core of the Typhimurium serovar.

Based on the analysis of the eight cross-annotated genomes presented above, we determined that, upon exclusion of prophage genes, the ancestral (‘total’) chromosomal core of the Typhimurium serovar comprises 4,944 genes, including 4,892 omnipresent genes (‘stable core’) and 52 partially lost genes (‘unstable core’).
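The stable/unstable distinction can be illustrated with a minimal sketch. It assumes that a gene presence/absence table is already available after cross-annotation and that horizontally acquired genes (prophage and island genes) have been flagged by separate evidence; the function name, strain subset, and gene IDs below are hypothetical and are not part of PanCoreGen.

```python
from typing import Dict, Set, Tuple


def classify_core(
    presence: Dict[str, Set[str]],
    all_strains: Set[str],
    acquired: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Split non-acquired genes into stable and unstable core.

    presence maps a gene ID to the set of strains carrying an intact copy.
    acquired lists genes judged, by other evidence, to be horizontally gained.
    """
    stable, unstable = set(), set()
    for gene, strains in presence.items():
        if gene in acquired:
            continue  # prophage or island genes are not part of the core
        if strains == all_strains:
            stable.add(gene)    # omnipresent: stable core
        else:
            unstable.add(gene)  # ancestral but lost in some strains
    return stable, unstable


# Hypothetical toy input with three strains and three genes.
strains = {"LT2", "14028S", "T000240"}
presence = {
    "geneA": {"LT2", "14028S", "T000240"},
    "geneB": {"LT2", "14028S"},
    "islandX": {"T000240"},
}
stable, unstable = classify_core(presence, strains, acquired={"islandX"})
```

In this toy input, `geneA` falls in the stable core and `geneB` in the unstable core, while `islandX` is excluded because it was flagged as acquired.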

To assess the completeness of the genomic core identified, we added to the analysis twelve more completely sequenced Typhimurium genomes that became available in GenBank in the course of this study (see Table S5 in the supplemental material). Upon cross-annotation of all twenty genomes together, 4,694 genes remained omnipresent, while 198 genes of the original stable core were found to be completely or partially lost from some of the 12 genomes. Of these genes, 84% were in a total of 8 clusters (of 2 or more consecutive genes), with the largest deletion event occurring in strain VNP20009, a modified derivative of 14028S (22), and involving 102 genes across a 108 kb region. In this new set of genomes, we also found 2 of the 7 deletion events noted previously for the set of 52 mosaic genes. The event del_7 appeared to have happened in the common ancestor of UK-1, 14028S, and VNP20009. Interestingly, del_4, which was earlier detected in the MDR and invasive strain D23580, was also present in two MDR strains, DT104 and 138736.
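Grouping lost genes into clusters of 2 or more consecutive genes can be sketched as follows. This is only an illustration that assumes each lost gene is identified by its ordinal position along the reference chromosome; the function name and the positions are hypothetical.

```python
from typing import List


def group_consecutive(positions: List[int], min_size: int = 2) -> List[List[int]]:
    """Group gene positions into runs of consecutive genes of at least min_size."""
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(set(positions)):
        if current and pos == current[-1] + 1:
            current.append(pos)  # extends the current run
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [pos]      # start a new run
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical positions of lost genes along the chromosome.
lost = [10, 11, 12, 40, 57, 58]
clusters = group_consecutive(lost)
clustered_fraction = sum(len(c) for c in clusters) / len(lost)
```

Here the isolated gene at position 40 is not counted as a cluster, so five of the six lost genes fall into two clusters.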

Upon the cross-annotation of the twenty Typhimurium genomes, 29 new genes were added to the total core. All of them had been missed in the original set of 8 genomes because they were absent from the annotations rather than from the genomes per se. Of these newly annotated genes, 24 were omnipresent (stable core) and the remaining 5 showed deletion in some of the genomes.

Thus, the originally defined set of core genes proved to be highly representative of the Typhimurium serovar. Upon the cross-annotation of twenty fully assembled genomes deposited in GenBank, the non-phage total core of Typhimurium stands at 4,973 genes, with an omnipresent stable core of 4,718 genes (95%) and a partially lost unstable core of 255 genes (5%), with each gene of the latter represented in 15 to 19 of the 20 strains.
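The core sizes follow from the counts reported above, and the bookkeeping can be checked with a short calculation. The variable names are ours; all numbers are taken from the text.

```python
# Counts reported in the text for the non-phage chromosomal core.
stable_8, unstable_8 = 4892, 52        # eight-genome analysis
total_8 = stable_8 + unstable_8        # ancestral core from eight genomes

lost_from_stable = 198                 # stable-core genes lost in some of the 12 added genomes
new_stable, new_unstable = 24, 5       # the 29 newly annotated core genes

stable_20 = stable_8 - lost_from_stable + new_stable
unstable_20 = unstable_8 + lost_from_stable + new_unstable
total_20 = stable_20 + unstable_20

print(total_8, stable_20, unstable_20, total_20)   # 4944 4718 255 4973
print(round(100 * stable_20 / total_20),           # 95
      round(100 * unstable_20 / total_20))         # 5
```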

Insertion of genomic islands – limited, but all with MDR and fitness genes.

As described above, in the originally analyzed set of eight genomes, a horizontally acquired set of genes was found in only one strain, the MDR strain T000240 with the 100-gene genomic island GI-DT12. Functional enrichment analysis (see Table S6 in the supplemental material) of the GI-DT12 island revealed genes coding for resistance to chloramphenicol, sulfonamides, tetracycline, etc. (e.g., blaOXA-30, aadA1, qacEΔ1, sul1, cat, and tetA) as well as fitness- and, potentially, virulence-associated genes such as those of the aerobactin iron-acquisition siderophore system (iutA, iucABC) and an iron transporter (sitABCD).

Analysis of the twelve additional Typhimurium genomes revealed 2 additional genomic islands in 3 strains, both of which were reported previously (23, 24). One of them was GI-VII-6, a 125 kb island incorporated in the MDR strain L-3553, isolated from cattle in Japan, which was found to carry several antibiotic resistance genes (aadA, strA, strB, sul1, sul2, tetA, floR, dfrA12) along with the blaCMY-2 gene (for extended-spectrum cephalosporin resistance). In addition, genes of this island encoded a number of transcriptional regulators (STL3553_RS05185, STL3553_RS05225, STL3553_RS05270, STL3553_RS05280) as well as siderophore transporter proteins (STL3553_RS05320) that likely affect the overall fitness and, potentially, virulence of the strains. Identical copies of another genomic island, SGI-1 (43 kb), were found in two closely related strains, DT104 and 138736. This island again carried genes responsible for the MDR phenotype of both strains (resistance to ampicillin, chloramphenicol, streptomycin, sulfonamides, tetracycline, fluoroquinolones, etc.), which was attributed to a 13 kb region of the island carrying MDR genes (e.g., aadA2, floR, tetG, pse-1, sul1). Here also we found some potential fitness-related genes encoding transcriptional regulators (e.g., tetR, orf1) and recombination proteins (e.g., tnpA, tnpR, int1, orf2).


The only other set of non-core, non-phage genes identified in the newly analyzed genomes was a 24.6 kb genomic insertion between the genes CFSAN001921_RS03395 and CFSAN001921_RS03550 of another MDR strain, CFSAN001921 (25). In the insertion, 13 of 27 genes encoded hypothetical proteins, and none of the remaining genes were related to known resistance or fitness factors; instead, they were primarily related to genomic stability cassettes. Two of the genes were annotated to encode phage functions, and the insertion itself was located at the insertion site of the Fels-2 prophage of LT2. Thus, a phage origin of the 24.6 kb genomic insertion could not be excluded.

Altogether, the cross-annotation analysis of twenty genomes of Typhimurium strains revealed only 3 horizontally transferred genomic regions (in 4 strains) that were clearly of non-phage origin, all of them carrying multiple antibiotic resistance genes as well as fitness-related genes.

DISCUSSION

In this study, we eliminated discrepancies in the annotation of genes in twenty fully assembled genomes of Salmonella Typhimurium strains deposited in GenBank. We performed cross-annotation using the PanCoreGen software package to add, on average, hundreds of genes to each genome that were missed by the original annotation and to identify genes present in all, some, or only one of the Typhimurium strains. We determined that the chromosomal portion of the Typhimurium genome is highly conserved, with a limited influx of novel fitness-related genes via horizontal transfer, except for the horizontal movement of phage clusters. Since the serovar emerged, only a few non-phage horizontal transfer events could be recorded, all of them related to the MDR phenotype. However, the acquisition of antibiotic resistance brings genes that might change the fitness and, potentially, the virulence of the strains. In addition, fitness evolution in Typhimurium could also be driven in certain strains by complete or partial loss of some of the original core genes.

High diversity of gene content in Salmonella enterica ss. enterica genomes reflects a considerable extent of horizontal gene transfer. Previous work has demonstrated the importance of gene acquisition in virulence, antibiotic resistance, novel metabolic pathways, and other adaptive traits. Gene acquisition in the form of islands or as part of phages has been suggested to play a key role in the emergence of different Salmonella serovars. For example, genes harbored in an array of genomic islands and prophages (SPI-6, Gifsy-1, Gifsy-2, etc.) are important for intracellular replication of Typhimurium serovar strains (26). SPI-12, which encodes a remnant phage in Typhimurium, includes at least four genes involved in transcriptional activation/regulation and fitness, thereby facilitating bacterial survival in the host (27). Both SPI-1 and SPI-2 encode type III secretion systems (T3SS) involved in translocation of virulence proteins into host cells, while multiple T3SS effectors are part of SPI-5. In addition, SPI-3 and SPI-4 promote intestinal colonization via MisL (a member of the autotransporter protein family) and SiiE (a non-fimbrial adhesin that is part of a type I secretion system), respectively. All these insertion events are known to be ancestral to the Typhimurium serovar. However, whether gene transfer plays a significant role in genome and fitness diversification after the serovars emerged has not been studied in detail.


Here, we investigated the intra-serovar evolution of Salmonella Typhimurium, which arguably exhibits the broadest host and pathogenicity range among all serovars of S. enterica ss. enterica. This serovar includes variants able to infect both livestock and human hosts (e.g., pathovar DT104) as well as variants exhibiting a restricted host range (e.g., pathovars DT2 and DT99) (28). The importance and diversity of the Typhimurium serovar are reflected in the number of strains with high-quality genome assemblies deposited in GenBank. These include one of the first Salmonella strains sequenced, 15 years ago, the model strain LT2 (12), as well as strains from invasive human infections (D23580, 33676, YU39) (15, 29, 30) or strains with an MDR phenotype isolated from various animals (CFSAN001921, L-3553, etc.) (25, 31).

One could expect that the long time period over which the Typhimurium genomes were obtained and the diversity of research groups that contributed them to the public domain would result in differences in the genome annotations. First, algorithms for the recognition of ORFs have been evolving constantly, and different research groups could use different stringency criteria (such as a minimal sequence length) for defining an ORF as a functional gene. Second, in some genomes but not others, certain genes could be absent due to either gene deletion or, alternatively, horizontal acquisition. Third, nonsense or frameshift mutations, small deletions, and insertional disruptions could cause certain genes to be missed by the annotation. Finally, the use of different reference genomes and/or annotation databases could lead to the same gene being given different names or a hypothetical function status. One way or another, all this could lead to significant differences in the annotation of different genomes that are mere artifacts, with many genes missed or given various names. This problem is by no means limited to Salmonella Typhimurium and is likely to be a general problem in bacterial genomics.

Two recent pan-genomic profiling studies of Salmonella enterica were based on the construction of orthologous groups (32) or on de novo annotation of genes (14). Over the last decade, a number of powerful analysis tools have been developed for pan-genomic profiling that can successfully identify homologous genes and their core, mosaic, or strain-specific nature in a given data set, along with plotting and visualization of the profiles, both sequence- and function-based annotation and curation of genomes, reconstruction of phylogenetic relationships of orthologous genes/families, etc. (33, 34). However, an important aspect yet to be integrated into the existing pan-genomic analysis approaches appears to be information on gene gain and gene loss during the evolution of specific lineages, i.e., the differentiation of accessory genes in the pan-genome derived via horizontal gene acquisition from the ‘ancestrally core’ genes that have been lost from specific strains/lineages over the course of evolution and thus are part of the accessory fraction (which we designate here unstable core genes). Such information would provide insights into possible functional adaptation via pan-genomic evolution. Here, we treated each annotated Typhimurium genome as a reference, relying on existing annotations to perform cross-annotation among the annotated genomes, identify unannotated genes, and finally reconstruct the pan-genome based on uniformly re-annotated individual genomes. We identified a total of 4,718 genes found in all strains (stable core) and 255 genes found in most isolates (unstable core) in the analyzed Typhimurium strains. The determination of the total core gene set that was originally present when the serovar emerged provides the research field with an evolutionary ‘Eve’ of Typhimurium strains that could be used for annotation of newly sequenced genomes and for better understanding of their evolutionary dynamics.
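The idea of using every annotated genome as a reference for the others can be illustrated with a deliberately simplified sketch. It is not the PanCoreGen algorithm: it assumes exact sequence identity on either strand, whereas a real analysis would rely on similarity searches with identity and coverage thresholds. The function names, strain labels, and sequences are hypothetical.

```python
from typing import Dict, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cross_annotate(
    genomes: Dict[str, str],
    annotations: Dict[str, Dict[str, str]],
) -> Dict[str, Set[str]]:
    """Report genes present in a chromosome but missing from its annotation.

    genomes maps strain -> chromosome sequence.
    annotations maps strain -> {gene ID: gene sequence}.
    """
    missed: Dict[str, Set[str]] = {strain: set() for strain in genomes}
    for ref_strain, genes in annotations.items():
        for gene_id, gene_seq in genes.items():
            rc = reverse_complement(gene_seq)
            for strain, chrom in genomes.items():
                if strain == ref_strain:
                    continue
                known = set(annotations.get(strain, {}).values())
                if gene_seq in known or rc in known:
                    continue  # already annotated in this strain
                if gene_seq in chrom or rc in chrom:
                    missed[strain].add(gene_id)  # present but unannotated
    return missed


# Hypothetical toy data: geneX is annotated in strain A only.
genomes = {"A": "TTATGAAACCCGGGTAAGG", "B": "CCATGAAACCCGGGTAATT"}
annotations = {"A": {"geneX": "ATGAAACCCGGGTAA"}, "B": {}}
missed = cross_annotate(genomes, annotations)
```

In the toy data, strain B carries the sequence of `geneX` but lacks the annotation, so `geneX` is reported as missed for B only.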

Upon cross-annotation of the eight originally analyzed and then the twelve additional genomes, it became clear that the accessory fraction of the Typhimurium genomes is predominantly contributed by the continuous inflow and outflow of phage elements. In the absence of horizontal transfer of phage clusters, the Typhimurium serovar would be considered to have an almost closed pan-genome. Although we did not analyze the plasmid diversity of the Typhimurium strains here, and regardless of whether plasmids play a significant role in the adaptive evolution of Salmonella Typhimurium, it is astounding that the serovar’s genome appears to be so restricted in the acquisition of non-phage genomic islands by horizontal transfer. Also, there is always a possibility that a genomic island, instead of resulting from a true horizontal transfer event, was shifted from an unanalyzed plasmid, which would imply an even more closed structure of the Typhimurium pan-genome. Keeping in mind that Typhimurium has the broadest host and pathogenicity range among all S. enterica serovars, one can suggest the possibility of similar scenarios for other S. enterica serovars as well. This is in great contrast, for example, to the continuous movement of genomic islands in E. coli strains that belong to the same serotypes and/or multilocus sequence types (STs). Those closely related clonal groups might be considered equivalent to specific serovars of Salmonella from the perspectives of population and evolutionary genetics. For example, a single avian-pathogenic E. coli strain, APECO1 from ST95, has been shown to harbor 43 genomic islands that differ greatly in content from those of strains representing the same serotype and/or ST (35). Also, E. coli strains of the multidrug-resistant clonal group ST131 vary considerably in the acquisition of several genomic islands that carry various pathogenicity-relevant (and no antibiotic resistance) genes (36). Another example is ST73, which combines strains with highly diverse genomic content and includes the model uropathogenic strain CFT073, the asymptomatic bacteriuria strain 83972, and the probiotic strain Nissle 1917 (37, 38). This might suggest stronger physiological barriers to horizontal transfer in Salmonella than in E. coli. Alternatively, the limited genomic diversity in Typhimurium could also be the result of the relatively recent emergence of this serovar, which did not allow enough time for the frequency of horizontal transfer events to become comparable with that in some of the E. coli serotypes or STs of much older origin. Therefore, additional studies on accurate molecular clock estimation of the clonal groups’ age are warranted to understand the basis of the difference in gene transfer rates between species.

The majority of the prophage gene clusters seemed to be comprised of genes related to phage movement, biogenesis, and structural components. However, this does not mean that phage acquisition or loss could not have an adaptive effect on the Typhimurium strains, including their virulence. On the one hand, disruption of the genomic region by the prophage insertion could affect the function of nearby genes. Interestingly, two different phage clusters, φST104 and φBTP1, are inserted into the same spot in the MDR strain T000240 and the systemically invasive strain D23580, respectively. In both strains, the insertion occurred downstream of the proA gene (encoding gamma-glutamyl phosphate reductase) and upstream of an IS3 transposase gene that is itself positioned immediately upstream of the stb fimbrial gene cluster. If the function of the surrounding genes is affected one way or another by the phage insertion, this could have significant effects on bacterial physiology. On the other hand, the presence of prophages in the bacterial chromosome could have a pleiotropic effect on the expression of chromosomal genes through, for example, transcriptional cross-regulation. Thus, these phages might have a similar effect on the corresponding host strains not only by being inserted into the same genomic site but also by having a similar in trans effect. It is noteworthy that, among the 12 additional Typhimurium strains analyzed, we detected a phage cluster at exactly the same location as φST104 and φBTP1 (in T000240 and D23580, respectively) in all of the MDR and/or invasive ones, and only in those (i.e., U288, CFSAN001921, DT104, 138736, L-3553, 33676, YU39). In the remaining 11 non-MDR and/or non-invasive strains, this genomic location remained uninterrupted, with proA immediately followed by the IS3 transposase gene. The insertion of different phage clusters suggests a hotspot nature of this genomic region, allowing independent acquisition of phage genes by MDR/invasive strains of this serovar. However, detailed experimental and population studies are warranted to determine any direct or indirect role of prophage genes in Typhimurium physiology, virulence, and/or drug resistance phenotypes.
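Whether the proA-IS3 junction is interrupted in a given strain can be checked from the ordered list of annotated genes. The sketch below assumes such an ordered list is available for each chromosome; the function name and the gene labels other than proA are hypothetical placeholders.

```python
from typing import List, Optional


def genes_between(order: List[str], left: str, right: str) -> Optional[List[str]]:
    """Return the genes lying between two flanking genes.

    Returns None if either flanking gene is absent; works in either orientation.
    """
    if left not in order or right not in order:
        return None
    i, j = order.index(left), order.index(right)
    if i > j:
        i, j = j, i
    return order[i + 1 : j]


# Hypothetical gene orders around the insertion hotspot.
uninterrupted = ["proA", "IS3_tnp", "stbA"]
with_prophage = ["proA", "phage_1", "phage_2", "IS3_tnp", "stbA"]

empty_site = genes_between(uninterrupted, "proA", "IS3_tnp")
occupied_site = genes_between(with_prophage, "proA", "IS3_tnp")
```

An empty list indicates that proA is immediately followed by the IS3 transposase gene, while a non-empty list gives the genes occupying the site.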

Detailed analysis of gene presence/absence in the cross-annotated Typhimurium strains found that about 5% of the original core genes were completely or partially lost in some strains after the serovar emerged. Gene loss by deletion or pseudogene formation is thought to be driven by two contrasting evolutionary processes. On the one hand, genes can be lost as part of ‘use-or-lose’ evolutionary dynamics, i.e., if the function of those genes is no longer needed for a particular lifestyle of the bacterial strain. Such loss accumulates as a result of genetic drift, i.e., via random events not driven by positive selection. On the other hand, gene loss could be driven by a ‘die-or-lose’ evolutionary mechanism that removes genes whose functions reduce fitness in certain environments. In the course of clinical infection, for example, some genes might interfere with the expression or function of virulence-promoting factors or increase the liability of the pathogen by expressing traits recognizable by the host defenses.

Complete or partial loss of core genes was noted primarily in the invasive strain D23580, the MDR strain T000240, and two closely related strains, UK-1 (called ‘universal killer’ for its high invasion and virulence properties) and 14028S (15, 17, 39, 40). This highlights a possible central role of gene inactivation in the evolution of fitness diversity of Salmonella Typhimurium strains. Interestingly, previous work suggested an adaptive convergence of the loss of the del_6 cluster from the systemically invasive Typhimurium strain D23580 with the same event in the systemically invasive serovar Typhi (15). However, we found here that all non-Typhimurium serovars (invasive and non-invasive) show either partial or complete loss or truncation of genes in this cluster (see Fig. S3 in the supplemental material). Also, earlier work demonstrated that disruption of the stbC gene in the stb fimbrial operon could convert strain LT2 into a flagellated strain exhibiting a constitutively mannose-sensitive agglutinating and multidrug-resistant phenotype (41). Interestingly, we found that, although the stbC gene has remained intact, other genes of the stb fimbrial operon, such as stbB and stbA, are lost in the MDR strain T000240 due to the absence of the del_1 cluster. It would be worth determining whether the loss of these genes has somehow contributed to the resistance phenotype of this strain. Our analysis of the 12 additional strains of serovar Typhimurium revealed loss of a few other ancestrally core gene clusters in some of them (see Table S7 in the supplemental material). However, understanding the functional and adaptive significance of gene loss and inactivation in Typhimurium strains is beyond the scope of this study and will require expanded analysis of Typhimurium and non-Typhimurium strains as well as experimental studies.

In the original set of eight genomes, the only true transfer of a gene cluster of non-phage origin is the T000240-specific genomic island GI-DT12, which carries determinants of antibiotic resistance, mercury resistance, iron acquisition, heavy metal tolerance, etc. Previous work suggested that this island most likely helped the isolate to adapt to adverse environmental conditions such as extremely polluted sewage (17). In the twelve additional genomes, only two more genomic islands (in strains DT104, 138736, and L-3553) were identified, and they also carried both MDR and fitness genes. Although we identified one more potential island (in another MDR strain, CFSAN001921) of obscure origin and function, our observations strongly indicate that, in the course of evolution of the Typhimurium serovar, antibiotic resistance might be the major selection factor behind the acquisition of genomic islands that also bring some fitness-associated genes into the recipient genomes.

FUNDING INFORMATION

This work was supported by National Institutes of Health grant R01 AI106007.

REFERENCES

1. Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304.

2. Koonin EV, Makarova KS, Aravind L. 2001. Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol 55:709-742.

3. Dutta C, Pan A. 2002. Horizontal gene transfer and bacterial diversity. J Biosci 27:27-33.

4. Wiedenbeck J, Cohan FM. 2011. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiol Rev 35:957-976.

5. Mira A, Martin-Cuadrado AB, D'Auria G, Rodriguez-Valera F. 2010. The bacterial pan-genome: a new paradigm in microbiology. Int Microbiol 13:45-57.

6. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, Khouri H, Radune D, Dimitrov G, Watkins K, O'Connor KJ, Smith S, Utterback TR, White O, Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA 102:13950-13955.

7. Soucy SM, Huang J, Gogarten JP. 2015. Horizontal gene transfer: building the web of life. Nat Rev Genet 16:472-482.

8. Juhas M, van der Meer JR, Gaillard M, Harding RM, Hood DW, Crook DW. 2009. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol Rev 33:376-393.

9. Brussow H, Canchaya C, Hardt WD. 2004. Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev 68:560-602.

10. Rodriguez-Valera F, Martin-Cuadrado AB, Rodriguez-Brito B, Pasic L, Thingstad TF, Rohwer F, Mira A. 2009. Explaining microbial population genomics through phage predation. Nat Rev Microbiol 7:828-836.

11. Centers for Disease Control and Prevention (CDC). 2013. National Salmonella Surveillance Annual Report, 2011. US Department of Health and Human Services, CDC, Atlanta, GA, USA.

12. McClelland M, Sanderson KE, Spieth J, Clifton SW, Latreille P, Courtney L, Porwollik S, Ali J, Dante M, Du F, Hou S, Layman D, Leonard S, Nguyen C, Scott K, Holmes A, Grewal N, Mulvaney E, Ryan E, Sun H, Florea L, Miller W, Stoneking T, Nhan M, Waterston R, Wilson RK. 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852-856.

13. Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, Churcher C, Quail MA, Stevens M, Jones MA, Watson M, Barron A, Layton A, Pickard D, Kingsley RA, Bignell A, Clark L, Harris B, Ormond D, Abdellah Z, Brooks K, Cherevach I, Chillingworth T, Woodward J, Norberczak H, Lord A, Arrowsmith C, Jagels K, Moule S, Mungall K, Sanders M, Whitehead S, Chabalgoity JA, Maskell D, Humphrey T, Roberts M, Barrow PA, Dougan G, Parkhill J. 2008. Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways. Genome Res 18:1624-1637.

14. Jacobsen A, Hendriksen RS, Aarestrup FM, Ussery DW, Friis C. 2011. The Salmonella enterica pan-genome. Microb Ecol 62:487-504.

15. Kingsley RA, Msefula CL, Thomson NR, Kariuki S, Holt KE, Gordon MA, Harris D, Clarke L, Whitehead S, Sangal V, Marsh K, Achtman M, Molyneux ME, Cormican M, Parkhill J, MacLennan CA, Heyderman RS, Dougan G. 2009. Epidemic multiple drug resistant Salmonella Typhimurium causing invasive disease in sub-Saharan Africa have a distinct genotype. Genome Res 19:2279-2287.

16. Zhang S, Kingsley RA, Santos RL, Andrews-Polymenis H, Raffatellu M, Figueiredo J, Nunes J, Tsolis RM, Adams LG, Baumler AJ. 2003. Molecular pathogenesis of Salmonella enterica serotype typhimurium-induced diarrhea. Infect Immun 71:1-12.

17. Izumiya H, Sekizuka T, Nakaya H, Taguchi M, Oguchi A, Ichikawa N, Nishiko R, Yamazaki S, Fujita N, Watanabe H, Ohnishi M, Kuroda M. 2011. Whole-genome analysis of Salmonella enterica serovar Typhimurium T000240 reveals the acquisition of a genomic island involved in multidrug resistance via IS1 derivatives on the chromosome. Antimicrob Agents Chemother 55:623-630.


18. Paul S, Bhardwaj A, Bag SK, Sokurenko EV, Chattopadhyay S. 2015. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes. Genomics 106:367-372.

19. Tettelin H, Riley D, Cattuto C, Medini D. 2008. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol 11:472-477.

20. Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. 2011. PHAST: a fast phage search tool. Nucleic Acids Res 39:W347-352.

21. Huang da W, Sherman BT, Lempicki RA. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4:44-57.

22. Broadway KM, Modise T, Jensen RV, Scharf BE. 2014. Complete genome sequence of Salmonella enterica serovar Typhimurium VNP20009, a strain engineered for tumor targeting. J Biotechnol 192:177-178.

23. Boyd D, Peters GA, Cloeckaert A, Boumedine KS, Chaslus-Dancla E, Imberechts H, Mulvey MR. 2001. Complete nucleotide sequence of a 43-kilobase genomic island associated with the multidrug resistance region of Salmonella enterica serovar Typhimurium DT104 and its identification in phage type DT120 and serovar Agona. J Bacteriol 183:5725-5732.

24. Lee K, Kusumoto M, Sekizuka T, Kuroda M, Uchida I, Iwata T, Okamoto S, Yabe K, Inaoka T, Akiba M. 2015. Extensive amplification of GI-VII-6, a multidrug resistance genomic island of Salmonella enterica serovar Typhimurium, increases resistance to extended-spectrum cephalosporins. Front Microbiol 6:78.


25. Hoffmann M, Muruvanda T, Allard MW, Korlach J, Roberts RJ, Timme R, Payne J, McDermott PF, Evans P, Meng J, Brown EW, Zhao S. 2013. Complete Genome Sequence of a Multidrug-Resistant Salmonella enterica Serovar Typhimurium var. 5- Strain Isolated from Chicken Breast. Genome Announc 1:e01068-13.

26. Klumpp J, Fuchs TM. 2007. Identification of novel genes in genomic islands that contribute to Salmonella Typhimurium replication in macrophages. Microbiology 153:1207-1220.

27. Tomljenovic-Berube AM, Henriksbo B, Porwollik S, Cooper CA, Tuinema BR, McClelland M, Coombes BK. 2013. Mapping and regulation of genes within Salmonella pathogenicity island 12 that contribute to in vivo fitness of Salmonella enterica Serovar Typhimurium. Infect Immun 81:2394-2404.

28. Rabsch W, Andrews HL, Kingsley RA, Prager R, Tschape H, Adams LG, Baumler AJ. 2002. Salmonella enterica serotype Typhimurium and its host-adapted variants. Infect Immun 70:2249-2255.

29. Calva E, Silva C, Zaidi MB, Sanchez-Flores A, Estrada K, Silva GG, Soto-Jimenez LM, Wiesner M, Fernandez-Mora M, Edwards RA, Vinuesa P. 2015. Complete genome sequencing of a multidrug-resistant and human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 genotype. Genome Announc 3:e00663-15.

30. Silva C, Calva E, Calva JJ, Wiesner M, Fernandez-Mora M, Puente JL, Vinuesa P. 2015. Complete genome sequence of a human-invasive Salmonella enterica serovar Typhimurium strain of the emerging sequence type 213 harboring a multidrug resistance IncA/C plasmid and a blaCMY-2-Carrying IncF plasmid. Genome Announc 3:e01323-15.

31. Shahada F, Sekizuka T, Kuroda M, Kusumoto M, Ohishi D, Matsumoto A, Okazaki H, Tanaka K, Uchida I, Izumiya H, Watanabe H, Tamamura Y, Iwata T, Akiba M. 2011. Characterization of Salmonella enterica serovar Typhimurium isolates harboring a chromosomally encoded CMY-2 beta-lactamase gene located on a multidrug resistance genomic island. Antimicrob Agents Chemother 55:4114-4121.

32. Gordienko EN, Kazanov MD, Gelfand MS. 2013. Evolution of pan-genomes of Escherichia coli, Shigella spp., and Salmonella enterica. J Bacteriol 195:2786-2792.

33. Vernikos G, Medini D, Riley DR, Tettelin H. 2015. Ten years of pan-genome analyses. Curr Opin Microbiol 23:148-154.

34. Xiao J, Zhang Z, Wu J, Yu J. 2015. A brief review of software tools for pangenomics. Genomics Proteomics Bioinformatics 13:73-76.

35. Johnson TJ, Wannemuehler Y, Kariyawasam S, Johnson JR, Logue CM, Nolan LK. 2012. Prevalence of avian-pathogenic Escherichia coli strain O1 genomic islands among extraintestinal and commensal E. coli isolates. J Bacteriol 194:2846-2853.

36. Paul S, Linardopoulou EV, Billig M, Tchesnokova V, Price LB, Johnson JR, Chattopadhyay S, Sokurenko EV. 2013. Role of homologous recombination in adaptive diversification of extraintestinal Escherichia coli. J Bacteriol 195:231-242.

37. Hancock V, Vejborg RM, Klemm P. 2010. Functional genomics of probiotic Escherichia coli Nissle 1917 and 83972, and UPEC strain CFT073: comparison of transcriptomes, growth and biofilm formation. Mol Genet Genomics 284:437-454.

38. Vejborg RM, Friis C, Hancock V, Schembri MA, Klemm P. 2010. A virulent parent with probiotic progeny: comparative genomics of Escherichia coli strains CFT073, Nissle 1917 and ABU 83972. Mol Genet Genomics 283:469-484.

39. Jarvik T, Smillie C, Groisman EA, Ochman H. 2009. Short-term signatures of evolutionary change in the Salmonella enterica serovar typhimurium 14028 genome. J Bacteriol 192:560-567.

40. Luo Y, Kong Q, Yang J, Mitra A, Golden G, Wanda SY, Roland KL, Jensen RV, Ernst PB, Curtiss R, 3rd. 2012. Comparative genome analysis of the high pathogenicity Salmonella Typhimurium strain UK-1. PLoS One 7:e40645.

41. Wu KH, Wang KC, Lee LW, Huang YN, Yeh KS. 2012. A constitutively mannose-sensitive agglutinating Salmonella enterica subsp. enterica serovar Typhimurium strain, carrying a transposon in the fimbrial usher gene stbC, exhibits multidrug resistance and flagellated phenotypes. Scientific World Journal 2012:280264.


FIGURE LEGENDS

Figure 1. Comparison of the number of genes present in 8 Typhimurium strains. The left panel shows the number of protein-coding genes in the existing annotation, and the right panel shows the number of genes after re-annotation.

Figure 2. Schematic representation of the pan-genomic profile for 8 Typhimurium strains. The numbers of core, mosaic, and strain-specific genes are shown based on (A) the existing annotation and (B) the re-annotation, using 95% nucleotide sequence identity and 95% gene-length coverage as thresholds for orthologous gene identification.
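The classification behind such a profile can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tuple layout of `hits`, the function names, and the toy data are all hypothetical, and it assumes that core genes are those found in every strain, strain-specific genes in exactly one strain, and mosaic genes in any intermediate number of strains.

```python
from collections import defaultdict

# Thresholds taken from the legend: 95% identity and 95% gene-length coverage.
IDENTITY_MIN = 95.0
COVERAGE_MIN = 95.0


def presence_from_hits(hits):
    """Map each gene to the set of strains with a hit passing both thresholds.

    `hits` is a hypothetical list of (gene, strain, identity_pct, coverage_pct).
    """
    presence = defaultdict(set)
    for gene, strain, identity, coverage in hits:
        if identity >= IDENTITY_MIN and coverage >= COVERAGE_MIN:
            presence[gene].add(strain)
    return presence


def classify_pan_genome(presence, n_strains):
    """Count core (all strains), strain-specific (one strain) and mosaic genes."""
    counts = {"core": 0, "mosaic": 0, "strain-specific": 0}
    for strains in presence.values():
        if len(strains) == n_strains:
            counts["core"] += 1
        elif len(strains) == 1:
            counts["strain-specific"] += 1
        else:
            counts["mosaic"] += 1
    return counts


# Invented toy data for three strains; these are not values from the study.
toy_hits = [
    ("geneA", "s1", 99.0, 100.0),
    ("geneA", "s2", 98.5, 97.0),
    ("geneA", "s3", 96.0, 99.0),
    ("geneB", "s1", 100.0, 100.0),
    ("geneB", "s2", 97.0, 80.0),  # coverage below threshold, so not counted
    ("geneC", "s2", 99.9, 100.0),
    ("geneC", "s3", 95.0, 95.0),
]
print(classify_pan_genome(presence_from_hits(toy_hits), n_strains=3))
```

Requiring both thresholds at once means that a truncated or fragmented gene copy is treated as absent, which is why annotation quality directly affects the core, mosaic, and strain-specific counts.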


Figure 3. Pan-genome size distribution with increasing number of genomes in a set of 8 Typhimurium strains. The pan-genome size is given as the number of protein-coding genes based on (A) the existing annotation, (B) the re-annotation, and (C) the re-annotation excluding the prophage regions. The power-law fit (n = kN^γ) was performed on the median values (black dots).
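A power-law fit of this form can be sketched as below. The example assumes an ordinary least-squares fit on log-transformed values, which is one common way to estimate k and γ; the legend does not state which fitting procedure was used, and the function names and synthetic input are hypothetical.

```python
import math
import statistics


def median_sizes(sizes_by_n):
    """Return sorted (N, median pan-genome size) pairs from {N: [sizes, ...]}."""
    return sorted(
        (n_genomes, statistics.median(sizes))
        for n_genomes, sizes in sizes_by_n.items()
    )


def fit_power_law(points):
    """Fit log(n) = log(k) + gamma * log(N) by least squares; return (k, gamma)."""
    xs = [math.log(n_genomes) for n_genomes, _ in points]
    ys = [math.log(size) for _, size in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    k = math.exp(mean_y - gamma * mean_x)
    return k, gamma


# Synthetic sizes generated from k = 4500 and gamma = 0.05 for N = 1..8;
# these are illustrative numbers, not data from the study.
synthetic = {N: [4500 * N ** 0.05] * 3 for N in range(1, 9)}
k, gamma = fit_power_law(median_sizes(synthetic))
print(k, gamma)
```

In this formulation an exponent γ close to 0 indicates a pan-genome that grows very slowly as genomes are added, whereas a larger γ indicates continued gene gain with each new genome.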


Figure 4. Schematic representation of the pan-genomic profile for different genomic fractions of Typhimurium strains after re-annotation. The numbers of core, mosaic, and strain-specific genes are shown for (A) the prophage regions only and (B) the genomes excluding the prophage regions, using 95% nucleotide sequence identity and 95% gene-length coverage as thresholds for orthologous gene identification.


Figure 5. Map of accessory non-phage and ‘intact’ prophage clusters in 8 Typhimurium genomes. (A) The ‘intact’ prophage regions (as identified by PHAST) and (B) the deleted/inserted regions are marked in the multiple alignment. Red bars show the intact prophage regions present in the genomes; purple and green bars mark the regions deleted from and inserted into the genomes, respectively.
