25
Comparative genome analysis of Lactobacillus mudanjiangensis, an understudied member of the Lactobacillus plantarum group Sander Wuyts 1,2 , Camille Nina Allonsius 1 , Stijn Wittouck 1 , Sofie Thys 3 , Bart Lievens 4 , Stefan Weckx 2 , Luc De Vuyst 2* , Lebeer Sarah 1* 1 University of Antwerp, Research Group Environmental Ecology and Applied Microbiology (ENdEMIC), Department of Bioscience Engineering, Antwerp, Belgium 2 Vrije Universiteit Brussel, Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Faculty of Sciences and Bioengineering Sciences, Brussels, Belgium 3 University of Antwerp, Laboratory of Cell Biology and Histology, Antwerp Centre for Advanced Microscopy (ACAM), Antwerp, Belgium 4 KU Leuven, Laboratory for Process Microbial Ecology and Bioinspirational Management (PME&BIM), Department of Microbial and Molecular Systems (M2S), Campus De Nayer, Sint-Katelijne-Waver, Belgium * [email protected] Abstract The genus Lactobacillus is known to be extremely diverse and consists of different phylogenetic groups that show a diversity roughly equal to the expected diversity of a typical bacterial genus. One of the most prominent phylogenetic groups within this genus is the Lactobacillus plantarum group which contains the understudied Lactobacillus mudanjiangensis species. Before this study, only one L. mudanjiangensis strain, DSM 28402 T , was described but without whole-genome analysis. In this study, three strains classified as L. mudanjiangensis, were isolated from three different carrot juice fermentations and their whole-genome sequence was determined, together with the genome sequence of the type strain. The genomes of all four strains were compared with publicly available L. plantarum group genome sequences. This analysis showed that L. mudanjiangensis harbored the second largest genome size and gene count of the whole L. plantarum group. In addition, all members of this species showed the presence of a gene coding for a putative cellulose-degrading enzyme. Finally, three of the four L. mudanjiangensis strains studied showed the presence of pili on scanning electron microscopy (SEM) images, which were linked to conjugative gene regions, coded on plasmids in at least two of the strains studied. Author summary Lactobacillus mudanjiangensis is an understudied species within the Lactobacillus plantarum group. Since its first description, no other studies have reported its isolation. Here, we present the first four genome sequences of this species, which include the genome sequence of the type strain and three new L. mudanjiangensis strains isolated from fermented carrot juice. The genomes of all four strains were compared with publicly available L. plantarum group genome sequences. We found that this species February 5, 2019 1/20 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/549451 doi: bioRxiv preprint

Comparative genome analysis of Lactobacillus ...Comparative genome analysis of Lactobacillus mudanjiangensis, an understudied member of the Lactobacillus plantarum group Sander Wuyts1,2,

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Comparative genome analysis of Lactobacillusmudanjiangensis, an understudied member of theLactobacillus plantarum group

Sander Wuyts1,2, Camille Nina Allonsius1, Stijn Wittouck1, Sofie Thys3, Bart Lievens4,Stefan Weckx2, Luc De Vuyst2*, Lebeer Sarah1*

1 University of Antwerp, Research Group Environmental Ecology and AppliedMicrobiology (ENdEMIC), Department of Bioscience Engineering, Antwerp, Belgium2 Vrije Universiteit Brussel, Research Group of Industrial Microbiology and FoodBiotechnology (IMDO), Faculty of Sciences and Bioengineering Sciences, Brussels,Belgium3 University of Antwerp, Laboratory of Cell Biology and Histology, Antwerp Centre forAdvanced Microscopy (ACAM), Antwerp, Belgium4 KU Leuven, Laboratory for Process Microbial Ecology and BioinspirationalManagement (PME&BIM), Department of Microbial and Molecular Systems (M2S),Campus De Nayer, Sint-Katelijne-Waver, Belgium

* [email protected]

Abstract

The genus Lactobacillus is known to be extremely diverse and consists of differentphylogenetic groups that show a diversity roughly equal to the expected diversity of atypical bacterial genus. One of the most prominent phylogenetic groups within thisgenus is the Lactobacillus plantarum group which contains the understudiedLactobacillus mudanjiangensis species. Before this study, only one L. mudanjiangensisstrain, DSM 28402T, was described but without whole-genome analysis. In this study,three strains classified as L. mudanjiangensis, were isolated from three different carrotjuice fermentations and their whole-genome sequence was determined, together with thegenome sequence of the type strain. The genomes of all four strains were compared withpublicly available L. plantarum group genome sequences. This analysis showed that L.mudanjiangensis harbored the second largest genome size and gene count of the wholeL. plantarum group. In addition, all members of this species showed the presence of agene coding for a putative cellulose-degrading enzyme. Finally, three of the four L.mudanjiangensis strains studied showed the presence of pili on scanning electronmicroscopy (SEM) images, which were linked to conjugative gene regions, coded onplasmids in at least two of the strains studied.

Author summary

Lactobacillus mudanjiangensis is an understudied species within the Lactobacillusplantarum group. Since its first description, no other studies have reported its isolation.Here, we present the first four genome sequences of this species, which include thegenome sequence of the type strain and three new L. mudanjiangensis strains isolatedfrom fermented carrot juice. The genomes of all four strains were compared withpublicly available L. plantarum group genome sequences. We found that this species

February 5, 2019 1/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

harbored the second largest genome size and gene count of the whole L. plantarumgroup. Furthermore, we present the first scanning electron microscopy (SEM) images ofL. mudanjiangensis, which showed the formation of pili in three strains that we linkedto genes related to conjugation. Finally, we found the presence of a unique putativecellulose-degrading enzyme, opening the door for different industrial applications ofthese Lactobacillus strains.

Introduction 1

The genus Lactobacillus is known to be extremely diverse [1]. Furthermore, it has been 2

shown that different phylogenetic groups within this genus display a diversity roughly 3

equal to the expected diversity of a typical bacterial genus [2–6]. Each of these 4

phylogenetic groups can be recognized as an entity with unique properties and a distinct 5

natural history, ecology, function and physiology [5]. Therefore, the study of these 6

phylogenetic groups separately, as if they would be one genus, can be an interesting 7

approach that might reveal new, previously overlooked, phylogenetic relationships and 8

functional properties. 9

One of the more abundantly studied species within the genus Lactobacillus, is 10

Lactobacillus plantarum. Previous genome-based phylogenetic studies have defined L. 11

plantarum as a member of the L. plantarum group together with Lactobacillus 12

fabifermentans, Lactobacillus paraplantarum, Lactobacillus pentosus and Lactobacillus 13

xiangfangensis [1, 7]. In addition, the species Lactobacillus herbarum [8], Lactobacillus 14

plajomi [9], Lactobacillus modestisalitolerans [9] and Lactobacillus mudanjiangensis [10] 15

are closely related to L. plantarum and thus should be regarded as members of the L. 16

plantarum group. Lactobacillus mudanjiangensis is a species that has been described for 17

the first time in 2013 and that was isolated from a traditional pickle fermentation in the 18

Heilongjiang province in China [10]. Since its first description, no other study has 19

provided additional characterization or reported the isolation of other strains of the L. 20

mudanjiangensis species. Therefore, currently, not a single genomic assembly of this 21

species is publicly available. However, in in this study, four strains isolated from three 22

different spontaneous carrot juice fermentations [11], were putatively classified as 23

members of this Lactobacillus species. 24

Since the discovery of the mucus-binding pili, the fimbriae or adhesins, in 25

Lactobacillus rhamnosus GG [12,13], several comparative genomic studies have focused 26

on exploring similar gene clusters in other lactobacilli, including the members of the L. 27

plantarum group [1, 14–18]. Whereas these specific pili play an important role in cell 28

surface adhesion, pili can be of importance for an array of other functions as well, 29

ranging from biofilm formation to uptake of extracellular DNA via natural competence 30

(type IV pili) or facilitation of DNA transfer via conjugation [19–21]. The latter is a 31

process that uses conjugative pili to bring bacterial cells together and provide an 32

interface to exchange macromolecules, such as DNA or DNA-protein complexes [20]. In 33

general, such a conjugation system consists of three major components, namely (i) a 34

relaxase (MOB) that will bind and knick the DNA at the origin of replication, (ii) a 35

coupling protein (T4CP) that will couple the relaxase-DNA complex to (iii) a type IV 36

secretion system (T4SS), which ultimately transfers the whole complex to the recipient 37

cell [22–25]. Historically, these conjugation systems and their pili have been associated 38

with conjugative plasmids only [23], one of the main drivers of horizontal gene 39

transfer [22,26]. However, recently, also integrative and conjugative elements (ICEs), 40

which harbor conjugation systems as well, have been found to be another important 41

driver of horizontal gene transfer [26–28]. 42

This study aimed to provide more insights into the genomic features of the 43

understudied Lactobacillus mudanjiangensis species, in relation to the other members of 44

February 5, 2019 2/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

the L. plantarum group, using a comparative genomics approach. Therefore, the 45

genome of the type strain of L. mudanjiangensis was sequenced together with three 46

strains isolated from fermented carrot juice. These and other publicly available genome 47

sequences were used to screen for L. mudanjiangensis species-specific properties, which 48

included an analysis for the presence of genes related to pili formation and conjugation. 49

In total, 304 genomes were subjected to an in-depth analysis focusing on the 50

phylogenetic relationships as well as the predicted functional capacity of these strains. 51

Results 52

The genome assembly of the type strain L. mudanjiangensis DSM 28402T was analyzed 53

together with the genome sequences of three putative L. mudanjiangensis strains 54

isolated from carrot juice fermentations, namely AMBF197, AMBF209 and AMBF249, 55

to confirm their putative classification as L. mudanjiangensis members. Furthermore, 56

to allow comparison with other closely related Lactobacillus species and detection of L. 57

mudanjiangensis species-specific properties, all publicly available genome sequences 58

(NCBI Assembly database, 24/07/2018) of L. plantarum group members were included 59

in this comparative genomics study, totaling the number of genomes analyzed to 304 60

(Table 1). 61

Table 1. An overview of the studied species and strains.Public data

Species Number of genomes Type strain ReferenceL. fabifermentans 2 DSM 21115T [29]L. herbarum 1 DSM 100358T [8]L. mudanjiangensis 0 DSM 28402T [10]L. paraplantarum 5 DSM 10667T [30]L. pentosus 13 DSM 20314T [31]L. plantarum 278 DSM 20174T [32]L. xiangfangensis 1 DSM 27103T [33]

Type strains sequenced in this studySpecies Number of genomes Type strain ReferenceL. mudanjiangensis 1 DSM 28402T [10]

In-house isolatesSpecies Number of genomes Isolation source ReferenceL. mudanjiangensis 3 Spontaneously fermented carrot juice [11]

Phylogeny of the Lactobacillus plantarum group 62

To obtain a detailed view on the phylogeny of L. mudanjiangensis in relationship to the 63

whole L. plantarum group, a maximum likelihood phylogenetic tree was constructed, 64

based on 612 single-copy core orthogroups, found with Orthofinder (Fig1 and S1 Fig). 65

The resulting topology of this tree showed seven major clades, mostly following the 66

species annotation as described in the NCBI Assembly database. However, these results 67

exposed a few wrongly annotated genomic assemblies. For example, both L. plantarum 68

MPL16 and L. plantarum AY01 were annotated as L. plantarum before, whereas here, 69

they were found within a clade that contained the L. paraplantarum type strain. 70

Similarly, L. plantarum EGD-AQ4 was found within the clade of the L. pentosus type 71

strain, whereas it was annotated as L. plantarum before. 72

The type strain of L. mudanjiangensis formed a separate clade together with the 73

strains AMBF197, AMBF209 and AMBF249 (Fig1). Based on its single-copy core 74

orthogroups, this species was phylogenetically the most distant to L. plantarum, 75

whereas its closest relative was L. fabifermentans, followed by L. xiangfangensis. 76

February 5, 2019 3/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

Fig 1. Maximum likelihood phylogenetic tree of the Lactobacillusplantarum group. The tree is based on the amino acid sequences of 612 single-copymarker genes. Lactobacillus algidus DSM 15638 was used as an outgroup. The tree waspruned to only keep 70 L. plantarum strains to avoid over-plotting. A complete tree canbe found in the Appendix (S1 Fig). The branch length of the outgroup was shortenedfor better visualization. Each tip is colored, based on its species name as annotated inthe NCBI Assembly database (where applicable), with dark blue for L. plantarum, lightgreen for Lactobacillus paraplantarum, light blue for Lactobacillus pentosus, pink forLactobacillus herbarum, light orange for Lactobacillus xiangfangensis, dark orange forLactobacillus mudanjiangensis and red for the isolates obtained from the carrot juicefermentations. Type strains of each species are annotated with a triangle (NCBI) or asquare (sequenced in this study).

Low intraclade ANI values for Lactobacillus pentosus and 77

Lactobacillus plantarum 78

To confirm that each major phylogenetic clade represented at least one different species, 79

the pairwise average nucleotide identity (ANI) values of all genome assemblies were 80

calculated (Fig2). Intraclade ANI values all exceeded the commonly used 95% species 81

level threshold [34] for L. mudanjiangensis (99.0-99.4%), L. fabifermentans (99.7-99.9%) 82

and L. paraplantarum (99.7-99.9%), whereas their interclade ANI values were far below 83

this threshold, showing that these clades all represented a single species. However, this 84

was not the case for L. pentosus, for which multiple pairwise comparisons led to 85

intraclade ANI values below this threshold, suggesting that this phylogenetic clade 86

contained at least two species (Fig2 and S2 Fig). This result was also found for some L. 87

plantarum assemblies, although to a much lesser extent, compared to L. pentosus. 88

Therefore, it was decided that, for subsequent analyses, L. plantarum was kept as one 89

species, whereas L. pentosus was split into two groups, each representing one species. 90

One species was designated as species L. pentosus, represented by a clade containing 91

twelve genomic assemblies, including the type strain L. pentosus DSM 20314T. The 92

other species, here referred to as clade 5a, was represented by two genomic assemblies 93

(L. pentosus KCA1 and L. plantarum EGD-AQ4) (Figures 1 and S2 Fig). Finally, for L. 94

xiangfangensis and L. herbarum, no intraclade comparisons could be performed, as only 95

one genome assembly was available for these species. 96

Fig 2. Density plot of all pairwise average nucleotide identity (ANI)comparisons for each Lactobacillus plantarum group species. In green allinterclade comparisons are shown, whereas orange shows all intraclade comparisons. ForLactobacillus xiangfangensis and Lactobacillus herbarum, no intraclade comparisonscould be performed, as only one genome assembly was available for these species.

Genomic features of Lactobacillus mudanjiangensis 97

The above results confirmed the initial that strains AMBF197, AMBF209 and 98

AMBF249 were members of the L. mudanjiangensis species. Therefore, here, the first 99

four genomes of this species were presented. Their genome size varied between 3.4 Mb 100

(strain DSM 28402T) and 3.6 Mb (strain AMBF209), whereas their GC content varied 101

between 42.73% (strain AMBF209) and 43.06% (strain DSM 28402T) (Table 2). Finally, 102

a high number of transfer RNA (tRNA) genes were found in all four strains. 103

A substantial difference in total genome length between the different species of the L. 104

plantarum group was found (Fig3A). Lactobacillus mudanjiangensis showed a median 105

February 5, 2019 4/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

Table 2. Genome characteristics of Lactobacillus mudanjiangensis strains.Genome Total length (bp) GC content (%) Coding sequences # tRNA genesAMBF197 3,501,388 42.85 3,463 65AMBF209 3,589,692 42.73 3,586 63AMBF249 3,554,025 42.83 3,503 71DSM 28402T 3,389,962 43.06 3,346 66

estimated genome size of 3.53 Mb, the second largest of the whole L. plantarum group 106

up to now. Lactobacillus pentosus showed the largest median estimated genome size 107

(3.77 Mb), followed by L. mudanjiangensis (3.53 Mb) and L. fabifermentans (3.43 Mb), 108

whereas for L. xiangfangensis (3.0 Mb) and L. herbarum (2.9 Mb) this size was much 109

smaller. A remarkable high spread in genome length was found within strains belonging 110

to the L. plantarum species, as their genome size ranged between 2.9 Mb and 3.8 Mb. 111

Furthermore, L. mudanjiangensis showed a GC content of 42.9%, the lowest value 112

within the whole L. plantarum group (Fig3B). Finally, regarding median gene count, 113

similar trends were found as for the genome length, with L. pentosus showing the 114

highest count, followed by L. mudanjiangensis and L. fabifermentans, whereas L. 115

xiangfangensis and L. herbarum harbored the lowest numbers of genes (Fig3C). 116

Fig 3. Estimated genome sizes, GC contents and gene counts of all genomesof Lactobacillus plantarum group species studied and predicted functionalcapacity of all unique Lactobacillus mudanjiangensis orthogroups. (A) Totalgenome size, (B) GC content and (C) gene counts of all genomes studied, colored byspecies. (D) Upset plot comparing shared orthogroup counts between the eight L.plantarum group species. Species-specific orthogroups for L. mudanjiangensis arecolored in blue and the inset shows their functional category based on EggNOGclassification. Uniquely shared orthogroups between L. mudanjiangensis and one otherL. plantarum group member are colored in orange, whereas uniquely shared orthogroupsbetween L. mudanjiangensis and two other species are colored in red.

In total, 947,588 genes were found in the whole L. plantarum group, with an average 117

of 3,110 genes per genome. These genes were further clustered into 8,005 different 118

orthogroups, leading to an average count of 2,924 orthogroups per genome. The 119

differences between these numbers was due to the fact that some genes were found in 120

multiple copies within one genome, which clustered together in a single orthogroup. Of 121

all these orthogroups, 2,172 were defined as core orthogroups and 5,833 as accessory 122

orthogroups (see Sections 1.2.2 and 4.2.3 for the definition of these terms). A detailed 123

overview of the number of genes and core and accessory orthogroups can be found in S2 124

Table. Subsequently, the distribution of orthogroups between the different L. plantarum 125

group members was explored (Fig3D). The species with the highest number of 126

species-specific orthogroups was L. plantarum. With 2,065 species-specific orthogroups, 127

it greatly outnumbered all other species, although this number was most probably 128

biased, due to the higher number of sequenced genomes available for L. plantarum 129

compared with the other L. plantarum group species. It was followed by L. 130

mudanjiangensis (Fig3D, blue) that contained 372 species-specific orthogroups and L. 131

fabifermentans harboring 213 species-specific orthogroups. Furthermore, L. plantarum 132

and L. pentosus shared the highest number of uniquely shared orthogroups (286), 133

followed by the combination of L. plantarum and L. paraplantarum (219 uniquely 134

shared orthogroups), which seemed to be in line with the phylogeny described in Fig1. 135

In contrast, L. mudanjiangensis shared more unique orthogroups with the 136

phylogenetically distant L. plantarum (166 orthogroups; Fig3D, yellow) than it did with 137

the most closely related species, L. fabifermentans (59 orthogroups). 138

To get more insights into the unique properties of L. mudanjiangensis, all 372 139

February 5, 2019 5/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

species-specific orthogroups were further classified using the EggNOG database (inset 140

Fig3D). However, this resulted in a vast majority of orthogroups (304), classified under 141

‘Category S: Function unkown’, showing that further experimental work on functional 142

gene validation is necessary. Besides this, most orthogroups belonged to category G 143

(carbohydrate transport and metabolism, 14 orthogroups), followed by category M (cell 144

wall/membrane/envelope biogenesis, 13 orthogroups). 145

Lactobacillus mudanjiangensis harbors a potential 146

cellulose-degrading enzyme 147

Carbohydrate transport and metabolism (category G) was found to be the most 148

abundantly characterized category among the L. mudanjiangensis species-specific 149

orthogroups. Further examination of the 14 unique orthogroups that were detected in 150

this category, revealed the presence of a gene in all four strains annotated as 151

endoglucanase E1, which is involved in the conversion of cellulose polymers into simple 152

saccharides [35]. A BLAST search of the DNA sequences of this gene to the NCBI nt 153

database showed a best scoring hit (26% coverage and 69% identity) with a member of 154

the Herbinix species (GenBank accession number LN879430). The Herbinix genus 155

contains cellulose-degrading bacteria [36]. This result also showed that this gene was 156

not found in any other member of the Lactobacillus Genus Complex (LGC), or any 157

other LAB, and confirmed its uniqueness to L. mudanjiangensis. Since endoglucanases 158

are classified as glycosyl hydrolases (GHs), GHs were predicted for all genomes. Indeed, 159

for all four L. mudanjiangensis strains, this endoglucanase E1 gene was classified as 160

belonging to the GH5 1 family, which was a family uniquely found in L. 161

mudanjiangensis. Although this GH family showed some degree of polyspecificity, the 162

majority of enzymes (22 of 24 enzymes characterized) are reported as 163

endoglucanases [37]. Together, these results thus pointed towards the presence of a 164

novel putative cellulose-degrading enzyme in all four L. mudanjiangensis strains. 165

Presence of a putative conjugative system in Lactobacillus 166

mudanjiangensis 167

The second most abundant category of L. mudanjiangensis-specific orthogroups, 168

excluding category S (function unknown), were genes related to ‘cell wall, membrane, or 169

envelope biogenesis‘ (category M). Examination of the annotation of the genes 170

belonging to these orthogroups did not reveal any new insights, as many of them were 171

annotated as hypothetical proteins. Therefore, scanning electron microscopy (SEM) was 172

performed to screen the cell surfaces of these four strains in more detail. This analysis 173

revealed that three of the four strains (L. mudanjiangensis DSM28402T, AMBF209 and 174

AMBF249) formed pili or fimbriae, connecting different cells to each other as well as 175

cells to an undefined structure (Fig4A). 176

Fig 4. Scanning electron microscopy (SEM) and genes related toconjugation. (A) SEM images of all four Lactobacillus mudanjiangensis strainsstudied. White arrows indicate putative conjugative pili. (B) Gene clusters encoding aputative conjugation system, colored by their potential function, as classified byCONJSCAN. The text above each gene shows its matching orthogroup. (C) Schematicmodel representing the process of bacterial conjugation with all three mandatoryelements. Scheme adapted from [23]. (R, relaxase; T4CP, Type IV coupling protein;T4SS, Type IV secretion system).

To identify the genes encoding these pili, all genome sequences of L. 177

mudanjiangensis were screened for the presence of genes associated with these kinds of 178

February 5, 2019 6/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

phenotypes. These included the spaCBA gene cluster, which has been linked with 179

probiotic properties in L. rhamnosus, due to better adhesion to intestinal epithelial 180

cells [38, 39], as well as secretion systems based on pili, such as the type II and type IV 181

secretion systems [25,40]. In this study, no spaCBA gene cluster was found. However, 182

further exploration revealed the presence of a conjugation system in at least three of the 183

four L. mudanjiangensis strains examined (AMBF209, AMBF249 and DSM 28402T). 184

Two complete conjugation systems containing all three mandatory parts (Fig4C) 185

were found in L. mudanjiangensis AMBF209 and AMBF249, whereas one complete 186

conjugation system was found in L. mudanjiangensis DSM 28402T (Fig4B). For all four 187

L. mudanjiangensis strains, the relaxase gene of this conjugation system was classified 188

as a member of the MOBQ class, whereas the coupling protein was classified as a 189

T4CP2. The MPF system, which harbored the putative pilus, was further classified as 190

belonging to the class MPFFATA, which groups the MPF systems of Gram-positive 191

bacteria [24]. VirB4 was identified as the ATPase motor of this MPF system. 192

Furthermore, this MPF system contained three accessory genes (trsC, trsD and trsJ ) in 193

L. mudanjiangensis AMBF209 and AMBF249, whereas four accessory genes were 194

annotated in L. mudanjiangensis DSM 28402T (trsC, trsD, trsF and trsJ ) (Fig4B). 195

Homologs for the genes trsC and trsD were already previously identified, with trsC 196

coding for a VirB3 homolog, which is linked to the formation of the membrane pore, 197

and trsD coding for another homolog of VirB4, the conjugation ATPase [24]. In 198

contrast, both trsF and trsJ are poorly characterized. 199

Further analysis of the genes surrounding the annotated conjugation genes showed 200

that this genomic region contained 18 to 19 open reading frames, most of them 201

annotated as hypothetical proteins (Fig4B and S3 Table). However, a bacteriophage 202

peptidoglycan hydrolase domain was found in orthogroup OG0002812 in both L. 203

mudanjiangensis AMBF209 conjugation region 1 (AMBF209 CR1) and L. 204

mudanjiangensis AMBF249 conjugation region 2 (AMBF249 CR2), making it a 205

VirB1-like protein [41]. In Agrobacterium tumefaciens, the VirB1 protein provides 206

localized lysis of the peptidoglycan cell wall to allow insertion of the T4SS [42]. A 207

similar domain, also known to harbor peptidoglycan lytic activity, was found in L. 208

mudanjiangensis DSM28402 CR1 (orthogroup OG0002812). Finally, another conserved 209

domain was found in all five gene regions clustered in orthogroup OG0003012, 210

annotated as T4SS CagC, which was shown to be a VirB2 homolog. VirB2 is the major 211

pilus component of the type IV secretion system of A. tumefaciens, which is the main 212

building block for extension and retraction of the conjugative pilus [43–45]. Taken 213

together, these results showed the presence of pili in three L. mudanjiangensis strains 214

(AMBF209, AMBF249 and DSM 28402T), which after genomic analysis were 215

hypothesized to be part of a conjugation system. 216

Finally, genome analysis of all other L. plantarum group members showed that, in 217

contrast to an initial belief, the presence of a complete conjugation system was not 218

unique to L. mudanjiangensis (S1 Table). All three necessary genes were also found in 219

58 of 275 L. plantarum strains, two of seven L. paraplantarum strains and four of twelve 220

L. pentosus strains. In contrast, the system was completely absent in clade5a, L. 221

herbarum, L. xiangfangensis and L. fabifermentans. 222

Plasmid reconstruction from genome data 223

Many conjugation systems are coded on plasmids [27]. Therefore, all four L. 224

mudanjiangensis genomes were screened for plasmid presence using Recycler [46]. 225

Plasmids were only found in two of four genome assemblies, namely L. mudanjiangensis 226

AMBF209 and AMBF249 (Fig5). Both strains harbored a plasmid of 27.3 Kb with 33 227

predicted genes, which after pairwise alignment of the plasmid were foun to be exactly 228

the same. Subsequently, the presence of a conjugation system on these plasmids was 229

February 5, 2019 7/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

confirmed using CONJScan. Further examination showed that the plasmid exactly 230

matched the above described AMBF209 CR2 and AMBF249 CR1 gene regions (Fig4B). 231

Regarding annotated genes, 13 of 33 gene products were predicted as hypothetical 232

proteins by Prokka. Further annotation using the EggNOG database revealed that most 233

genes were mapped to category S (function unknown), followed by category L 234

(replication, recombination and repair) and category C (energy production and 235

conversion). Finally, a BLAST search was performed against the NCBI nt database to 236

explore whether a similar plasmid was already described in the literature. This resulted 237

in a best matching hit, showing 97% identity and 76% query coverage with plasmid 238

pKLC4 of Leuconostoc carnosum JB16 (GenBank accession number CP003855). This 239

plasmid was found in a strain isolated from kimchi and has a length of 36.6 Kb [47], 240

indicating that some deletions occurred in the L. mudanjiangensis plasmids. All 241

together, these results showed that L. mudanjiangensis strains AMBF209 and 242

AMBF249 carried the same conjugative plasmid, of which the encoded gene functions 243

are poorly characterized. 244

Fig 5. Plasmid map of the predicted conjugative plasmid of Lactobacillusmudanjiangensis AMBF209 and AMBF249. Genes are colored according totheir annotation, as defined in Fig4.

Since only two of five conjugation regions (Fig4) were plasmid-encoded, an 245

additional analysis was performed to assess whether the other three conjugation systems 246

could be part of an ICE. For this, all four L. mudanjiangensis genomes were analyzed 247

similar to a recently published method [26]. However, ICE regions usually contain 248

repeats, such as transposases, leading to fragmentation of these ICE regions, if short 249

read sequencing technology is used [26]. Therefore, these analysis methods usually 250

require a complete genome for proper ICE identification. The assembly state of the four 251

genomes made it thus hard to correctly interpret the results obtained. 252

Discussion 253

In this study, the genome sequence of the L. mudanjiangensis type strain DSM 28402T 254

was presented together with the genomes of three new L. mudanjiangensis strains, 255

AMBF197, AMBF209 and AMBF249, which were isolated from three different 256

spontaneous carrot juice fermentations [11]. Since previous phylogenetic analysis of this 257

species, using the 16S rRNA, pheS, and rpoA genes, showed a close genetic relatedness 258

with the members of the L. plantarum group [10], it was decided to study these 259

genomes in relation to the closely related members of the L. plantarum group with a 260

comparative genomics approach. A maximum likelihood phylogenetic tree confirmed 261

that L. mudanjiangensis was closely related to all other L. plantarum group members. 262

Furthermore, pairwise ANI analysis confirmed that the three strains isolated from 263

carrot juice fermentations were indeed members of the L. mudanjiangensis species. Yet, 264

this analysis also revealed that several genomes annotated as L. pentosus and L. 265

plantarum showed intraclade ANI values below the commonly used cut-off value of 95% 266

identity [34]. Especially for L. pentosus, low intraclade ANI values (a maximum of 267

92,3%) were found, meaning that if the 95% cut-off is strictly applied, two genome 268

assemblies could not be seen as members of the L. pentosus species. One of them 269

represented the genome sequence of the vaginal isolate L. pentosus KCA1 [48], the other 270

a genome sequence that has been annotated as L. plantarum EGD-AQ4, isolated from a 271

fermented bamboo shoot product [49]. The fact that these two genomes showed lower 272

intraclade ANI values could mean that these bacteria are undergoing a speciation 273

process or that this strict ANI value cut-off should be revised, as it potentially separates 274

February 5, 2019 8/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

members from the same species into different species. 275

The estimated genome size of L. mudanjiangensis was the second largest of the 276

whole L. plantarum group found up to now. The same trend was found for the gene 277

count per species. This means that L. mudanjiangensis harbored one of the largest 278

genomes of the whole LGC, since L. plantarum and especially L. pentosus are known to 279

be among the lactobacilli with the largest genome size and gene counts [5, 50, 51]. For L. 280

plantarum, this large genome and also its pangenome size have been coupled to a 281

nomadic lifestyle [5, 50]. This lifestyle comes with a high genetic diversity, which is 282

associated to the possibility to survive and thrive in many different ecosystems. This 283

could possibly also be applied to L. mudanjiangensis, supported by the fact that L. 284

mudanjiangensis showed a slightly larger number of GHs than L. plantarum (Data not 285

shown), indicating that this species is capable of transforming and metabolizing a broad 286

spectrum of carbohydrate sources. This observation was confirmed by the fact that the 287

type strain was found to be capable of producing organic acids from at least 21 different 288

carbon sources [10]. Furthermore, here, a putative cellulose-degrading enzyme, 289

annotated as endoglucanase E1, was found in all four L. mudanjiangensis strains, which 290

was not found in any other LGC genome so far. Moreover, cellulose-degrading enzymes 291

were never found in LAB up to now. Cellulose is the most abundant organic polymer on 292

earth [52], the most important skeletal component in plants in general [52] and the most 293

abundant crude fiber in carrots [53]. Lactobacillus mudanjiangensis’ putative capability 294

to degrade this fiber into glucose might allow members of this species to survive in many 295

different plant-related ecosystems. Fermented carrot juices and fermented pickles are 296

examples of such ecosystems, where three and one of the strains studied were isolated 297

from, respectively. Together, these results suggested that, similar to L. plantarum, a 298

nomadic or otherwise a plant-adapted lifestyle could be assigned to L. mudanjiangesis. 299

SEM analysis revealed the presence of pili or fimbriae in three of the four L. 300

mudanjiangensis strains studied. In this study, the observation of pili in L. 301

mudanjiangensis was associated with bacterial conjugation. Three of the four strains 302

were found to carry at least one complete putative conjugation region, including a gene 303

that possibly codes for a VirB2 homolog, the major subunit of a conjugation-related 304

pilus [43–45]. The three strains that harbored this conjugation region all showed pili 305

formation on the SEM images, whereas this was not the case for strain AMBF197, 306

which lacked this region. These results suggested that the detected pili might play a role 307

in cell to cell contact during the conjugation process, although this was not yet 308

experimentally validated. 309

Conjugation is one of the main drivers of horizontal gene transfer and is commonly 310

associated with conjugative plasmids [22,26]. Here, two of the five conjugative regions 311

found were plasmid-associated and the two plasmids found were exactly the same for 312

both L. mudanjiangensis AMBF209 and AMBF249, although these strains were 313

isolated from different household carrot juice fermentations [11]. Previous studies also 314

identified and described conjugative plasmids in other Lactobacillus species, such as 315

Lactobacillus brevis [54], Lactobacillus casei [55], Lactobacillus gasseri [56], Lactobacillus 316

hokkaidonensis [57], L. plantarum [58] and Lactobacillus reuteri [59]. Genes on these 317

plasmids often code for proteins involved in detoxification, virulence, antibiotic 318

resistance and ecological interactions [22], which could give them a fitness advantage in 319

certain environments. Here, apart from the conjugation-related genes, many genes were 320

annotated as hypothetical proteins on the conjugative plasmid. However, since this 321

plasmid showed great similarity with a plasmid from a Leuconostoc strain, which was 322

isolated from fermented kimchi [47], it could potentially harbor genes that are beneficial 323

for survival on plants or in a fermented vegetable environment. 324

In conclusion, in this study, the genome sequences of four L. mudanjiangensis strains 325

were studied in relation to the closely related members of the L. plantarum group. 326

February 5, 2019 9/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

Comparative genome analysis of this phylogenetic group found two wrongly annotated 327

genome assemblies and intraclade ANI values below the commonly used species 328

delimitation threshold for L. plantarum and L. pentosus. Furthermore, L. 329

mudanjiangensis harbored one of the largest genomes and the highest gene counts of the 330

L. plantarum group. Together with its broad repertoire of GHs and its potential 331

capability to degrade cellulose, either a nomadic or plant-adapted lifestyle could be 332

assigned to L. mudanjiangensis. Finally, three of the four L. mudanjiangensis strains 333

studied showed the presence of pili on SEM images, which were linked to conjugative 334

gene regions. For two strains, L. mudanjiangensis AMBF209 and AMBF249, these 335

regions were plasmid-associated. Further experimental studies, such as phenotypic 336

growth curve-based screenings, conjugation experiments and the creation of knock-out 337

mutants, are necessary to characterize the plasmid found and to confirm the link 338

between the pili observed and this conjugation gene region. 339

Materials and methods 340

Sequencing of the Lactobacillus mudanjiangensis type strain 341

and downloading of publicly available assemblies 342

The type strain of L. mudanjiangensis [L. mudanjiangensis DSM 28402T ( = LMG 343

27194T = CCUG 62991T)] was purchased from a public microorganism collection 344

(BCCM-LMG, Ghent, Belgium). The strain was grown overnight in de 345

Man-Rogosa-Sharpe (MRS) medium (Carl Roth, Karlsruhe, Germany) and DNA was 346

extracted using the NucleoSpin 96 tissue kit (Macherey-Nagel, Duren, Germany), with 347

an extra cell lysis step using 20 mg/mL of lysozyme (Sigma-Aldrich, St. Louis, MO, 348

USA) and 100 U/mL of mutanolysin (Sigma-Aldrich). Whole-genome sequencing was 349

performed using the Nextera XT DNA Sample Preparation kit (Illumina, San Diego, 350

CA, USA) and the Illumina MiSeq platform, using 2 x 250 cycles, at the Laboratory of 351

Medical Microbiology (University of Antwerp, Antwerp, Belgium) in the case of the 352

strains AMBF197, AMBF249 and DSM28402T or 2 x 300 cycles at the Center of 353

Medical Genetics Antwerp of the University of Antwerp for strain AMBF209. Assembly 354

of the genome sequence was performed using SPAdes v 3.12.0 [60]. In addition, all 355

genome sequences annotated as L. fabifermentans, L. herbarum, L. paraplantarum, L. 356

pentosus, L. plantarum and L. xiangfangensis were downloaded from the National 357

Center for Biotechnology Information (NCBI) Assembly database on 24/07/2018, using 358

in-house scripts. In total, 310 genomes were used as an input for quality control. 359

Quality control and annotation 360

Basic genome characteristics, including genome size, GC content and the N50 value, 361

were estimated using Quast 4.6.3 [61]. The quality of the genome assemblies was 362

evaluated using the Quast output. After visualization of several quality control 363

parameters using ggplot2 [62], genomes with a N50 value < 25,000 bp and a number of 364

undefined nucleotides (N) per 100,000 bases > 500 were discarded. An overview of all 365

genome sequences and strains that passed this quality control (304 assemblies) can be 366

found in S1 Table. Finally, Prokka 1.12 [63] was used to predict and annotate genes for 367

all genome sequences. In addition to its internal databases, a customized genus-specific 368

BLAST database was used for higher quality annotation with Prokka’s –usegenus 369

option. This database was created using BLAST [64,65] and all complete Lactobacillus 370

genomes found in the NCBI Assembly database. Genes encoding glycosyl hydrolases 371

(GHs) were detected by scanning all genomes against hidden Markov model (HMM) 372

profiles of the CAZyme families [66]. The profiles were downloaded from the dbCAN 373

February 5, 2019 10/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

webserver [67] and queried using HMMSCAN [68]. An E-value of 1 x 10-15 and a 374

coverage of 0.35 were used as cut-off, similar to what has been described before [69]. 375

Defining the pangenomes of all Lactobacillus plantarum group 376

species 377

To define the pangenome, all genes were clustered into orthogroups using OrthoFinder 378

2.2.6 [70] and further analyzed in R [71]. Here, a core orthogroup is defined as an 379

orthogroup present in more than 95% of a set of genomes. All other orthogroups are 380

defined as accessory orthogroups. An upset plot was created using the R package 381

UpSetR [72]. Unique orthogroups belonging to L. mudanjiangensis were further 382

annotated using EggNOG-mapper [73] and visualized using ggplot2 [62]. 383

Phylogenetic tree construction 384

Single-copy core orthogroups found by Orthofinder were used as input for the 385

construction of a phylogenetic tree. Lactobacillus algidus DSM 15638 (NCBI Assembly 386

accession number GCA 001434695) served as an outgroup, as it is the species most 387

closely related to the L. plantarum group [1]. The first protein sequence of each fasta 388

file of the single-copy core orthogroups was compared with a BLAST database of all 389

genome proteins of the outgroup’s genome sequence. All hits with a coverage > 75% 390

and a percentage similarity > 50% were added to the alignment of each orthogroup. 391

These alignments, on amino acid level, were concatenated into a supermatrix that was 392

used in RaxML 8.2.9 [74], to build a maximum likelihood phylogenetic tree with the –a 393

option, which combines a rapid bootstrap algorithm with an extensive search of the tree 394

space, starting from multiple different starting trees. The tree and subtrees were plotted 395

with the R package ggtree [75]. 396

Average nucleotide identity 397

All pairwise ANI values were calculated with the Python pyani package [76], using a 398

BLASTN approach [64,65] based on the methodology described by Goris et al. [77]. 399

Scanning electron microscopy 400

To assess the presence or absence of pili or fimbriae on the cell surface of L. 401

mudanjiangensis strains AMBF197, AMBF209, AMBF249 and DSM 28402T, SEM was 402

performed. To this end, the bacterial strains were grown overnight (MRS medium, 403

37°C), gently washed with phosphate-buffered saline (per liter: 56 g of NaCl, 1.4 g of 404

KCl, 10.48 g of Na2HPO4, 1.68 g of KH2PO4; pH 7.4) and spotted on a gold-coated 405

membrane [(approximately 5 x 107 colony forming units (CFU) per membrane]. 406

Bacterial spots were fixed with 2.5% (m/v) glutaraldehyde in 0.1 M sodium cacodylate 407

buffer (2.5% glutaraldehyde, 0.1 M sodium cacodylate, 0.05% CaCl2.2H2O; pH 7.4) by 408

gently shaking the membrane for 1 h at room temperature, followed by a further 409

overnight fixation at 4 °C. After fixation, the membranes were washed three times for 20 410

min with cacodylate buffer (containing 7.5% [m/v] saccharose). Subsequently, the 411

bacteria were dehydrated in an ascending series of ethanol (50%, 70%, 90% and 95%, 412

each for 30 min at room temperature and 100% for 2 x 1 h and 1 x 30 min) and dried in 413

a Leica EM CPD030 (Leica Microsystems Belgium, Diegem, Belgium). The membranes 414

were mounted on a stub and coated with 5 nm of carbon (Leica Microsystems Belgium) 415

in a Leica EM Ace 600 coater (Leica Microsystems Belgium). SEM imaging was 416

performed using a Quanta FEG250 SEM system (Thermo Fisher, Asse, Belgium) at the 417

February 5, 2019 11/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

Antwerp Centre for Advanced Microscopy (ACAM, University of Antwerp) and Electron 418

Microscopy for Material Science group (EMAT, University of Antwerp). 419

Detection of genomic clusters encoding pili or fimbriae 420

To screen for the presence of the spaCBA gene cluster, the gene cluster that is 421

responsible for expression of the fimbriae in L. rhamnosus GG [38,39], a BLAST 422

search [64,65] on protein level was performed against a BLAST database constructed for 423

each genome separately. The gene sequences of spaA (NCBI GenBank accession number 424

BAI40953.1), spaB (BAI40954.1) and spaC (BAI40955.1) were used as queries. 425

Furthermore, the genomes were screened for genes encoding pili-related protein 426

secretion systems, using the predicted amino acid sequences as query and the TXSScan 427

definitions and profile models [25] as references in MacSyFinder v1.0.5 [78]. As only 428

genes related to conjugation systems were found, all protein sequences of all genomes 429

were again scanned, this time using the CONJScan definitions and profile 430

models [23, 26] using MacSyFinder. In brief, a conjugation region was only considered if 431

the conjugation genes were separated by less than 31 genes, except for genes encoding 432

relaxases that can be separated by maximal 60 genes. The region was considered 433

conjugative when it contained genes coding for (i) a VirB4/TraU homolog, (ii) a 434

relaxase, (iii) a type 4 coupling protein (T4CP) and (iv) a minimum number of 435

mating-pair formation (MPF) type-specific genes [26]. For both scans, hits with 436

alignments covering > 50% of the protein profile and with an independent E-value ¡ 437

10−3 were kept for further analysis (default parameters) in R [71]. Conserved domain 438

analysis of genes of interest was performed using the NCBI Conserved Domain web 439

interface [79]. The gene regions were visualized using the R package gggenes (available 440

at https://github.com/wilkox/gggenes). 441

Plasmid identification 442

Detection and reconstruction of plasmids in the different L. mudanjiangensis strains 443

was performed using Recycler v0.7 [46], with the original fastq files and SPAdes 444

assembly graphs as input. The assembled plasmids were annotated with Prokka and 445

further characterized by scanning against the EggNOG database, as described in above. 446

The presence of a conjugation system was confirmed with CONJScan, as described 447

above. The percentage identity between the different plasmids found was assessed using 448

BLAST [64,65]. The similarity with any previously described plasmid was checked by 449

performing a BLAST search [64,65] against the NCBI nucleotide (nt) database. A 450

plasmid map was created using Geneious v8 [80]. 451

Delimitation of integrative and conjugative elements 452

The presence of ICEs was explored by a similar approach as the pipeline described 453

previously [26]. Briefly, all strict core genes, i.e. genes present in all strains of L. 454

mudanjiangensis were found using the Orthofinder output (see above). Next, all 455

flanking core genes of each conjugative region were identified. Since within one species 456

an ICE is expected to be found between the same core orthogroups, the flanking core 457

genes of each conjugative region found were evaluated to determine whether or not it 458

could be defined as an ICE. 459

Accession number(s) and data availability 460

Sequencing data and genome assemblies are available at the European Nucleotide 461

Archive under the accession number ERP111972. The complete pipeline can be found 462

February 5, 2019 12/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

on GitHub (https://github.com/swuyts/mudAnalysis). 463

Acknowledgments 464

We thank Eline Oerlemans, Ilke De Boeck, Ines Tuyaerts, and all other members of the 465

ENdEMIC group for their general assistance and/or fruitful discussions. In addition, we 466

also thank Charlotte Claes and Arvid Suls (Centre of Medical Genetics, University of 467

Antwerp, Antwerp, Belgium) for their valuable input regarding whole-genome 468

sequencing. Furthermore, we would like to thank the Antwerp Centre for Advanced 469

Microscopy (ACAM, University of Antwerp) for processing and imaging of the SEM 470

samples and the Electron Microscopy for Material Science group (EMAT, University of 471

Antwerp) for the use of the environmental SEM (Quanta 250 FEG). 472

References

1. Sun Z, Harris HMB, McCann A, Guo C, Argimon S, Zhang W, et al. Expandingthe biotechnology potential of lactobacilli through comparative genomics of 213strains and associated genera. Nature Communications. 2015;6(1):8322.doi:10.1038/ncomms9322.

2. Claesson MJ, van Sinderen D, O’Toole PW. Lactobacillus phylogenomics -Towards a reclassification of the genus. International Journal of Systematic andEvolutionary Microbiology. 2008;58(12):2945–2954. doi:10.1099/ijs.0.65848-0.

3. Salvetti E, Torriani S, Felis GE. The genus Lactobacillus: A taxonomic update.Probiotics and Antimicrobial Proteins. 2012;4(4):217–226.doi:10.1007/s12602-012-9117-8.

4. Salvetti E, Harris HMB, Felis GE, O’Toole PW. Comparative genomics revealsrobust phylogroups in the genus Lactobacillus as the basis for reclassification.Applied and Environmental Microbiology. 2018;84(17):00993–18.doi:10.1128/AEM.00993-18.

5. Duar RM, Lin XB, Zheng J, Martino ME, Grenier T, Perez-Munoz ME, et al.Lifestyles in transition: evolution and natural history of the genus Lactobacillus.FEMS Microbiology Reviews. 2017;41(Supp 1):S27–S48.doi:10.1093/femsre/fux030.

6. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA,et al. A standardized bacterial taxonomy based on genome phylogenysubstantially revises the tree of life. Nature Biotechnology.2018;36(August):996–1004. doi:10.1038/nbt.4229.

7. Zheng J, Ruan L, Sun M, Ganzle M. A genomic view of Lactobacilli andPediococci demonstrates that phylogeny matches ecology and physiology. Appliedand Environmental Microbiology. 2015;81(20):7233–7243.doi:10.1128/AEM.02116-15.

8. Mao Y, Chen M, Horvath P. Lactobacillus herbarum sp. nov., a species relatedto Lactobacillus plantarum. International Journal of Systematic and EvolutionaryMicrobiology. 2015;65(12):4682–4688. doi:10.1099/ijsem.0.000636.

9. Miyashita M, Yukphan P, Chaipitakchonlatarn W, Malimas T, Sugimoto M,Yoshino M, et al. Lactobacillus plajomi sp. nov. and Lactobacillus

February 5, 2019 13/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

modestisalitolerans sp. nov., isolated from traditional fermented foods.International Journal of Systematic and Evolutionary Microbiology.2015;65(8):2485–2490. doi:10.1099/ijs.0.000290.

10. Gu CT, Li CY, Yang LJ, Huo GC. Lactobacillus mudanjiangensis sp. nov.,Lactobacillus songhuajiangensis sp. nov. and Lactobacillus nenjiangensis sp. nov.,isolated from Chinese traditional pickle and sourdough. International Journal ofSystematic and Evolutionary Microbiology. 2013;63(Pt 12):4698–4706.doi:10.1099/ijs.0.054296-0.

11. Wuyts S, Van Beeck W, Oerlemans EFM, Wittouck S, Claes IJJ, De Boeck I,et al. Carrot juice fermentations as man-made microbial ecosystems dominatedby lactic acid bacteria. Applied and Environmental Microbiology.2018;(April):00134–18. doi:10.1128/AEM.00134-18.

12. Lebeer S, Verhoeven TLa, Francius G, Schoofs G, Lambrichts I, Dufrene Y, et al.Identification of a gene cluster for the biosynthesis of a long, galactose-richexopolysaccharide in Lactobacillus rhamnosus GG and functional analysis of thepriming glycosyltransferase. Applied and Environmental Microbiology.2009;75(11):3554–3563. doi:10.1128/AEM.02919-08.

13. Kankainen M, Paulin L, Tynkkynen S, von Ossowski I, Reunanen J, Partanen P,et al. Comparative genomic analysis of Lactobacillus rhamnosus GG reveals pilicontaining a human-mucus binding protein. Proceedings of the NationalAcademy of Sciences of the United States of America. 2009;106(40):17193–17198.doi:10.1073/pnas.0908876106.

14. Douillard FP, Ribbera A, Kant R, Pietila TE, Jarvinen HM, Messing M, et al.Comparative genomic and functional analysis of 100 Lactobacillus rhamnosusstrains and their comparison with strain GG. PLoS Genetics. 2013;9(8):e1003683.doi:10.1371/journal.pgen.1003683.

15. Douillard FP, Mora D, Eijlander RT, Wels M, de Vos WM. Comparative genomicanalysis of the multispecies probiotic-marketed product VSL3. PLoS ONE.2018;13(2):e0192452. doi:10.1371/journal.pone.0192452.

16. Yu X, Jaatinen A, Rintahaka J, Hynonen U, Lyytinen O, Kant R, et al. Humangut-commensalic Lactobacillus ruminis ATCC 25644 displays sortase-assembledsurface piliation: Phenotypic characterization of its fimbrial operon through insilico predictive analysis and recombinant expression in Lactococcus lactis. PLoSONE. 2015;10(12):1–31. doi:10.1371/journal.pone.0145718.

17. Kant R, Palva A, Von Ossowski I. An in silico pan-genomic probe for themolecular traits behind Lactobacillus ruminis gut autochthony. PLoS ONE.2017;12(4):1–26. doi:10.1371/journal.pone.0175541.

18. Harris HMB, Bourin MJB, Claesson MJ, O’Toole PW. Phylogenomics andcomparative genomics of Lactobacillus salivarius, a mammalian gut commensal.Microbial Genomics. 2017;3(8). doi:10.1099/mgen.0.000115.

19. Mandlik A, Swierczynski A, Das A, Ton-That H. Pili in Gram-positive bacteria:assembly, involvement in colonization and biofilm development. Trends inMicrobiology. 2008;16(1):33–40. doi:10.1016/j.tim.2007.10.010.

20. Filloux A. A variety of bacterial pili involved in horizontal gene transfer. Journalof Bacteriology. 2010;192(13):3243–3245. doi:10.1128/JB.00424-10.

February 5, 2019 14/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

21. Muschiol S, Balaban M, Normark S, Henriques-Normark B. Uptake ofextracellular DNA: Competence-induced pili in natural transformation ofStreptococcus pneumoniae. BioEssays. 2015;37(4):426–435.doi:10.1002/bies.201400125.

22. Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EPC, de la Cruz F. Mobilityof plasmids. Microbiology and Molecular Biology Reviews. 2010;74(3):434–452.doi:10.1128/MMBR.00020-10.

23. Guglielmini J, Quintais L, Garcillan-Barcia MP, de la Cruz F, Rocha EPC. Therepertoire of ice in prokaryotes underscores the unity, diversity, and ubiquity ofconjugation. PLoS Genetics. 2011;7(8). doi:10.1371/journal.pgen.1002222.

24. Guglielmini J, Neron B, Abby SS, Garcillan-Barcia MP, La Cruz DF, Rocha EPC.Key components of the eight classes of type IV secretion systems involved inbacterial conjugation or protein secretion. Nucleic Acids Research.2014;42(9):5715–5727. doi:10.1093/nar/gku194.

25. Abby SS, Cury J, Guglielmini J, Neron B, Touchon M, Rocha EPC.Identification of protein secretion systems in bacterial genomes. ScientificReports. 2016;6(1):23080. doi:10.1038/srep23080.

26. Cury J, Touchon M, Rocha EPC. Integrative and conjugative elements and theirhosts: composition, distribution and organization. Nucleic Acids Research.2017;45(15):8943–8956. doi:10.1093/nar/gkx607.

27. Johnson CM, Grossman AD. Integrative and conjugative elements (ICEs): whatthey do and how they work. Annual Review of Genetics. 2015;49(1):577–601.doi:10.1146/annurev-genet-112414-055018.

28. Delavat F, Miyazaki R, Carraro N, Pradervand N, van der Meer JR. The hiddenlife of integrative and conjugative elements. FEMS Microbiology Reviews.2017;41(4):512–537. doi:10.1093/femsre/fux008.

29. De Bruyne K, Camu N, De Vuyst L, Vandamme P. Lactobacillus fabifermentanssp. nov. and Lactobacillus cacaonum sp. nov., isolated from Ghanaian cocoafermentations. International Journal of Systematic and EvolutionaryMicrobiology. 2009;59(1):7–12. doi:10.1099/ijs.0.001172-0.

30. Curk MC, Hubert JC, Bringel F. Lactobacillus paraplantarum sp. nov., a newspecies related to Lactobacillus plantarum. International Journal of SystematicBacteriology. 1996;46(2):595–598. doi:10.1099/00207713-46-2-595.

31. Zanoni P, Farrow JAE, Phillips BA, Collins MD. Lactobacillus pentosus (Fred,Peterson, and Anderson) sp. nov., nom. rev. International Journal of Systematicand Evolutionary Microbiology. 1987;37(4):339–341.

32. Pederson CS. A study of the species Lactobacillus plantarum (Orla-Jensen)Bergey et al . Journal of bacteriology. 1936;31(3):217.

33. Gu CT, Wang F, Li CY, Liu F, Huo GC. Lactobacillus xiangfangensis sp. nov.,isolated from Chinese pickle. International Journal of Systematic andEvolutionary Microbiology. 2012;62(Pt 4):860–863. doi:10.1099/ijs.0.031468-0.

34. Richter M, Rossello-Mora R. Shifting the genomic gold standard for theprokaryotic species definition. Proceedings of the National Academy of Sciencesof the United States of America. 2009;106(45):19126–19131.doi:10.1073/pnas.0906412106.

February 5, 2019 15/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

35. Yennamalli RM, Rader AJ, Kenny AJ, Wolt JD, Sen TZ. Endoglucanases:insights into thermostability for biofuel applications. Biotechnology for Biofuels.2013;6(1):136. doi:10.1186/1754-6834-6-136.

36. Koeck DE, Hahnke S, Zverlov VV. Herbinix luporum sp. nov., a thermophiliccellulose-degrading bacterium isolated from a thermophilic biogas reactor.International Journal of Systematic and Evolutionary Microbiology.2016;66(10):4132–4137. doi:10.1099/ijsem.0.001324.

37. Aspeborg H, Coutinho PM, Wang Y, Brumer H, Henrissat B. Evolution,substrate specificity and subfamily classification of glycoside hydrolase family 5(GH5). BMC Evolutionary Biology. 2012;12(1):186.doi:10.1186/1471-2148-12-186.

38. Lebeer S, Claes I, Tytgat HLP, Verhoeven TLa, Marien E, von Ossowski I, et al.Functional analysis of Lactobacillus rhamnosus GG pili in relation to adhesionand immunomodulatory interactions with intestinal epithelial cells. Applied andEnvironmental Microbiology. 2012;78(1):185–193. doi:10.1128/AEM.06192-11.

39. Lebeer S, Bron PA, Marco ML, Van Pijkeren JP, O’Connell Motherway M, HillC, et al. Identification of probiotic effector molecules: present state and futureperspectives. Current Opinion in Biotechnology. 2018;49:217–223.doi:10.1016/j.copbio.2017.10.007.

40. Giltner CL, Nguyen Y, Burrows LL. Type IV pilin proteins: versatile molecularmodules. Microbiology and Molecular Biology Reviews. 2012;76(4):740–772.doi:10.1128/MMBR.00035-12.

41. Guglielmetti S, Balzaretti S, Taverniti V, Miriani M, Milani C, Scarafoni A, et al.TgaA, a VirB1-like component belonging to a putative type IV secretion systemof Bifidobacterium bifidum MIMBb75. Applied and Environmental Microbiology.2014;80(17):5161–5169. doi:10.1128/AEM.01413-14.

42. Zupan J, Hackworth CA, Aguilar J, Ward D, Zambryski P. VirB1 promotesT-pilus formation in the Vir-type IV secretion system of Agrobacteriumtumefaciens. Journal of Bacteriology. 2007;189(18):6551–6563.doi:10.1128/JB.00480-07.

43. Schroder G, Lanka E. The mating pair formation system of conjugative plasmids- A versatile secretion machinery for transfer of proteins and DNA. Plasmid.2005;54(1):1–25. doi:10.1016/j.plasmid.2005.02.001.

44. Alvarez-Martinez CE, Christie PJ. Biological diversity of prokaryotic type IVsecretion systems. Microbiology and Molecular Biology Reviews.2009;73(4):775–808. doi:10.1128/MMBR.00023-09.

45. Shariq M, Kumar N, Kumari R, Kumar A, Subbarao N, Mukhopadhyay G.Biochemical analysis of CagE: A VirB4 homologue of Helicobacter pyloriCag-T4SS. PLoS ONE. 2015;10(11):e0142606. doi:10.1371/journal.pone.0142606.

46. Rozov R, Brown Kav A, Bogumil D, Shterzer N, Halperin E, Mizrahi I, et al.Recycler: an algorithm for detecting plasmids from de novo assembly graphs.Bioinformatics. 2016;53(9):btw651. doi:10.1093/bioinformatics/btw651.

47. Jung JY, Lee SH, Jeon CO. Complete genome sequence of Leuconostoc carnosumstrain JB16, isolated from kimchi. Journal of Bacteriology.2012;194(23):6672–6673. doi:10.1128/JB.01805-12.

February 5, 2019 16/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

48. Anukam KC, Macklaim JM, Gloor GB, Reid G, Boekhorst J, Renckens B, et al.Genome sequence of Lactobacillus pentosus KCA1: vaginal isolate from a healthypremenopausal woman. PLoS ONE. 2013;8(3):e59239.doi:10.1371/journal.pone.0059239.

49. Qureshi A, Itankar Y, Ojha R, Mandal M, Khardenavis A, Kapley A, et al.Genome sequence of Lactobacillus plantarum EGD-AQ4, isolated from fermentedproduct of Northeast India. Genome Announcements. 2014;2(1):4–5.doi:10.1128/genomeA.01122-13.

50. Martino ME, Bayjanov JR, Caffrey BE, Wels M, Joncour P, Hughes S, et al.Nomadic lifestyle of Lactobacillus plantarum revealed by comparative genomics of54 strains isolated from different habitats. Environmental Microbiology.2016;00(2016). doi:10.1111/1462-2920.13455.

51. Abriouel H, Perez Montoro B, Casado Munoz MDC, Knapp CW, Galvez A,Benomar N. In silico genomic insights into aspects of food safety and defensemechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated frombrines of naturally fermented Alorena green table olives. PLoS ONE.2017;12(6):e0176801. doi:10.1371/journal.pone.0176801.

52. Klemm D, Heublein B, Fink HP, Bohn A. Cellulose: Fascinating biopolymer andsustainable raw material. Angewandte Chemie International Edition.2005;44(22):3358–3393. doi:10.1002/anie.200460587.

53. Sharma KD, Karki S, Thakur NS, Attri S. Chemical composition, functionalproperties and processing of carrot: a review. Journal of Food Science andTechnology. 2012;49(1):22–32. doi:10.1007/s13197-011-0310-7.

54. Fukao M, Oshima K, Morita H, Toh H, Suda W, Kim SW, et al. Genomicanalysis by deep sequencing of the probiotic Lactobacillus brevis KB290 harboringnine plasmids reveals genomic stability. PLoS ONE. 2013;8(3).doi:10.1371/journal.pone.0060521.

55. Zhang W, Yu D, Sun Z, Chen X, Bao Q, Meng H, et al. Complete nucleotidesequence of plasmid plca36 isolated from Lactobacillus casei Zhang. Plasmid.2008;60(2):131–135. doi:10.1016/j.plasmid.2008.06.003.

56. Ito Y, Kawai Y, Arakawa K, Honme Y, Sasaki T, Saito T. Conjugative plasmidfrom Lactobacillus gasseri LA39 that carries genes for production of andimmunity to the circular bacteriocin gassericin A. Applied and EnvironmentalMicrobiology. 2009;75(19):6340–6351. doi:10.1128/AEM.00195-09.

57. Tanizawa Y, Tohno M, Kaminuma E, Nakamura Y, Arita M. Complete genomesequence and analysis of Lactobacillus hokkaidonensis LOOC260T, apsychrotrophic lactic acid bacterium isolated from silage. BMC Genomics.2015;16(1):1–11. doi:10.1186/s12864-015-1435-2.

58. van Kranenburg R, Golic N, Bongers R, Leer RJ, de Vos WM, Siezen RJ, et al.Functional analysis of three plasmids from Lactobacillus plantarum. Applied andEnvironmental Microbiology. 2005;71(3):1223–1230.doi:10.1128/AEM.71.3.1223-1230.2005.

59. Kim DH, Jeon YJ, Chung MJ, Seo JG, Ro YT. Complete sequence and geneanalysis of a cryptic plasmid pLU4 in Lactobacillus reuteri strain LU4 (KCTC12397BP). Applied Biological Chemistry. 2017;60(2):145–153.doi:10.1007/s13765-017-0264-1.

February 5, 2019 17/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

60. Bankevich A, Nurk S, Antipov D, Gurevich Aa, Dvorkin M, Kulikov AS, et al.SPAdes: A new genome assembly algorithm and its applications to single-cellsequencing. Journal of Computational Biology. 2012;19(5):455–477.doi:10.1089/cmb.2012.0021.

61. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment toolfor genome assemblies. Bioinformatics. 2013;29(8):1072–1075.doi:10.1093/bioinformatics/btt086.

62. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag NewYork; 2009. Available from: http://ggplot2.org.

63. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics.2014;30(14):2068–2069. doi:10.1093/bioinformatics/btu153.

64. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignmentsearch tool. Journal of Molecular Biology. 1990;215(3):403–410.doi:10.1016/S0022-2836(05)80360-2.

65. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al.BLAST+: architecture and applications. BMC Bioinformatics. 2009;10(1):421.doi:10.1186/1471-2105-10-421.

66. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. Thecarbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research.2014;42(D1):D490–D495. doi:10.1093/nar/gkt1178.

67. Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. DbCAN: A web resource forautomated carbohydrate-active enzyme annotation. Nucleic Acids Research.2012;40(W1):445–451. doi:10.1093/nar/gks479.

68. Finn RD, Clements J, Eddy SR. HMMER web server: Interactive sequencesimilarity searching. Nucleic Acids Research. 2011;39(SUPPL. 2):29–37.doi:10.1093/nar/gkr367.

69. Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, et al. dbCAN2: a metaserver for automated carbohydrate-active enzyme annotation. Nucleic AcidsResearch. 2018;46(W1):W95–W101. doi:10.1093/nar/gky418.

70. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genomecomparisons dramatically improves orthogroup inference accuracy. GenomeBiology. 2015;16(1):157. doi:10.1186/s13059-015-0721-2.

71. R Core Team. R: A language and environment for statistical computing; 2015.Available from: https://www.r-project.org/.

72. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization ofintersecting sets and their properties. Bioinformatics. 2017;33(18):2938–2940.doi:10.1093/bioinformatics/btx364.

73. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C,et al. Fast genome-wide functional annotation through orthology assignment byEggNOG-mapper. Molecular Biology and Evolution. 2017;34(8):2115–2122.doi:10.1093/molbev/msx148.

74. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis andpost-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313.doi:10.1093/bioinformatics/btu033.

February 5, 2019 18/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

75. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an r package forvisualization and annotation of phylogenetic trees with their covariates and otherassociated data. Methods in Ecology and Evolution. 2017;8(1):28–36.doi:10.1111/2041-210X.12628.

76. Pritchard L, Glover RH, Humphris S, Elphinstone JG, Toth IK. Genomics andtaxonomy in diagnostics for food security: soft-rotting enterobacterial plantpathogens. Analytical Methods. 2016;8(1):12–24. doi:10.1039/C5AY02550H.

77. Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, TiedjeJM. DNA-DNA hybridization values and their relationship to whole-genomesequence similarities. International Journal of Systematic and EvolutionaryMicrobiology. 2007;57(Pt 1):81–91. doi:10.1099/ijs.0.64483-0.

78. Abby SS, Neron B, Menager H, Touchon M, Rocha EPC. MacSyFinder: Aprogram to mine genomes for molecular systems with an application toCRISPR-Cas systems. PLoS ONE. 2014;9(10):e110726.doi:10.1371/journal.pone.0110726.

79. Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al.CDD/SPARCLE: functional classification of proteins via subfamily domainarchitectures. Nucleic Acids Research. 2017;45(D1):D200–D203.doi:10.1093/nar/gkw1129.

80. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al.Geneious Basic: An integrated and extendable desktop software platform for theorganization and analysis of sequence data. Bioinformatics.2012;28(12):1647–1649. doi:10.1093/bioinformatics/bts199.

Supporting information

S1 Fig. Maximum likelihood phylogenetic tree of the whole Lactobacillusplantarum group constructed using 304 genome assemblies and the aminoacid sequences of their 612 single-copy marker genes with Lactobacillusalgidus DSM 15638 as outgroup. The branch length of the outgroup was shortenedfor better visualization. The colors represent the species as annotated in the NationalCenter for Biotechnology Information (NCBI) Assembly database. Type strains of eachspecies are annotated with a triangle (NCBI) or a square (sequenced in-house).

S2 Fig. Alternative vizualisation of all pairwise average nucleotideidentity (ANI) comparisons for each Lactobacillus plantarum group species,as defined by the clades in Fig1. In green all inter-clade comparisons are shown,while orange shows all intra-clade comparisons. For Lactobacillus xiangfangensis andLactobacillus herbarum, no intra-clade comparisons could be performed, as only onegenomic assembly was available for these species.

S1 Table. Overview of all of the publicly available Lactobacillusplantarum group genomes used in this study. If available, the strain name isshown in the second column. The clade name is in accordance to the phylogenetic treeof Fig1. The last column indicates whether a conjugation system was found or not.

S2 Table. Number of genomes, core orthogroups, accessory orthogroups,average orthogroups per genome and average genes per genome of theLactobacillus plantarum group as a whole and all of its members separately.

February 5, 2019 19/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

S3 Table. An overview of all genes found in the Lactobacillusmudanjiangensis conjugation operons. The CONJscan column contains theannotation resulting from running the CONJscan tool.

February 5, 2019 20/20

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint

.CC-BY 4.0 International licenseis made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It. https://doi.org/10.1101/549451doi: bioRxiv preprint