Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
1
Species-level microbiome composition of activated sludge - introducing the MiDAS 3 1
ecosystem-specific reference database and taxonomy 2
Marta Nierychlo, Kasper Skytte Andersen, Yijuan Xu, Nick Green, Mads Albertsen, Morten S. 3
Dueholm, Per Halkjær Nielsen. 4
Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, 5
Aalborg, Denmark. 6
*Correspondence to: Per Halkjær Nielsen, Center for Microbial Communities, Department of 7
Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220 Aalborg, Denmark; 8
Phone: +45 9940 8503; Fax: Not available; E-mail: [email protected] 9
Abstract 10
The function of microbial communities in wastewater treatment systems and anaerobic digesters is 11
dictated by the physiological activity of its members and complex interactions between them. Since 12
functional traits are often conserved at low taxonomic ranks (genus, species, strain), the development 13
of high taxonomic resolution and reliable classification is the first crucial step towards understanding 14
the role of microbes in any ecosystem. Here we present MiDAS 3, a comprehensive 16S rRNA gene 15
reference database based on high-quality full-length sequences derived from activated sludge and 16
anaerobic digester systems. The MiDAS 3 taxonomy proposes unique provisional names for all 17
microorganisms down to species level. MiDAS 3 was applied for the detailed analysis of microbial 18
communities in 20 Danish wastewater treatment plants with nutrient removal, sampled over 12 years, 19
demonstrating community stability and many abundant core taxa. The top 50 most abundant species 20
belonged to genera, of which >50% have no known function in the system, emphasizing the need for 21
more efforts towards elucidating the role of important members of wastewater treatment ecosystems. 22
The MiDAS 3 taxonomic database guided an update of the MiDAS Field Guide – an online resource 23
linking the identity of microorganisms in wastewater treatment systems to available data related to 24
their functional importance. The new field guide contains a complete list of genera (>1,800) and 25
species (>4,200) found in activated sludge and anaerobic digesters. The identity of the microbes is 26
linked to functional information, where available. The website also provides the possibility to BLAST 27
the sequences against MiDAS 3 taxonomy directly online. The MiDAS Field Guide is a collaborative 28
platform acting as an online knowledge repository and facilitating understanding of wastewater 29
treatment ecosystem function. 30
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
2
Introduction 31
Biological nutrient removal (BNR) plants are the principal wastewater treatment systems across the 32
urbanized world. Besides their primary role in removing pollutants and pathogens, these systems are 33
increasingly seen as resource recovery facilities, contributing to sustainable resource management. 34
At-plant anaerobic digestion exploits the value of sludge as a source of energy, reducing plant carbon 35
footprint and placing wastewater treatment plants (WWTPs) on the road to achieving a circular 36
economy. Complex microbial communities define the function of biological wastewater treatment 37
systems, and understanding the role of individual microbes requires knowledge of their identity, to 38
which known and putative functions can be assigned. The gold standard for identification and 39
characterization of the diversity, composition, and dynamics of microbial communities is presently 40
16S rRNA gene amplicon sequencing (Boughner and Singh, 2016). A crucial step is a reliable 41
taxonomic classification, but this is strongly hampered by the lack of sufficient reference sequences 42
in public databases for many important microbes present in wastewater treatment systems (McIlroy 43
et al., 2015). In addition, high taxonomic resolution is missing from the large-scale public reference 44
databases (SILVA (Quast et al., 2013), Greengenes (DeSantis et al., 2006), RDP (Cole et al., 2014)), 45
often resulting in poor classification. 46
MiDAS (Microbial Database for Activated Sludge) was established in 2015 as an ecosystem-47
specific database for wastewater treatment systems that provided manually curated taxonomic 48
assignment (MiDAS taxonomy 1.0) based on the SILVA database and associated physiological 49
information profiles (midasfieldguide.org) for all abundant and process-critical genera in activated 50
sludge (AS) (McIlroy et al., 2015). Both taxonomy and database were later updated (MiDAS 2.0) to 51
cover abundant microorganisms found in anaerobic digesters (AD) and influent wastewater (McIlroy 52
et al., 2017). However, more high-identity 16S rRNA gene reference sequences are needed to be able 53
to classify microbes at the lowest possible taxonomic levels. Currently, taxonomic classification can 54
only be provided down to genus level, limiting elucidation of the true diversity in these ecosystems. 55
Consequently, physiological differences between the members of the same genus that coexist in 56
activated sludge cannot be reliably linked to the individual phylotypes due to insufficient 57
phylogenetic resolution. Finally, manual curation of large-scale databases such as SILVA is not 58
feasible due to the growing number of sequences and frequent taxonomy updates (Glöckner et al., 59
2017). 60
Recent developments of sequencing technology and bioinformatic tools enabled generation of 61
millions of high-quality full-length 16S rRNA reference sequences from any environmental 62
ecosystem (Callahan et al., 2019; Karst et al., 2019, 2018). Obtaining full-length 16S rRNA gene 63
sequences directly from environmental samples has made it possible to establish a near-complete 64
reference database for high-taxonomic resolution studies of the wastewater treatment ecosystem, 65
based on approximately one million 16S rRNA gene sequences retrieved directly from the wastewater 66
treatment plants, which provided more than 9,500 exact sequence variants (ESVs) after dereplication 67
and error-correction. Moreover, we created the AutoTax pipeline for generating a comprehensive 68
ecosystem-specific taxonomic database with de novo names for novel taxa. The names are first 69
inherited from sequences present in large-scale SILVA taxonomy, and de novo names are provided 70
for all remaining unclassified taxa based on reproducible clustering with rank-specific identity 71
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
3
thresholds (Yarza et al., 2014). These de novo names provide placeholder names for all novel 72
phylotypes and act as fixed unique identifiers, making the taxonomic assignment independent of the 73
analyzed dataset and enabling cross-study comparisons. 74
Here we present the MiDAS 3 reference database, which is based on the ESVs previously 75
obtained from activated sludge and anaerobic digester systems (Dueholm et al., 2019), amended with 76
additional sequences of potential importance in the field. These sequences represent bacteria from 77
influent wastewater, potential pathogens, and genome-derived sequences of microbes found in 78
wastewater treatment systems. The MiDAS 3 taxonomy is built using AutoTax (Dueholm et al., 2019) 79
and proposes unique provisional names for all microorganisms important in wastewater treatment 80
ecosystems at species-level resolution. We have furthermore curated some of these species-level 81
names, and here we demonstrate the resolution power of MiDAS 3 by analyzing species-level 82
community composition in 20 Danish full-scale WWTPs carrying out nutrient removal with chemical 83
phosphorus removal (BNR) or enhanced biological phosphorus removal (EBPR) over 12 years. We 84
provide important information about diversity, core communities, and the most abundant species in 85
these ecosystems. Although the plants investigated are Danish, the results and approach are relevant 86
for similar systems worldwide. 87
We also present the updated MiDAS Field Guide, an online platform that links the MiDAS 3 88
taxonomy-derived identity of all microbes from wastewater treatment systems to abundance 89
information based on our long-term survey, and functional information (where available). MiDAS 3 90
alleviates important problems related to the taxonomic classification of microbes in environmental 91
ecosystems, such as missing reference sequences and low taxonomic resolution, while our MiDAS 92
Field Guide acts as an online knowledge repository facilitating understanding the function of 93
microbes present in wastewater treatment ecosystems. 94
Materials and methods 95
MiDAS 3 reference database and taxonomy 96
The MiDAS 3 reference database was built using full-length 16S rRNA gene ESVs from 21 activated 97
sludge WWTPs and 16 ADs systems, and taxonomic classification was assigned as described by 98
Dueholm et al., (2019) and shown in Figure 1. The database was amended with 95 additional 99
sequences from published genomes for the bacteria that are potentially important in wastewater 100
treatment systems, but not present in the reference ESVs obtained (Table S2). Taxonomy was 101
assigned to all sequences using the workflow presented in Figure 1. Gaps in the SILVA-derived 102
taxonomy were filled with a de novo taxonomy that provides ‘midas_x_y’ placeholder names, where 103
x represents the first letter of taxonomic level, for which de novo assignment was created, and y is a 104
number derived from the ESV that forms the cluster centroid for a given taxon. Where applicable, 105
taxonomic classifications were curated with taxon names for ESVs that represent genomes published 106
in peer-reviewed literature. Additionally, several manual curations were made, based on MiDAS 2.1 107
database and published literature (full list of amendments is presented in Table S3). MiDAS 108
taxonomy file is available for download from the web platform (midasfieldguide.org/Downloads). 109
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
4
Activated sludge samples 110
Sampling of the activated sludge biomass from 20 Danish full-scale municipal WWTPs was carried 111
out within the MiDAS project (McIlroy et al., 2015). The plants were sampled up to four times a year 112
(February, May, August, October) from 2006 to 2018 with a total of 712 samples. All samples were 113
taken from the aeration tank and sent overnight to Aalborg University for processing. Basic 114
information about the design and operation of the WWTPs as well as number of samples collected 115
are given in Table S1. 116
Amplicon sequencing and bioinformatic analysis 117
DNA extraction, sample preparation including amplification of V1-3 region of 16S rRNA gene, and 118
amplicon sequencing were conducted as described by Stokholm-Bjerregaard et al., (2017). Forward 119
reads were processed as described by Dueholm et al., (2019) with raw reads trimmed to 250 bp, 120
quality filtered, exact amplicon sequence variants (ASVs) identified, and taxonomy assigned using 121
the MiDAS 3 reference database using the SINTAX classifier (Edgar, 2016). 122
Data analysis and visualization 123
Data was imported into R (R Core Team, 2018) using RStudio (RStudio Team, 2015), analyzed and 124
visualized using ggplot2 v.3.1.0 (Wickham, 2009) and ampvis2 v.2.4.9 (K. S. Andersen et al., 2018). 125
The dataset containing 75,017 ASVs was rarefied to 10,000 reads per sample, resulting in 57,902 126
ASVs. Simpson’s alpha-diversity index calculation and redundancy analyses were performed using 127
the ampvis2 package. Redundancy Analysis (RDA) was performed using the Hellinger 128
transformation and constrained to WWTP location. For the analysis of rank abundance and core 129
microbiome, all ASVs without taxonomic classification at the rank analyzed were removed. The 130
remaining datasets consisting of 29,493 ASVs (50.9%) and 18,452 ASVs (31.9%) with genus- and 131
species-level classification were normalized to 100%. These ASVs were represented in total by 1647 132
genera and 3651 species. The unclassified ASVs were primarily due to missing ESV reference 133
sequences for rare ASVs, and the inability of the amplicons to resolve the species-level taxonomy for 134
certain taxa. An example of the latter is genus Trichococcus, where the species T. pasteurii, T. 135
flocculiformis, and T. patagoniensis cannot be distinguished even with full-length 16S rRNA gene 136
sequences. 137
Results and discussion 138
The MiDAS reference database provides species-level classification 139
In the creation of the MiDAS 3 database, we have applied a novel approach (Figure 1), where high-140
throughput sequencing of high-quality full-length 16S rRNA genes (Karst et al., 2018) was combined 141
with a new automatic taxonomy concept, AutoTax (Dueholm et al., 2019). The database contains 142
full-length sequences from 21 Danish WWTPs carrying out nutrient removal, and 16 mesophilic and 143
thermophilic Danish ADs. The MiDAS 3 reference database was additionally supplemented with 144
high-quality full-length 16S rRNA gene sequences of microbes with known importance in the field, 145
such as bacteria present in the influent wastewater and pathogens (Table S2). 146
MiDAS 3 taxonomic assignment is based on the AutoTax framework, where sequences first 147
inherit taxonomic classification from the SILVA database (release 132, Quast et al., 2013), including 148
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
5
species names (based on type strains), if available. The remaining unclassified sequences are assigned 149
unique MiDAS names down to species level. Thus, the new release of MiDAS 3 taxonomy proposes 150
a provisional genus and species name for all microorganisms found in activated sludge and anaerobic 151
digester ecosystems, acting as stable identifiers and placeholder names for the sequences, until the 152
representative microorganisms are further characterized and assigned approved names. Additionally, 153
names derived from recently published genomes and other sources not yet incorporated in SILVA 154
taxonomy, were manually curated (see Table S3 for full list of curated names) to provide a near-155
complete database and taxonomy for microorganisms in wastewater treatment ecosystems. For a few 156
species there were inconsistencies between the genus assigned by SILVA and the type strain name 157
(see Table S4); these issues relate to the fact the SILVA taxonomy is updated based on phylogenetic 158
analyses of the available 16S rRNA gene sequences, whereas type strain names have to be approved 159
by the international taxonomy committee. The possibility to include additional sequences ensures that 160
the database can easily be updated based on the current state of the ecosystem-specific field, and will 161
drive the future releases of the MiDAS database. 162
Two examples of improved taxonomy are shown in Table 1. Acidovorax is one of the most 163
abundant genera in Danish influent wastewater (McIlroy et al., 2017), with two abundant species 164
identified using the MiDAS 3 reference database. A. defluvii is described by the correct classification 165
down to genus level in all but one of the reference databases tested (MiDAS 2, SILVA, Greengenes, 166
RDP). However, the other abundant species (midas_s_8989) was classified only at family level, with 167
genus name either missing or assigned incorrectly by the other databases. 168
Genus Ca. Amarolinea contains abundant filamentous bacteria in Danish WWTPs (Nierychlo 169
et al., 2019) and can be associated with bulking incidents (M. H. Andersen et al., 2018; Nierychlo 170
and Nielsen, 2017). All three abundant species belonging to the genus lack taxonomic classification 171
below order level in public large-scale databases, preventing identification and analysis of this 172
important member of the activated sludge system. 173
Microbiome composition of Danish activated sludge plants 174
In this study, we performed a 12-year survey of bacteria present in 20 Danish full-scale WWTPs with 175
nutrient removal that primarily treated municipal wastewater. All plants had biological nitrogen 176
removal and most also had EBPR (17 plants), while 3 plants had chemical P-removal (BNR plants). 177
The survey was carried out using 16S rRNA gene sequencing of the V1-3 region on the Illumina 178
MiSeq platform. Using the MiDAS 3 taxonomy, microbial community composition was 179
characterized at genus and species level in all plants based on 712 samples collected 2 to 4 times per 180
year for each plant from 2006 to 2018. 181
The overall microbial diversity within each plant was estimated using the Simpson index of 182
diversity and compared for all plants analyzed (Figure 2). The Simpson index places a greater weight 183
on species evenness than the richness (Kim et al., 2017), thus allowing the assessment of how evenly 184
distributed relative taxa contributions are across the samples. Since the value of the index represents 185
the probability that two randomly selected individuals will belong to different taxa, the low values of 186
Simpson index of diversity (Figure 2) indicate that the microbial communities were dominated by a 187
low number of abundant taxa across the plants. Additionally, the low spread of the index in each plant 188
indicates that the microbial communities across the plants were consistently stable across the years. 189
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
6
Similar alpha diversity indices were observed in WWTPs across the world by Wu et al., (2019), 190
calculated as Inverse Simpson (Figure S1). 191
The overall variation in activated sludge community structure in 20 Danish WWTPs was 192
visualized using RDA plot (Figure 3) with the WWTP location as the constrained variable. In many 193
cases the samples coming from different WWTPs were not completely separated from each other, 194
indicating high similarity between the microbial communities in those WWTPs. This is also 195
highlighted by the fact that the eigenvalues of the two principal components plotted relative to the 196
total variance in the data are relatively small (9.7% and 5.6%). Fredericia, Hirtshals, Esbjerg E, and 197
Esbjerg W WWTPs appeared to have a more distinct microbial community composition, presumably 198
due to a high fraction of industrial waste in the influent (Table S1). A pronounced clustering of the 199
samples coming from individual WWTPs indicates that variation between the plants was likely more 200
pronounced than within individual plants over time, and suggests long-term stability of the microbial 201
communities in Danish plants. 202
The microbial diversity of the WWTPs was composed of 1,647 genera and 3,651 species. 203
However, a relatively small fraction of these comprised a large proportion of the reads, thus 204
representing the abundant taxa that are assumed to carry out the main functions in the system (Figure 205
4a) (Saunders et al., 2016). The abundant taxa were defined as those that constitute the top 80% of 206
reads obtained by amplicon sequencing. In the BNR and EBPR plants, we found 157 and 164 207
abundant genera and 421 and 504 abundant species, respectively. These species had a relative ASV 208
read abundance above approx. 0.04%. 209
The overlap of taxa between the BNR and EBPR abundant communities both at genus and 210
species level was high (Figure 4b), and the core taxa, defined as being present and abundant in ≥80% 211
of the samples (Saunders et al., 2016), constituted more than 45% of the total reads at genus level and 212
more than 25% of total reads at species level in both types of plants (Figure 4c). The number of 213
genera and species belonging to the core communities in BNR and EBPR plants constituted a small 214
fraction of the total number of taxa present (1-6-3.5%). Both types of plants shared 72% of genera 215
and 65% of species in their respective core communities. The results show that the inclusion of an 216
anaerobic process step in the EBPR plants compared to the conventional BNR plants did not affect 217
the communities very much. 218
Species-level community composition 219
The most abundant species present in the BNR and EBPR plants are shown in Figure 5. Among the 220
10 most abundant species, two belong to genera with a provisional ‘midas’ name, while of the top 50 221
and top 100 species, 13 and 36 species, respectively, belong to genera not possessing genus-level 222
classification in SILVA (top 100 species are presented in Figure S2). These would be excluded from 223
the analysis if MiDAS 3 was not used as the reference database, demonstrating the advantage of using 224
the ecosystem-specific taxonomy. 225
A large fraction of the most abundant species is poorly described in the literature. The top 50 226
species belong to 41 different genera, of which almost 60% (28 genera) have no known function in 227
wastewater treatment systems (McIlroy et al., 2017, 2015). The function of 65% of all genera defined 228
as abundant in the EBPR and BNR plants (Figure 2a) is unknown, demonstrating the need for more 229
studies that link identity and function. A few species, however, match SILVA type strains, such as 230
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
7
Acidovorax defluvii and Intestinibacter bartletti, so these may be applied for deeper studies of their 231
physiology. 232
The most abundant species in both BNR and EBPR plants belong to the genus Tetrasphaera, 233
which represents polyphosphate-accumulating organisms (PAO) with versatile metabolism, 234
including the potential for denitrification (Herbst et al., 2019; Kristiansen et al., 2013). Other very 235
abundant species belong to the filamentous genera Ca. Microthrix (McIlroy et al., 2013) and Ca. 236
Villigracilis (Nierychlo et al., 2019); the former known to cause serious bulking problems (Rossetti 237
et al., 2005). Trichococcus may also possess filamentous morphology in activated sludge and may be 238
implicated in bulking (Liu and Seviour, 2001). Rhodoferax is a denitrifier (McIlroy et al., 2014), 239
while the function in activated sludge is unknown for species in the Romboutsia and Rhodobacter 240
genera. The novel genus midas_g_17, for which no functional information is available, belongs to 241
the family Saprospiraceae, and was wrongly classified as Ca. Epiflobacter in the previous version of 242
MiDAS taxonomy. Based on the coverage of the FISH probes available and mapping of the partial 243
16S sequences (Xia et al., 2008) to MiDAS 3 database, Ca. Epiflobacter name was re-assigned to a 244
correct genus, which is present at much lower abundance in the ecosystem. MiDAS 3 provides good 245
opportunities to reclassify sequences from previous studies, for which information about distribution 246
and/or function is available. For example, after mapping a set of partial 16S rRNA gene sequences 247
originally related to genus Aquaspirillum (Thomsen et al., 2007), they could be reclassified as 248
midas_g_33 – an abundant novel genus, indicating its potential role in denitrification. 249
Interestingly, the diversity of abundant species within the different genera was generally low in 250
the plants, typical with 3-4 abundant species in each of the most abundant genera. However, the 251
known PAO Tetrasphaera and the less abundant Ca. Accumulibacter exhibited slightly higher 252
diversity, each having 10 and 5 abundant species, respectively (Figure 6). Tetrasphaera had one 253
dominant species (midas_s_5) present in most plants, while the other species appeared more 254
randomly distributed. No clear dominance of any of Ca. Accumulibacter species could be observed 255
as their abundance was very different across the plants. Two of these, Ca. A. phosphatis and Ca. A. 256
aalborgensis are described at the species-level, based on the genomes available (Albertsen et al., 2016; 257
Martín et al., 2006). Interestingly, the genus Dechloromonas, which is also suggested to be a putative 258
PAO (Stokholm-Bjerregaard et al., 2017), was more abundant than Ca. Accumulibacter, and their 259
diversity was low with only two dominant species present in almost all plants. In general, high species 260
diversity of the PAOs may suggest a high degree of specialization allowing the exploitation of 261
different niches in the activated sludge ecosystem. Ca. Accumulimonas, another putative PAO, which 262
was described in the previous version of MiDAS, is not included in the present taxonomy version, as 263
sequences representing that genus, according to the present SILVA taxonomy, were classified to the 264
genus Halomonas. 265
The MiDAS reference database provides a common language for the field 266
Only a few studies have made surveys of the microbial communities in activated sludge systems, and 267
in all cases, it has only been with genus-level and not species-level classification (Saunders et al., 268
2016; Wu et al., 2019). Since the different studies have applied different extraction procedures, 269
different primers (typically V1-3 or V4), and different databases for taxonomic assignment (MiDAS 270
2, SILVA, RDP or Greengenes), it can be difficult to compare results across studies at the genus level, 271
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
8
and impossible at lower taxonomic levels. Therefore, the use of the ecosystem-specific MiDAS 272
database will provide a common language for all work in the field. Another key advantage of the 273
MiDAS 3 taxonomy is that it includes taxa abundant in similar systems across the world (Dueholm 274
et al., 2019), which are missing in SILVA, RDP, and Greengenes. By using this approach, and similar 275
protocols for extraction and primer selection (Albertsen et al., 2015; Dueholm et al., 2019), it will be 276
possible to compare the results of WWTP microbiome studies in the future. 277
The MiDAS reference database for anaerobic digesters 278
Anaerobic digestion is a key technology increasingly used worldwide to reduce waste, generate 279
energy, and minimize the carbon footprint of the plant. In order to provide a comprehensive reference 280
database for the microbes present in the whole wastewater treatment system, MiDAS 3 is based on 281
sequences from activated sludge as well as 16 full-scale Danish anaerobic digesters treating primary 282
and waste activated sludge. A detailed analysis of bacterial and archaeal community structure at 283
species-level has been performed in another study (Jiang et al., n.d.). The study was based on >1000 284
samples collected over 6 years from 46 anaerobic digesters, located at 22 WWTPs in Denmark. When 285
the microbial data was coupled with the analysis of operational, physicochemical, and performance 286
parameters, the main factors driving the assembly of the AD community were elucidated. 287
MiDAS Field Guide online platform (midasfieldguide.org) 288
The new MiDAS Field Guide (www.midasfieldguide.org) is a comprehensive database of microbes 289
in wastewater treatment systems, which was updated at the same time as the release of the MiDAS 3 290
database took place. All microbes found in activated sludge and anaerobic digesters are now included 291
in the field guide and, for the first time, species are listed for each genus. Our database now covers 292
more than 1,800 genera and 4,200 species found in biological nutrient removal treatment plants and 293
anaerobic digesters. More detailed abundance information is now provided both at genus and species 294
level, and functional information is described where available. 295
The MiDAS Field Guide integrates the identity of the microbes with the available functional 296
information (Figure 7) into a community knowledge platform in the form of a searchable database 297
of referenced information, thus acting as a central, on-line repository for current knowledge, where 298
all are welcome to contribute. Due to the rapidly growing amount of relevant literature, and high 299
number of taxa included, the MiDAS Field Guide is a continuous work in progress, with our efforts 300
prioritized towards the microorganisms that comprise the core microbiome as well as other abundant 301
or process-critical species. 302
New features launched in the updated MiDAS Field Guide include: 1) phylogenetic Search by 303
taxonomy function, where all microbes found in the ecosystem are displayed in their respective 304
taxonomic groups, 2) the possibility of classifying sequences using blast function, and MiDAS 3 305
taxonomy directly online. Updated protocols for DNA extraction and amplicon library preparation 306
are available online to facilitate establishment of common framework for studies of microbial ecology 307
of wastewater treatment systems. 308
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
9
Conclusions 309
Here we present the new ecosystem-specific MiDAS 3 reference database, based on a comprehensive 310
set of full-length 16S rRNA gene sequences derived from activated sludge and anaerobic digester 311
systems, and MiDAS 3 taxonomy built using AutoTax. It proposes unique provisional names for all 312
microorganisms important in wastewater treatment ecosystems, down to species level. MiDAS 3 was 313
used in the detailed analysis of microbial communities in 20 Danish wastewater treatment plants 314
(WWTPs) with nutrient removal, sampled over 12 years. This enabled unprecedented resolution, 315
revealing for the first time all species present in the plants, the stability of the communities, many 316
abundant core taxa, and the diversity of species in important functional guilds, exemplified by the 317
PAOs. We found many genera without known function, emphasizing the need for more efforts 318
towards elucidating the role of important members of wastewater treatment ecosystems. We also 319
present a new version of the MiDAS Field Guide, an online resource with a complete list of genera 320
and species found in activated sludge and anaerobic digesters. Through the field guide, microbe 321
identity is linked to available knowledge on function, and user microbial 16S rRNA gene sequences 322
can be classified online with the MiDAS 3 taxonomy. The central purpose of the MiDAS Field Guide 323
is to facilitate collaborative research into wastewater treatment ecosystem functions, which will 324
support efforts towards the sustainable production of clean water and bioenergy, and the development 325
of resource recovery, ending ultimately in the circular economy. 326
Acknowledgement 327
The project has been funded by the Danish Research Council (grant 6111-00617A), Innovation Fund 328
Denmark (OnlineDNA, grant 6155-00003A), the Villum foundation (Dark Matter and grant 13351), 329
and Danish wastewater treatment plants. 330
References 331
Albertsen, M., Karst, S.M., Ziegler, A.S., Kirkegaard, R.H., Nielsen, P.H., 2015. Back to Basics – 332
The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated 333
Sludge Communities. PLoS ONE 10, e0132783. 334
https://doi.org/10.1371/journal.pone.0132783 335
Albertsen, M., McIlroy, S.J., Stokholm-Bjerregaard, M., Karst, S.M., Nielsen, P.H., 2016. 336
“Candidatus Propionivibrio aalborgensis”: A Novel Glycogen Accumulating Organism 337
Abundant in Full-Scale Enhanced Biological Phosphorus Removal Plants. Front. Microbiol. 338
7. https://doi.org/10.3389/fmicb.2016.01033 339
Andersen, K.S., Kirkegaard, R.H., Karst, S.M., Albertsen, M., 2018. ampvis2: an R package to 340
analyse and visualise 16S rRNA amplicon data. bioRxiv 299537. 341
https://doi.org/10.1101/299537 342
Andersen, M.H., McIlroy, S.J., Nierychlo, M., Nielsen, P.H., Albertsen, M., 2018. Genomic 343
insights into Candidatus Amarolinea aalborgensis gen. nov., sp. nov., associated with 344
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
10
settleability problems in wastewater treatment plants. Systematic and Applied Microbiology. 345
https://doi.org/10.1016/j.syapm.2018.08.001 346
Boughner, L.A., Singh, P., 2016. Microbial Ecology: Where are we now? Postdoc J 4, 3–17. 347
https://doi.org/10.14304/SURYA.JPR.V4N11.2 348
Callahan, B.J., Wong, J., Heiner, C., Oh, S., Theriot, C.M., Gulati, A.S., McGill, S.K., Dougherty, 349
M.K., 2019. High-throughput amplicon sequencing of the full-length 16S rRNA gene with 350
single-nucleotide resolution. Nucleic Acids Res 47, e103–e103. 351
https://doi.org/10.1093/nar/gkz569 352
Cole, J.R., Wang, Q., Fish, J.A., Chai, B., McGarrell, D.M., Sun, Y., Brown, C.T., Porras-Alfaro, 353
A., Kuske, C.R., Tiedje, J.M., 2014. Ribosomal Database Project: data and tools for high 354
throughput rRNA analysis. Nucleic Acids Res 42, D633–D642. 355
https://doi.org/10.1093/nar/gkt1244 356
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, 357
D., Hu, P., Andersen, G.L., 2006. Greengenes, a chimera-checked 16S rRNA gene database 358
and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072. 359
https://doi.org/10.1128/AEM.03006-05 360
Dueholm, M.S., Andersen, K.S., Petriglieri, F., McIlroy, S.J., Nierychlo, M., Petersen, J.F., 361
Kristensen, J.M., Yashiro, E., Karst, S.M., Albertsen, M., Nielsen, P.H., 2019. 362
Comprehensive ecosystem-specific 16S rRNA gene databases with automated taxonomy 363
assignment (AutoTax) provide species-level resolution in microbial ecology. bioRxiv. 364
Edgar, R.C., 2016. SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS 365
sequences. bioRxiv 074161. https://doi.org/10.1101/074161 366
Glöckner, F.O., Yilmaz, P., Quast, C., Gerken, J., Beccati, A., Ciuprina, A., Bruns, G., Yarza, P., 367
Peplies, J., Westram, R., Ludwig, W., 2017. 25 years of serving the community with 368
ribosomal RNA gene reference databases and tools. Journal of Biotechnology, 369
Bioinformatics Solutions for Big Data Analysis in Life Sciences presented by the German 370
Network for Bioinformatics Infrastructure 261, 169–176. 371
https://doi.org/10.1016/j.jbiotec.2017.06.1198 372
Herbst, F.-A., Dueholm, M.S., Wimmer, R., Nielsen, P.H., 2019. The Proteome of Tetrasphaera 373
elongata is adapted to Changing Conditions in Wastewater Treatment Plants. Proteomes 7. 374
https://doi.org/10.3390/proteomes7020016 375
Jiang, C., Andersen, M.H., Peces, M., Nierychlo, M., Yashiro, E., Andersen, K.S., Kirkegaard, 376
R.H., Kucheryavskiy, S., Hao, L., Høgh, J., Hansen, A.A., Dueholm, M.S., Nielsen, P.H., 377
n.d. Associations between species-level microbial communities and key variables revealed 378
in anaerobic digesters at 22 Danish wastewater treatment plants during a six-years survey. 379
BioRxiv. 380
Karst, S.M., Dueholm, M.S., McIlroy, S.J., Kirkegaard, R.H., Nielsen, P.H., Albertsen, M., 2018. 381
Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences 382
without primer bias. Nature Biotechnology. https://doi.org/10.1038/nbt.4045 383
Karst, S.M., Ziels, R.M., Kirkegaard, R.H., Albertsen, M., 2019. Enabling high-accuracy long-read 384
amplicon sequences using unique molecular identifiers and Nanopore sequencing. bioRxiv 385
645903. https://doi.org/10.1101/645903 386
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
11
Kim, B.-R., Shin, J., Guevarra, R.B., Lee, J.H., Kim, D.W., Seol, K.-H., Lee, J.-H., Kim, H.B., 387
Isaacson, R.E., 2017. Deciphering Diversity Indices for a Better Understanding of Microbial 388
Communities. Journal of Microbiology and Biotechnology 27, 2089–2093. 389
https://doi.org/10.4014/jmb.1709.09027 390
Kristiansen, R., Nguyen, H.T.T., Saunders, A.M., Nielsen, J.L., Wimmer, R., Le, V.Q., McIlroy, 391
S.J., Petrovski, S., Seviour, R.J., Calteau, A., Nielsen, K.L., Nielsen, P.H., 2013. A 392
metabolic model for members of the genus Tetrasphaera involved in enhanced biological 393
phosphorus removal. ISME J 7, 543–554. https://doi.org/10.1038/ismej.2012.136 394
Liu, J.R., Seviour, R.J., 2001. Design and application of oligonucleotide probes for fluorescent in 395
situ identification of the filamentous bacterial morphotype Nostocoida limicola in activated 396
sludge. Environmental Microbiology 3, 551–560. https://doi.org/10.1046/j.1462-397
2920.2001.00229.x 398
Martín, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C., Yeates, C., He, 399
S., Salamov, A.A., Szeto, E., Dalin, E., Putnam, N.H., Shapiro, H.J., Pangilinan, J.L., 400
Rigoutsos, I., Kyrpides, N.C., Blackall, L.L., McMahon, K.D., Hugenholtz, P., 2006. 401
Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge 402
communities. Nat Biotechnol 24, 1263–1269. https://doi.org/10.1038/nbt1247 403
McIlroy, S.J., Kirkegaard, R.H., McIlroy, B., Nierychlo, M., Kristensen, J.M., Karst, S.M., 404
Albertsen, M., Nielsen, P.H., 2017. MiDAS 2.0: an ecosystem-specific taxonomy and online 405
database for the organisms of wastewater treatment systems expanded for anaerobic digester 406
groups. Database 2017. https://doi.org/10.1093/database/bax016 407
McIlroy, S.J., Kristiansen, R., Albertsen, M., Michael Karst, S., Rossetti, S., Lund Nielsen, J., 408
Tandoi, V., James Seviour, R., Nielsen, P.H., 2013. Metabolic model for the filamentous 409
‘Candidatus Microthrix parvicella’’ based on genomic and metagenomic analyses.’ ISME J 410
7, 1161–1172. https://doi.org/10.1038/ismej.2013.6 411
McIlroy, S.J., Saunders, A.M., Albertsen, M., Nierychlo, M., McIlroy, B., Hansen, A.A., Karst, 412
S.M., Nielsen, J.L., Nielsen, P.H., 2015. MiDAS: the field guide to the microbes of 413
activated sludge. Database 2015, bav062. https://doi.org/10.1093/database/bav062 414
McIlroy, S.J., Starnawska, A., Starnawski, P., Saunders, A.M., Nierychlo, M., Nielsen, P.H., 415
Nielsen, J.L., 2014. Identification of active denitrifiers in full-scale nutrient removal 416
wastewater treatment systems. Environ Microbiol n/a-n/a. https://doi.org/10.1111/1462-417
2920.12614 418
Nierychlo, M., Miłobędzka, A., Petriglieri, F., McIlroy, B., Nielsen, P.H., McIlroy, S.J., 2019. The 419
morphology and metabolic potential of the Chloroflexi in full-scale activated sludge 420
wastewater treatment plants. FEMS Microbiol Ecol 95. 421
https://doi.org/10.1093/femsec/fiy228 422
Nierychlo, M., Nielsen, P.H., 2017. Experiences in various countries: Denmark, in: Tandoi, V., 423
Rossetti, S., Wanner, J. (Eds.), Activated Sludge Separation Problems: Theory, Control 424
Measures, Practical Experiences. IWA Publishing, pp. 198–209. 425
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glöckner, F.O., 426
2013. The SILVA ribosomal RNA gene database project: improved data processing and 427
web-based tools. Nucl. Acids Res. 41, D590–D596. https://doi.org/10.1093/nar/gks1219 428
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
12
R Core Team, 2018. R: A language and environment for statistical computing. R Foundation for 429
Statistical Computing, Vienna, Austria. Available online at https://www.R-project.org/. 430
Rossetti, S., Tomei, M.C., Nielsen, P.H., Tandoi, V., 2005. “Microthrix parvicella”, a filamentous 431
bacterium causing bulking and foaming in activated sludge systems: a review of current 432
knowledge. FEMS Microbiol. Rev. 29, 49–64. https://doi.org/10.1016/j.femsre.2004.09.005 433
RStudio Team, 2015. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL 434
http://www.rstudio.com/. 435
Saunders, A.M., Albertsen, M., Vollertsen, J., Nielsen, P.H., 2016. The activated sludge ecosystem 436
contains a core community of abundant organisms. The ISME journal 10, 11–20. 437
Stokholm-Bjerregaard, M., McIlroy, S.J., Nierychlo, M., Karst, S.M., Albertsen, M., Nielsen, P.H., 438
2017. A Critical Assessment of the Microorganisms Proposed to be Important to Enhanced 439
Biological Phosphorus Removal in Full-Scale Wastewater Treatment Systems. Front. 440
Microbiol. 8. https://doi.org/10.3389/fmicb.2017.00718 441
Thomsen, T.R., Kong, Y., Nielsen, P.H., 2007. Ecophysiology of abundant denitrifying bacteria in 442
activated sludge. FEMS Microbiology Ecology 60, 370–382. https://doi.org/10.1111/j.1574-443
6941.2007.00309.x 444
Wickham, H., 2009. ggplot2 - Elegant Graphics for Data Analysis. Springer Science + Business 445
Media. 446
Wu, L., Ning, D., Zhang, B., Li, Y., Zhang, P., Shan, X., Zhang, Q., Brown, M., Li, Z., Nostrand, 447
J.D.V., Ling, F., Xiao, N., Zhang, Y., Vierheilig, J., Wells, G.F., Yang, Y., Deng, Y., Tu, Q., 448
Wang, A., Zhang, T., He, Z., Keller, J., Nielsen, P.H., Alvarez, P.J.J., Criddle, C.S., 449
Wagner, M., Tiedje, J.M., He, Q., Curtis, T.P., Stahl, D.A., Alvarez-Cohen, L., Rittmann, 450
B.E., Wen, X., Zhou, J., 2019. Global diversity and biogeography of bacterial communities 451
in wastewater treatment plants. Nature Microbiology 1. https://doi.org/10.1038/s41564-019-452
0426-5 453
Xia, Y., Kong, Y., Nielsen, P.H., 2008. In situ detection of starch-hydrolyzing microorganisms in 454
activated sludge. FEMS Microbiology Ecology 66, 462–471. https://doi.org/10.1111/j.1574-455
6941.2008.00559.x 456
Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., 457
Euzéby, J., Amann, R., Rosselló-Móra, R., 2014. Uniting the classification of cultured and 458
uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Micro 12, 635–459
645. https://doi.org/10.1038/nrmicro3330 460
461
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
13
Tables: 462
Table 1. Classification of abundant species belonging to genus Acidovorax and Ca. Amarolinea with 463
different taxonomies. 464
Taxonomy Phylum Class Order Family Genus Species
MiDAS 3 Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Acidovorax A. defluvii
MiDAS 2.1 Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Acidovorax -
SILVA Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Acidovorax -
Greengenes Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae - -
RDP Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Acidovorax -
MiDAS 3 Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Acidovorax midas_s_8989
MiDAS 2.1 Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae - -
SILVA Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae - -
Greengenes Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae - -
RDP Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Simplicispira -
MiDAS 3 Chloroflexi Anaerolineae Caldilineales Amarolineaceae Ca. Amarolinea midas_s_1
MiDAS 2.1 Chloroflexi SJA-15 C10_SB1A C10_SB1A Ca. Amarilinum -
SILVA Chloroflexi Anaerolineae C10_SB1A - - -
Greengenes Chloroflexi Anaerolineae SHA-20 - - -
RDP - - - - - -
MiDAS 3 Chloroflexi Anaerolineae Caldilineales Amarolineaceae Ca. Amarolinea midas_s_553
MiDAS 2.1 Chloroflexi SJA-15 C10_SB1A C10_SB1A Ca. Amarilinum -
SILVA Chloroflexi Anaerolineae C10_SB1A - - -
Greengenes Chloroflexi Anaerolineae SHA-20 - - -
RDP - - - - - -
MiDAS 3 Chloroflexi Anaerolineae Caldilineales Amarolineaceae Ca. Amarolinea Ca. A. aalborgensis
MiDAS 2.1 Chloroflexi SJA-15 C10_SB1A C10_SB1A Ca. Amarilinum -
SILVA Chloroflexi Anaerolineae C10_SB1A - - -
Greengenes Chloroflexi Anaerolineae SHA-20 - - -
RDP - - - - - -
MiDAS 2.1 (McIlroy et al., 2017); SILVA: Release 132_SSURef_NR99 (Quast et al., 2013); GreenGenes: Release 16s_13.5 (DeSantis 465 et al., 2006); Ribosomal Database Project: Release 16S_v16 training set (Cole et al., 2014). 466
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
14
Figures: 467
468 Figure 1. MiDAS 3 taxonomic assignment workflow based on the AutoTax framework. (1) ESVs 469
from the study of Danish plants (Dueholm et al., 2019), amended with AddOns ESVs coming from, 470
e.g., published genomes, are first mapped to the SILVA_132_SSURef_Nr99 database to identify the 471
closest relative. (2) Taxonomy is adopted from this sequence after trimming, based on identity and 472
the taxonomic thresholds proposed by Yarza et al., (2014) with 94.5% sequence similarity for genus-473
level. To gain species information, ESVs are mapped to sequences from type strains extracted from 474
the SILVA database, and species names were adopted if the identity was >98.7% and the type strain 475
genus matched that of the closest reference in the complete database. (3) ESVs were also clustered 476
by greedy clustering at different identities, corresponding to the thresholds proposed by Yarza et al., 477
(2014) to generate a stable de novo taxonomy. (4) The de novo taxonomy was curated with taxon 478
names for ESVs that represent, e.g., genomes from strains published in peer-reviewed literature. (5) 479
Finally, a comprehensive taxonomy was obtained by filling gaps in the SILVA-based taxonomy with 480
the de novo taxonomy with provisional “midas” placeholder names (adapted from Dueholm et al., 481
2019). 482
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
15
483 Figure 2. Alpha diversity estimates of activated sludge community composition using ASV-level 484
taxa in individual plants, calculated using the Simpson index of diversity (1-D). Data represents 712 485
samples from 20 Danish full-scale WWTPs collected from 2006 to 2018 (with 17-51 samples per 486
plant). 487
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
16
488 Figure 3. Redundancy analysis (RDA) plots of activated sludge communities, constrained to 489
visualize the variance among 712 samples explained by WWTP location. Axes show the percent of 490
total variance described by each component (first and second value represent relative contribution of 491
each axis to the total variance in the data and the constrained space only, respectively). Data represents 492
samples from 20 Danish full-scale WWTPs collected from 2006 to 2018. 493
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
17
494 Figure 4 a-c. a) Cumulative read abundance of genus- and species-level taxa found in activated 495
sludge, plotted in rank order. Dashed line denotes 80% of the reads, which is assumed to cover the 496
abundant taxa, b) Venn-diagram showing abundant (top 80% of the reads) genera and species shared 497
between BNR and EBPR plants, c) core community (observation frequency vs abundance frequency) 498
in Danish BNR and EBPR plants for genus- and species-level taxa. Abundance cut-off values 499
represent the lowest abundance of the taxon belonging to the top 80% of the reads in given group 500
(genus/species). Darker points indicate multiple observations at given positions. Colored points 501
designate core taxa. Data represents 111 samples from BNR plants and 601 samples from EBPR 502
plants. 503
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
18
504 Figure 5. Boxplot showing the occurrence of the top 50 most abundant species in Danish BNR and 505
EBPR WWTPs. 506
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
19
507 Figure 6. Occurrence of abundant PAO species in 20 full-scale activated sludge WWTPs. Data
represents average read abundance values for each plant, based on samples collected 2–4 times per
year from 2006 to 2018. Individual PAO genera are indicated by different colors.
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint
20
Figure 7. MiDAS Field Guide example of Tetrasphaera genus entry.
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842393doi: bioRxiv preprint