
Species-level microbiome composition of activated sludge - introducing the MiDAS 3 ecosystem-specific reference database and taxonomy

Marta Nierychlo, Kasper Skytte Andersen, Yijuan Xu, Nick Green, Mads Albertsen, Morten S. Dueholm, Per Halkjær Nielsen.

Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark.

*Correspondence to: Per Halkjær Nielsen, Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220 Aalborg, Denmark; Phone: +45 9940 8503; Fax: Not available; E-mail: [email protected]

Abstract

The function of microbial communities in wastewater treatment systems and anaerobic digesters is dictated by the physiological activity of their members and the complex interactions between them. Since functional traits are often conserved at low taxonomic ranks (genus, species, strain), the development of high taxonomic resolution and reliable classification is the first crucial step towards understanding the role of microbes in any ecosystem. Here we present MiDAS 3, a comprehensive 16S rRNA gene reference database based on high-quality full-length sequences derived from activated sludge and anaerobic digester systems. The MiDAS 3 taxonomy proposes unique provisional names for all microorganisms down to species level. MiDAS 3 was applied for the detailed analysis of microbial communities in 20 Danish wastewater treatment plants with nutrient removal, sampled over 12 years, demonstrating community stability and many abundant core taxa. The top 50 most abundant species belonged to genera of which >50% have no known function in the system, emphasizing the need for more efforts towards elucidating the role of important members of wastewater treatment ecosystems.

The MiDAS 3 taxonomic database guided an update of the MiDAS Field Guide, an online resource linking the identity of microorganisms in wastewater treatment systems to available data related to their functional importance. The new field guide contains a complete list of genera (>1,800) and species (>4,200) found in activated sludge and anaerobic digesters. The identity of the microbes is linked to functional information, where available. The website also provides the possibility to BLAST sequences against the MiDAS 3 taxonomy directly online. The MiDAS Field Guide is a collaborative platform acting as an online knowledge repository and facilitating understanding of wastewater treatment ecosystem function.

bioRxiv preprint doi: https://doi.org/10.1101/842393. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.


Introduction

Biological nutrient removal (BNR) plants are the principal wastewater treatment systems across the urbanized world. Besides their primary role in removing pollutants and pathogens, these systems are increasingly seen as resource recovery facilities, contributing to sustainable resource management. At-plant anaerobic digestion exploits the value of sludge as a source of energy, reducing plant carbon footprint and placing wastewater treatment plants (WWTPs) on the road to achieving a circular economy. Complex microbial communities define the function of biological wastewater treatment systems, and understanding the role of individual microbes requires knowledge of their identity, to which known and putative functions can be assigned. The gold standard for identification and characterization of the diversity, composition, and dynamics of microbial communities is presently 16S rRNA gene amplicon sequencing (Boughner and Singh, 2016). A crucial step is a reliable taxonomic classification, but this is strongly hampered by the lack of sufficient reference sequences in public databases for many important microbes present in wastewater treatment systems (McIlroy et al., 2015). In addition, high taxonomic resolution is missing from the large-scale public reference databases (SILVA (Quast et al., 2013), Greengenes (DeSantis et al., 2006), RDP (Cole et al., 2014)), often resulting in poor classification.

MiDAS (Microbial Database for Activated Sludge) was established in 2015 as an ecosystem-specific database for wastewater treatment systems that provided manually curated taxonomic assignment (MiDAS taxonomy 1.0) based on the SILVA database and associated physiological information profiles (midasfieldguide.org) for all abundant and process-critical genera in activated sludge (AS) (McIlroy et al., 2015). Both taxonomy and database were later updated (MiDAS 2.0) to cover abundant microorganisms found in anaerobic digesters (AD) and influent wastewater (McIlroy et al., 2017). However, more high-identity 16S rRNA gene reference sequences are needed to be able to classify microbes at the lowest possible taxonomic levels. Currently, taxonomic classification can only be provided down to genus level, limiting elucidation of the true diversity in these ecosystems. Consequently, physiological differences between the members of the same genus that coexist in activated sludge cannot be reliably linked to the individual phylotypes due to insufficient phylogenetic resolution. Finally, manual curation of large-scale databases such as SILVA is not feasible due to the growing number of sequences and frequent taxonomy updates (Glöckner et al., 2017).

Recent developments in sequencing technology and bioinformatic tools have enabled the generation of millions of high-quality full-length 16S rRNA reference sequences from any environmental ecosystem (Callahan et al., 2019; Karst et al., 2019, 2018). Obtaining full-length 16S rRNA gene sequences directly from environmental samples has made it possible to establish a near-complete reference database for high-taxonomic resolution studies of the wastewater treatment ecosystem. This database is based on approximately one million 16S rRNA gene sequences retrieved directly from the wastewater treatment plants, which provided more than 9,500 exact sequence variants (ESVs) after dereplication and error-correction. Moreover, we created the AutoTax pipeline for generating a comprehensive ecosystem-specific taxonomic database with de novo names for novel taxa. The names are first inherited from sequences present in the large-scale SILVA taxonomy, and de novo names are provided for all remaining unclassified taxa based on reproducible clustering with rank-specific identity thresholds (Yarza et al., 2014). These de novo names provide placeholder names for all novel phylotypes and act as fixed unique identifiers, making the taxonomic assignment independent of the analyzed dataset and enabling cross-study comparisons.
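The idea of rank-specific identity thresholds can be sketched in a few lines of Python. The cutoff values below are the 16S rRNA gene identity thresholds as commonly quoted from Yarza et al. (2014) and should be checked against that reference. The function name and the use of a single pairwise identity value are assumptions made for illustration; this is not the AutoTax implementation.

```python
# Identity cutoffs (%) for full-length 16S rRNA genes, as commonly quoted
# from Yarza et al. (2014); ordered from the lowest rank to the highest.
RANK_THRESHOLDS = [
    ("species", 98.7),
    ("genus", 94.5),
    ("family", 86.5),
    ("order", 82.0),
    ("class", 78.5),
    ("phylum", 75.0),
]


def lowest_shared_rank(identity_pct):
    """Return the lowest rank at which two sequences with the given
    pairwise identity (%) would fall into the same cluster, or None if
    the identity is below the phylum cutoff.

    Hypothetical helper for illustration only.
    """
    for rank, cutoff in RANK_THRESHOLDS:
        if identity_pct >= cutoff:
            return rank
    return None
```

Under these assumptions, two sequences sharing 95% identity would be grouped in the same genus but in different species.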

Here we present the MiDAS 3 reference database, which is based on the ESVs previously obtained from activated sludge and anaerobic digester systems (Dueholm et al., 2019), amended with additional sequences of potential importance in the field. These sequences represent bacteria from influent wastewater, potential pathogens, and genome-derived sequences of microbes found in wastewater treatment systems. The MiDAS 3 taxonomy is built using AutoTax (Dueholm et al., 2019) and proposes unique provisional names for all microorganisms important in wastewater treatment ecosystems at species-level resolution. We have furthermore curated some of these species-level names, and here we demonstrate the resolution power of MiDAS 3 by analyzing species-level community composition in 20 Danish full-scale WWTPs carrying out nutrient removal with chemical phosphorus removal (BNR) or enhanced biological phosphorus removal (EBPR) over 12 years. We provide important information about diversity, core communities, and the most abundant species in these ecosystems. Although the plants investigated are Danish, the results and approach are relevant for similar systems worldwide.

We also present the updated MiDAS Field Guide, an online platform that links the MiDAS 3 taxonomy-derived identity of all microbes from wastewater treatment systems to abundance information based on our long-term survey, and to functional information (where available). MiDAS 3 alleviates important problems related to the taxonomic classification of microbes in environmental ecosystems, such as missing reference sequences and low taxonomic resolution, while our MiDAS Field Guide acts as an online knowledge repository facilitating understanding of the function of microbes present in wastewater treatment ecosystems.

Materials and methods

MiDAS 3 reference database and taxonomy

The MiDAS 3 reference database was built using full-length 16S rRNA gene ESVs from 21 activated sludge WWTPs and 16 AD systems, and taxonomic classification was assigned as described by Dueholm et al. (2019) and shown in Figure 1. The database was amended with 95 additional sequences from published genomes for bacteria that are potentially important in wastewater treatment systems but not present in the reference ESVs obtained (Table S2). Taxonomy was assigned to all sequences using the workflow presented in Figure 1. Gaps in the SILVA-derived taxonomy were filled with a de novo taxonomy that provides ‘midas_x_y’ placeholder names, where x is the first letter of the taxonomic level for which the de novo assignment was created, and y is a number derived from the ESV that forms the cluster centroid for a given taxon. Where applicable, taxonomic classifications were curated with taxon names for ESVs that represent genomes published in peer-reviewed literature. Additionally, several manual curations were made, based on the MiDAS 2.1 database and published literature (the full list of amendments is presented in Table S3). The MiDAS taxonomy file is available for download from the web platform (midasfieldguide.org/Downloads).
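The ‘midas_x_y’ naming scheme described above can be illustrated with a minimal Python sketch. The function names, the input format (one dictionary of inherited names and one of centroid ESV numbers per rank) and the numbers in the usage note are hypothetical and chosen for illustration; the actual AutoTax workflow is the one described by Dueholm et al. (2019).

```python
# Taxonomic ranks covered by the placeholder scheme, highest to lowest.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def placeholder_name(rank, centroid_esv_number):
    """Build a 'midas_x_y' name: x is the first letter of the rank and
    y is the number of the ESV forming the cluster centroid."""
    return f"midas_{rank[0]}_{centroid_esv_number}"


def fill_gaps(lineage, centroid_numbers):
    """Keep inherited names and fill missing ranks with placeholders.

    lineage: dict rank -> inherited name (missing or None if no name
        was inherited from the reference taxonomy).
    centroid_numbers: dict rank -> centroid ESV number for that rank
        (hypothetical input format; only needed for ranks with gaps).
    """
    return {
        rank: lineage.get(rank) or placeholder_name(rank, centroid_numbers[rank])
        for rank in RANKS
    }
```

For example, `placeholder_name("genus", 17)` gives `midas_g_17`, the form of the genus name used later in the text, and `fill_gaps` leaves an inherited family name such as Saprospiraceae untouched while naming the ranks below it.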


Activated sludge samples

Sampling of the activated sludge biomass from 20 Danish full-scale municipal WWTPs was carried out within the MiDAS project (McIlroy et al., 2015). The plants were sampled up to four times a year (February, May, August, October) from 2006 to 2018, giving a total of 712 samples. All samples were taken from the aeration tank and sent overnight to Aalborg University for processing. Basic information about the design and operation of the WWTPs as well as the number of samples collected are given in Table S1.

Amplicon sequencing and bioinformatic analysis

DNA extraction, sample preparation including amplification of the V1-3 region of the 16S rRNA gene, and amplicon sequencing were conducted as described by Stokholm-Bjerregaard et al. (2017). Forward reads were processed as described by Dueholm et al. (2019): raw reads were trimmed to 250 bp and quality filtered, exact amplicon sequence variants (ASVs) were identified, and taxonomy was assigned against the MiDAS 3 reference database using the SINTAX classifier (Edgar, 2016).

Data analysis and visualization

Data was imported into R (R Core Team, 2018) using RStudio (RStudio Team, 2015), and analyzed and visualized using ggplot2 v.3.1.0 (Wickham, 2009) and ampvis2 v.2.4.9 (K. S. Andersen et al., 2018). The dataset containing 75,017 ASVs was rarefied to 10,000 reads per sample, resulting in 57,902 ASVs. Simpson’s alpha-diversity index calculation and redundancy analyses were performed using the ampvis2 package. Redundancy Analysis (RDA) was performed using the Hellinger transformation and constrained to WWTP location. For the analysis of rank abundance and core microbiome, all ASVs without taxonomic classification at the rank analyzed were removed. The remaining datasets, consisting of 29,493 ASVs (50.9%) with genus-level and 18,452 ASVs (31.9%) with species-level classification, were normalized to 100%. These ASVs were represented in total by 1,647 genera and 3,651 species. The unclassified ASVs were primarily due to missing ESV reference sequences for rare ASVs, and the inability of the amplicons to resolve the species-level taxonomy for certain taxa. An example of the latter is the genus Trichococcus, where the species T. pasteurii, T. flocculiformis, and T. patagoniensis cannot be distinguished even with full-length 16S rRNA gene sequences.
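Rarefaction to a fixed depth, as used above, amounts to random subsampling of reads without replacement. The following Python sketch illustrates the principle on a single sample. It is not the ampvis2 implementation, and the function name, the seed handling and the choice to return None for samples with fewer reads than the target depth are assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Randomly subsample a sample's reads to a fixed depth.

    counts: dict ASV id -> read count for one sample.
    depth: number of reads to keep (10,000 in the study).
    Returns a dict ASV id -> subsampled count, or None when the sample
    has fewer reads than the requested depth (an assumed policy).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw without replacement.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    return dict(Counter(picked))


# The classified fractions quoted in the text follow from the ASV counts.
genus_fraction = round(100 * 29493 / 57902, 1)
species_fraction = round(100 * 18452 / 57902, 1)
```

Rare ASVs can drop out of a sample during subsampling, which is consistent with the number of ASVs decreasing after rarefaction.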

Results and discussion

The MiDAS reference database provides species-level classification

In the creation of the MiDAS 3 database, we have applied a novel approach (Figure 1), where high-throughput sequencing of high-quality full-length 16S rRNA genes (Karst et al., 2018) was combined with a new automatic taxonomy concept, AutoTax (Dueholm et al., 2019). The database contains full-length sequences from 21 Danish WWTPs carrying out nutrient removal, and 16 mesophilic and thermophilic Danish ADs. The MiDAS 3 reference database was additionally supplemented with high-quality full-length 16S rRNA gene sequences of microbes with known importance in the field, such as bacteria present in the influent wastewater and pathogens (Table S2).

MiDAS 3 taxonomic assignment is based on the AutoTax framework, where sequences first inherit taxonomic classification from the SILVA database (release 132, Quast et al., 2013), including species names (based on type strains), if available. The remaining unclassified sequences are assigned unique MiDAS names down to species level. Thus, the new release of the MiDAS 3 taxonomy proposes a provisional genus and species name for all microorganisms found in activated sludge and anaerobic digester ecosystems. These names act as stable identifiers and placeholder names for the sequences until the representative microorganisms are further characterized and assigned approved names. Additionally, names derived from recently published genomes and other sources not yet incorporated in the SILVA taxonomy were manually curated (see Table S3 for the full list of curated names) to provide a near-complete database and taxonomy for microorganisms in wastewater treatment ecosystems. For a few species there were inconsistencies between the genus assigned by SILVA and the type strain name (see Table S4); these issues relate to the fact that the SILVA taxonomy is updated based on phylogenetic analyses of the available 16S rRNA gene sequences, whereas type strain names have to be approved by the international taxonomy committee. The possibility to include additional sequences ensures that the database can easily be updated based on the current state of the ecosystem-specific field, and will drive the future releases of the MiDAS database.

Two examples of improved taxonomy are shown in Table 1. Acidovorax is one of the most abundant genera in Danish influent wastewater (McIlroy et al., 2017), with two abundant species identified using the MiDAS 3 reference database. A. defluvii is described by the correct classification down to genus level in all but one of the reference databases tested (MiDAS 2, SILVA, Greengenes, RDP). However, the other abundant species (midas_s_8989) was classified only at family level, with genus name either missing or assigned incorrectly by the other databases.

Genus Ca. Amarolinea contains abundant filamentous bacteria in Danish WWTPs (Nierychlo et al., 2019) and can be associated with bulking incidents (M. H. Andersen et al., 2018; Nierychlo and Nielsen, 2017). All three abundant species belonging to the genus lack taxonomic classification below order level in public large-scale databases, preventing identification and analysis of this important member of the activated sludge system.

Microbiome composition of Danish activated sludge plants

In this study, we performed a 12-year survey of bacteria present in 20 Danish full-scale WWTPs with nutrient removal that primarily treated municipal wastewater. All plants had biological nitrogen removal and most also had EBPR (17 plants), while 3 plants had chemical P-removal (BNR plants). The survey was carried out using 16S rRNA gene sequencing of the V1-3 region on the Illumina MiSeq platform. Using the MiDAS 3 taxonomy, microbial community composition was characterized at genus and species level in all plants based on 712 samples collected 2 to 4 times per year for each plant from 2006 to 2018.

The overall microbial diversity within each plant was estimated using the Simpson index of diversity and compared for all plants analyzed (Figure 2). The Simpson index places a greater weight on species evenness than on richness (Kim et al., 2017), thus allowing the assessment of how evenly the relative contributions of taxa are distributed across the samples. Since the value of the index represents the probability that two randomly selected individuals will belong to different taxa, the low values of the Simpson index of diversity (Figure 2) indicate that the microbial communities were dominated by a low number of abundant taxa across the plants. Additionally, the low spread of the index in each plant indicates that the microbial communities were consistently stable across the years. Similar alpha diversity indices, calculated as Inverse Simpson, were observed in WWTPs across the world by Wu et al. (2019) (Figure S1).
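The two forms of the index mentioned here can be written out explicitly. With p_i the relative abundance of taxon i, the Simpson index of diversity is 1 − Σ p_i², which matches the interpretation given above as the probability that two randomly selected individuals belong to different taxa, and the Inverse Simpson index is 1 / Σ p_i². The Python sketch below computes both from raw counts; the function name is hypothetical and this is not the ampvis2 code.

```python
def simpson_indices(counts):
    """Return (Simpson index of diversity, Inverse Simpson index).

    counts: iterable of read counts, one per taxon, for a single sample.
    """
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    # Sum of squared relative abundances (Simpson's dominance, D).
    dominance = sum((n / total) ** 2 for n in counts)
    return 1.0 - dominance, 1.0 / dominance
```

A perfectly even community of four taxa gives 0.75 and 4.0, whereas a community dominated by a single taxon gives an index of diversity close to 0 and an Inverse Simpson value close to 1.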

The overall variation in activated sludge community structure in the 20 Danish WWTPs was visualized using an RDA plot (Figure 3) with the WWTP location as the constrained variable. In many cases the samples coming from different WWTPs were not completely separated from each other, indicating high similarity between the microbial communities in those WWTPs. This is also highlighted by the fact that the eigenvalues of the two principal components plotted relative to the total variance in the data are relatively small (9.7% and 5.6%). Fredericia, Hirtshals, Esbjerg E, and Esbjerg W WWTPs appeared to have a more distinct microbial community composition, presumably due to a high fraction of industrial waste in the influent (Table S1). A pronounced clustering of the samples coming from individual WWTPs indicates that variation between the plants was likely more pronounced than within individual plants over time, and suggests long-term stability of the microbial communities in Danish plants.

The microbial diversity of the WWTPs was composed of 1,647 genera and 3,651 species. However, a relatively small fraction of these comprised a large proportion of the reads, thus representing the abundant taxa that are assumed to carry out the main functions in the system (Figure 4a) (Saunders et al., 2016). The abundant taxa were defined as those that constitute the top 80% of reads obtained by amplicon sequencing. In the BNR and EBPR plants, we found 157 and 164 abundant genera and 421 and 504 abundant species, respectively. These species had a relative ASV read abundance above approx. 0.04%.
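The definition of abundant taxa as those making up the top 80% of reads can be sketched as a cumulative rank-abundance cut. In the Python example below, taxa are ranked by relative abundance and selected until the cumulative share reaches the cutoff. The function name and the decision to include the taxon that crosses the cutoff are assumptions made for illustration.

```python
def abundant_taxa(relative_abundance, cutoff=80.0):
    """Return the taxa that together make up the top `cutoff` % of reads.

    relative_abundance: dict taxon -> relative read abundance in percent
        (values are expected to sum to about 100).
    Taxa are taken in order of decreasing abundance until the cumulative
    share reaches the cutoff; the taxon crossing it is included.
    """
    ranked = sorted(relative_abundance.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for taxon, pct in ranked:
        if cumulative >= cutoff:
            break
        selected.append(taxon)
        cumulative += pct
    return selected
```

For a toy community with abundances of 50%, 25%, 15% and 10%, the first three taxa are returned, since the first two cover only 75% of the reads.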

The overlap of taxa between the BNR and EBPR abundant communities was high at both genus and species level (Figure 4b), and the core taxa, defined as being present and abundant in ≥80% of the samples (Saunders et al., 2016), constituted more than 45% of the total reads at genus level and more than 25% of total reads at species level in both types of plants (Figure 4c). The number of genera and species belonging to the core communities in BNR and EBPR plants constituted a small fraction of the total number of taxa present (1.6-3.5%). Both types of plants shared 72% of genera and 65% of species in their respective core communities. The results show that the inclusion of an anaerobic process step in the EBPR plants, compared to the conventional BNR plants, had little effect on the communities.
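The core definition used above (abundant in at least 80% of the samples) can be sketched as follows. The example assumes that "abundant" in a single sample means belonging to the taxa that make up the top 80% of reads in that sample, which is one reading of the definition and is labeled here as an assumption. The function names are hypothetical.

```python
from collections import Counter


def abundant_in_sample(sample, cutoff=80.0):
    """Taxa making up the top `cutoff` % of reads in one sample.

    sample: dict taxon -> relative abundance in percent.
    """
    ranked = sorted(sample.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = set(), 0.0
    for taxon, pct in ranked:
        if cumulative >= cutoff:
            break
        chosen.add(taxon)
        cumulative += pct
    return chosen


def core_taxa(samples, min_fraction=0.8):
    """Taxa that are abundant in at least `min_fraction` of the samples.

    samples: list of dicts, one per sample (taxon -> percent abundance).
    """
    hits = Counter()
    for sample in samples:
        hits.update(abundant_in_sample(sample))
    needed = min_fraction * len(samples)
    return {taxon for taxon, n in hits.items() if n >= needed}
```

With five samples, a taxon has to be abundant in at least four of them to be counted as part of the core under these assumptions.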

Species-level community composition

The most abundant species present in the BNR and EBPR plants are shown in Figure 5. Among the 10 most abundant species, two belong to genera with a provisional ‘midas’ name, while of the top 50 and top 100 species, 13 and 36 species, respectively, belong to genera not possessing genus-level classification in SILVA (top 100 species are presented in Figure S2). These would be excluded from the analysis if MiDAS 3 was not used as the reference database, demonstrating the advantage of using the ecosystem-specific taxonomy.

A large fraction of the most abundant species is poorly described in the literature. The top 50 species belong to 41 different genera, of which almost 60% (28 genera) have no known function in wastewater treatment systems (McIlroy et al., 2017, 2015). The function of 65% of all genera defined as abundant in the EBPR and BNR plants (Figure 2a) is unknown, demonstrating the need for more studies that link identity and function. A few species, however, match SILVA type strains, such as Acidovorax defluvii and Intestinibacter bartlettii, so these may be used for deeper studies of their physiology.

The most abundant species in both BNR and EBPR plants belong to the genus Tetrasphaera, which represents polyphosphate-accumulating organisms (PAO) with versatile metabolism, including the potential for denitrification (Herbst et al., 2019; Kristiansen et al., 2013). Other very abundant species belong to the filamentous genera Ca. Microthrix (McIlroy et al., 2013) and Ca. Villigracilis (Nierychlo et al., 2019); the former is known to cause serious bulking problems (Rossetti et al., 2005). Trichococcus may also possess filamentous morphology in activated sludge and may be implicated in bulking (Liu and Seviour, 2001). Rhodoferax is a denitrifier (McIlroy et al., 2014), while the function in activated sludge is unknown for species in the Romboutsia and Rhodobacter genera. The novel genus midas_g_17, for which no functional information is available, belongs to the family Saprospiraceae, and was wrongly classified as Ca. Epiflobacter in the previous version of the MiDAS taxonomy. Based on the coverage of the FISH probes available and mapping of the partial 16S sequences (Xia et al., 2008) to the MiDAS 3 database, the name Ca. Epiflobacter was re-assigned to the correct genus, which is present at much lower abundance in the ecosystem. MiDAS 3 provides good opportunities to reclassify sequences from previous studies for which information about distribution and/or function is available. For example, a set of partial 16S rRNA gene sequences originally related to the genus Aquaspirillum (Thomsen et al., 2007) could, after mapping, be reclassified as midas_g_33, an abundant novel genus, indicating its potential role in denitrification.

Interestingly, the diversity of abundant species within the different genera was generally low in the plants, typically with 3-4 abundant species in each of the most abundant genera. However, the known PAO Tetrasphaera and the less abundant Ca. Accumulibacter exhibited slightly higher diversity, having 10 and 5 abundant species, respectively (Figure 6). Tetrasphaera had one dominant species (midas_s_5) present in most plants, while the other species appeared more randomly distributed. No clear dominance of any of the Ca. Accumulibacter species could be observed, as their abundance was very different across the plants. Two of these, Ca. A. phosphatis and Ca. A. aalborgensis, are described at the species level, based on the genomes available (Albertsen et al., 2016; Martín et al., 2006). Interestingly, the genus Dechloromonas, which is also suggested to be a putative PAO (Stokholm-Bjerregaard et al., 2017), was more abundant than Ca. Accumulibacter, and its diversity was low with only two dominant species present in almost all plants. In general, high species diversity of the PAOs may suggest a high degree of specialization allowing the exploitation of different niches in the activated sludge ecosystem. Ca. Accumulimonas, another putative PAO, which was described in the previous version of MiDAS, is not included in the present taxonomy version, as sequences representing that genus were, according to the present SILVA taxonomy, classified to the genus Halomonas.

The MiDAS reference database provides a common language for the field

Only a few studies have made surveys of the microbial communities in activated sludge systems, and in all cases, it has only been with genus-level and not species-level classification (Saunders et al., 2016; Wu et al., 2019). Since the different studies have applied different extraction procedures, different primers (typically V1-3 or V4), and different databases for taxonomic assignment (MiDAS 2, SILVA, RDP or Greengenes), it can be difficult to compare results across studies at the genus level, and impossible at lower taxonomic levels. Therefore, the use of the ecosystem-specific MiDAS database will provide a common language for all work in the field. Another key advantage of the MiDAS 3 taxonomy is that it includes taxa abundant in similar systems across the world (Dueholm et al., 2019), which are missing in SILVA, RDP, and Greengenes. By using this approach, and similar protocols for extraction and primer selection (Albertsen et al., 2015; Dueholm et al., 2019), it will be possible to compare the results of WWTP microbiome studies in the future.

The MiDAS reference database for anaerobic digesters

Anaerobic digestion is a key technology increasingly used worldwide to reduce waste, generate energy, and minimize the carbon footprint of the plant. In order to provide a comprehensive reference database for the microbes present in the whole wastewater treatment system, MiDAS 3 is based on sequences from activated sludge as well as 16 full-scale Danish anaerobic digesters treating primary and waste activated sludge. A detailed analysis of bacterial and archaeal community structure at species-level has been performed in another study (Jiang et al., n.d.). The study was based on >1000 samples collected over 6 years from 46 anaerobic digesters, located at 22 WWTPs in Denmark. When the microbial data was coupled with the analysis of operational, physicochemical, and performance parameters, the main factors driving the assembly of the AD community were elucidated.

MiDAS Field Guide online platform (midasfieldguide.org)

The new MiDAS Field Guide (www.midasfieldguide.org) is a comprehensive database of microbes in wastewater treatment systems, which was updated alongside the release of the MiDAS 3 database. All microbes found in activated sludge and anaerobic digesters are now included in the field guide and, for the first time, species are listed for each genus. Our database now covers more than 1,800 genera and 4,200 species found in biological nutrient removal treatment plants and anaerobic digesters. More detailed abundance information is now provided both at genus and species level, and functional information is described where available.

The MiDAS Field Guide integrates the identity of the microbes with the available functional information (Figure 7) into a community knowledge platform in the form of a searchable database of referenced information. It thus acts as a central online repository for current knowledge, to which all are welcome to contribute. Due to the rapidly growing amount of relevant literature and the high number of taxa included, the MiDAS Field Guide is a continuous work in progress, with our efforts prioritized towards the microorganisms that comprise the core microbiome as well as other abundant or process-critical species.

New features launched in the updated MiDAS Field Guide include: 1) a phylogenetic "Search by taxonomy" function, where all microbes found in the ecosystem are displayed in their respective taxonomic groups, and 2) the possibility of classifying sequences directly online using a BLAST function and the MiDAS 3 taxonomy. Updated protocols for DNA extraction and amplicon library preparation are available online to facilitate the establishment of a common framework for studies of the microbial ecology of wastewater treatment systems.


Conclusions

Here we present the new ecosystem-specific MiDAS 3 reference database, based on a comprehensive set of full-length 16S rRNA gene sequences derived from activated sludge and anaerobic digester systems, and the MiDAS 3 taxonomy built using AutoTax. The taxonomy proposes unique provisional names for all microorganisms important in wastewater treatment ecosystems, down to species level. MiDAS 3 was used in the detailed analysis of microbial communities in 20 Danish wastewater treatment plants (WWTPs) with nutrient removal, sampled over 12 years. This enabled unprecedented resolution, revealing for the first time all species present in the plants, the stability of the communities, many abundant core taxa, and the diversity of species in important functional guilds, exemplified by the PAOs. We found many genera without known function, emphasizing the need for more efforts towards elucidating the role of important members of wastewater treatment ecosystems. We also present a new version of the MiDAS Field Guide, an online resource with a complete list of genera and species found in activated sludge and anaerobic digesters. Through the field guide, microbe identity is linked to available knowledge on function, and users' 16S rRNA gene sequences can be classified online with the MiDAS 3 taxonomy. The central purpose of the MiDAS Field Guide is to facilitate collaborative research into wastewater treatment ecosystem functions, which will support efforts towards the sustainable production of clean water and bioenergy and the development of resource recovery, leading ultimately to a circular economy.

Acknowledgement

The project has been funded by the Danish Research Council (grant 6111-00617A), Innovation Fund Denmark (OnlineDNA, grant 6155-00003A), the Villum Foundation (Dark Matter and grant 13351), and Danish wastewater treatment plants.

References

Albertsen, M., Karst, S.M., Ziegler, A.S., Kirkegaard, R.H., Nielsen, P.H., 2015. Back to Basics – The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities. PLoS ONE 10, e0132783. https://doi.org/10.1371/journal.pone.0132783

Albertsen, M., McIlroy, S.J., Stokholm-Bjerregaard, M., Karst, S.M., Nielsen, P.H., 2016. “Candidatus Propionivibrio aalborgensis”: A Novel Glycogen Accumulating Organism Abundant in Full-Scale Enhanced Biological Phosphorus Removal Plants. Front. Microbiol. 7. https://doi.org/10.3389/fmicb.2016.01033

Andersen, K.S., Kirkegaard, R.H., Karst, S.M., Albertsen, M., 2018. ampvis2: an R package to analyse and visualise 16S rRNA amplicon data. bioRxiv 299537. https://doi.org/10.1101/299537

Andersen, M.H., McIlroy, S.J., Nierychlo, M., Nielsen, P.H., Albertsen, M., 2018. Genomic insights into Candidatus Amarolinea aalborgensis gen. nov., sp. nov., associated with settleability problems in wastewater treatment plants. Systematic and Applied Microbiology. https://doi.org/10.1016/j.syapm.2018.08.001

Boughner, L.A., Singh, P., 2016. Microbial Ecology: Where are we now? Postdoc J 4, 3–17. https://doi.org/10.14304/SURYA.JPR.V4N11.2

Callahan, B.J., Wong, J., Heiner, C., Oh, S., Theriot, C.M., Gulati, A.S., McGill, S.K., Dougherty, M.K., 2019. High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution. Nucleic Acids Res 47, e103–e103. https://doi.org/10.1093/nar/gkz569

Cole, J.R., Wang, Q., Fish, J.A., Chai, B., McGarrell, D.M., Sun, Y., Brown, C.T., Porras-Alfaro, A., Kuske, C.R., Tiedje, J.M., 2014. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res 42, D633–D642. https://doi.org/10.1093/nar/gkt1244

DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L., 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072. https://doi.org/10.1128/AEM.03006-05

Dueholm, M.S., Andersen, K.S., Petriglieri, F., McIlroy, S.J., Nierychlo, M., Petersen, J.F., Kristensen, J.M., Yashiro, E., Karst, S.M., Albertsen, M., Nielsen, P.H., 2019. Comprehensive ecosystem-specific 16S rRNA gene databases with automated taxonomy assignment (AutoTax) provide species-level resolution in microbial ecology. bioRxiv.

Edgar, R.C., 2016. SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences. bioRxiv 074161. https://doi.org/10.1101/074161

Glöckner, F.O., Yilmaz, P., Quast, C., Gerken, J., Beccati, A., Ciuprina, A., Bruns, G., Yarza, P., Peplies, J., Westram, R., Ludwig, W., 2017. 25 years of serving the community with ribosomal RNA gene reference databases and tools. Journal of Biotechnology, Bioinformatics Solutions for Big Data Analysis in Life Sciences presented by the German Network for Bioinformatics Infrastructure 261, 169–176. https://doi.org/10.1016/j.jbiotec.2017.06.1198

Herbst, F.-A., Dueholm, M.S., Wimmer, R., Nielsen, P.H., 2019. The Proteome of Tetrasphaera elongata is adapted to Changing Conditions in Wastewater Treatment Plants. Proteomes 7. https://doi.org/10.3390/proteomes7020016

Jiang, C., Andersen, M.H., Peces, M., Nierychlo, M., Yashiro, E., Andersen, K.S., Kirkegaard, R.H., Kucheryavskiy, S., Hao, L., Høgh, J., Hansen, A.A., Dueholm, M.S., Nielsen, P.H., n.d. Associations between species-level microbial communities and key variables revealed in anaerobic digesters at 22 Danish wastewater treatment plants during a six-years survey. bioRxiv.

Karst, S.M., Dueholm, M.S., McIlroy, S.J., Kirkegaard, R.H., Nielsen, P.H., Albertsen, M., 2018. Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias. Nature Biotechnology. https://doi.org/10.1038/nbt.4045

Karst, S.M., Ziels, R.M., Kirkegaard, R.H., Albertsen, M., 2019. Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers and Nanopore sequencing. bioRxiv 645903. https://doi.org/10.1101/645903


Kim, B.-R., Shin, J., Guevarra, R.B., Lee, J.H., Kim, D.W., Seol, K.-H., Lee, J.-H., Kim, H.B., Isaacson, R.E., 2017. Deciphering Diversity Indices for a Better Understanding of Microbial Communities. Journal of Microbiology and Biotechnology 27, 2089–2093. https://doi.org/10.4014/jmb.1709.09027

Kristiansen, R., Nguyen, H.T.T., Saunders, A.M., Nielsen, J.L., Wimmer, R., Le, V.Q., McIlroy, S.J., Petrovski, S., Seviour, R.J., Calteau, A., Nielsen, K.L., Nielsen, P.H., 2013. A metabolic model for members of the genus Tetrasphaera involved in enhanced biological phosphorus removal. ISME J 7, 543–554. https://doi.org/10.1038/ismej.2012.136

Liu, J.R., Seviour, R.J., 2001. Design and application of oligonucleotide probes for fluorescent in situ identification of the filamentous bacterial morphotype Nostocoida limicola in activated sludge. Environmental Microbiology 3, 551–560. https://doi.org/10.1046/j.1462-2920.2001.00229.x

Martín, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C., Yeates, C., He, S., Salamov, A.A., Szeto, E., Dalin, E., Putnam, N.H., Shapiro, H.J., Pangilinan, J.L., Rigoutsos, I., Kyrpides, N.C., Blackall, L.L., McMahon, K.D., Hugenholtz, P., 2006. Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24, 1263–1269. https://doi.org/10.1038/nbt1247

McIlroy, S.J., Kirkegaard, R.H., McIlroy, B., Nierychlo, M., Kristensen, J.M., Karst, S.M., Albertsen, M., Nielsen, P.H., 2017. MiDAS 2.0: an ecosystem-specific taxonomy and online database for the organisms of wastewater treatment systems expanded for anaerobic digester groups. Database 2017. https://doi.org/10.1093/database/bax016

McIlroy, S.J., Kristiansen, R., Albertsen, M., Karst, S.M., Rossetti, S., Nielsen, J.L., Tandoi, V., Seviour, R.J., Nielsen, P.H., 2013. Metabolic model for the filamentous ‘Candidatus Microthrix parvicella’ based on genomic and metagenomic analyses. ISME J 7, 1161–1172. https://doi.org/10.1038/ismej.2013.6

McIlroy, S.J., Saunders, A.M., Albertsen, M., Nierychlo, M., McIlroy, B., Hansen, A.A., Karst, S.M., Nielsen, J.L., Nielsen, P.H., 2015. MiDAS: the field guide to the microbes of activated sludge. Database 2015, bav062. https://doi.org/10.1093/database/bav062

McIlroy, S.J., Starnawska, A., Starnawski, P., Saunders, A.M., Nierychlo, M., Nielsen, P.H., Nielsen, J.L., 2014. Identification of active denitrifiers in full-scale nutrient removal wastewater treatment systems. Environ Microbiol n/a-n/a. https://doi.org/10.1111/1462-2920.12614

Nierychlo, M., Miłobędzka, A., Petriglieri, F., McIlroy, B., Nielsen, P.H., McIlroy, S.J., 2019. The morphology and metabolic potential of the Chloroflexi in full-scale activated sludge wastewater treatment plants. FEMS Microbiol Ecol 95. https://doi.org/10.1093/femsec/fiy228

Nierychlo, M., Nielsen, P.H., 2017. Experiences in various countries: Denmark, in: Tandoi, V., Rossetti, S., Wanner, J. (Eds.), Activated Sludge Separation Problems: Theory, Control Measures, Practical Experiences. IWA Publishing, pp. 198–209.

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glöckner, F.O., 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl. Acids Res. 41, D590–D596. https://doi.org/10.1093/nar/gks1219


R Core Team, 2018. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available online at https://www.R-project.org/.

Rossetti, S., Tomei, M.C., Nielsen, P.H., Tandoi, V., 2005. “Microthrix parvicella”, a filamentous bacterium causing bulking and foaming in activated sludge systems: a review of current knowledge. FEMS Microbiol. Rev. 29, 49–64. https://doi.org/10.1016/j.femsre.2004.09.005

RStudio Team, 2015. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/.

Saunders, A.M., Albertsen, M., Vollertsen, J., Nielsen, P.H., 2016. The activated sludge ecosystem contains a core community of abundant organisms. The ISME Journal 10, 11–20.

Stokholm-Bjerregaard, M., McIlroy, S.J., Nierychlo, M., Karst, S.M., Albertsen, M., Nielsen, P.H., 2017. A Critical Assessment of the Microorganisms Proposed to be Important to Enhanced Biological Phosphorus Removal in Full-Scale Wastewater Treatment Systems. Front. Microbiol. 8. https://doi.org/10.3389/fmicb.2017.00718

Thomsen, T.R., Kong, Y., Nielsen, P.H., 2007. Ecophysiology of abundant denitrifying bacteria in activated sludge. FEMS Microbiology Ecology 60, 370–382. https://doi.org/10.1111/j.1574-6941.2007.00309.x

Wickham, H., 2009. ggplot2 - Elegant Graphics for Data Analysis. Springer Science + Business Media.

Wu, L., Ning, D., Zhang, B., Li, Y., Zhang, P., Shan, X., Zhang, Q., Brown, M., Li, Z., Nostrand, J.D.V., Ling, F., Xiao, N., Zhang, Y., Vierheilig, J., Wells, G.F., Yang, Y., Deng, Y., Tu, Q., Wang, A., Zhang, T., He, Z., Keller, J., Nielsen, P.H., Alvarez, P.J.J., Criddle, C.S., Wagner, M., Tiedje, J.M., He, Q., Curtis, T.P., Stahl, D.A., Alvarez-Cohen, L., Rittmann, B.E., Wen, X., Zhou, J., 2019. Global diversity and biogeography of bacterial communities in wastewater treatment plants. Nature Microbiology 1. https://doi.org/10.1038/s41564-019-0426-5

Xia, Y., Kong, Y., Nielsen, P.H., 2008. In situ detection of starch-hydrolyzing microorganisms in activated sludge. FEMS Microbiology Ecology 66, 462–471. https://doi.org/10.1111/j.1574-6941.2008.00559.x

Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., Euzéby, J., Amann, R., Rosselló-Móra, R., 2014. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Micro 12, 635–645. https://doi.org/10.1038/nrmicro3330


Tables:

Table 1. Classification of abundant species belonging to the genera Acidovorax and Ca. Amarolinea with different taxonomies.

| Taxonomy | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|
| MiDAS 3 | Proteobacteria | Gammaproteobacteria | Betaproteobacteriales | Burkholderiaceae | Acidovorax | A. defluvii |
| MiDAS 2.1 | Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | Acidovorax | - |
| SILVA | Proteobacteria | Gammaproteobacteria | Betaproteobacteriales | Burkholderiaceae | Acidovorax | - |
| Greengenes | Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | - | - |
| RDP | Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | Acidovorax | - |
| MiDAS 3 | Proteobacteria | Gammaproteobacteria | Betaproteobacteriales | Burkholderiaceae | Acidovorax | midas_s_8989 |
| MiDAS 2.1 | Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | - | - |
| SILVA | Proteobacteria | Gammaproteobacteria | Betaproteobacteriales | Burkholderiaceae | - | - |
| Greengenes | Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | - | - |
| RDP | Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | Simplicispira | - |
| MiDAS 3 | Chloroflexi | Anaerolineae | Caldilineales | Amarolineaceae | Ca. Amarolinea | midas_s_1 |
| MiDAS 2.1 | Chloroflexi | SJA-15 | C10_SB1A | C10_SB1A | Ca. Amarilinum | - |
| SILVA | Chloroflexi | Anaerolineae | C10_SB1A | - | - | - |
| Greengenes | Chloroflexi | Anaerolineae | SHA-20 | - | - | - |
| RDP | - | - | - | - | - | - |
| MiDAS 3 | Chloroflexi | Anaerolineae | Caldilineales | Amarolineaceae | Ca. Amarolinea | midas_s_553 |
| MiDAS 2.1 | Chloroflexi | SJA-15 | C10_SB1A | C10_SB1A | Ca. Amarilinum | - |
| SILVA | Chloroflexi | Anaerolineae | C10_SB1A | - | - | - |
| Greengenes | Chloroflexi | Anaerolineae | SHA-20 | - | - | - |
| RDP | - | - | - | - | - | - |
| MiDAS 3 | Chloroflexi | Anaerolineae | Caldilineales | Amarolineaceae | Ca. Amarolinea | Ca. A. aalborgensis |
| MiDAS 2.1 | Chloroflexi | SJA-15 | C10_SB1A | C10_SB1A | Ca. Amarilinum | - |
| SILVA | Chloroflexi | Anaerolineae | C10_SB1A | - | - | - |
| Greengenes | Chloroflexi | Anaerolineae | SHA-20 | - | - | - |
| RDP | - | - | - | - | - | - |

MiDAS 2.1 (McIlroy et al., 2017); SILVA: Release 132_SSURef_NR99 (Quast et al., 2013); Greengenes: Release 16s_13.5 (DeSantis et al., 2006); Ribosomal Database Project (RDP): Release 16S_v16 training set (Cole et al., 2014).


Figures:

Figure 1. MiDAS 3 taxonomic assignment workflow based on the AutoTax framework. (1) ESVs from the study of Danish plants (Dueholm et al., 2019), amended with AddOns ESVs coming from, e.g., published genomes, are first mapped to the SILVA_132_SSURef_Nr99 database to identify the closest relative. (2) Taxonomy is adopted from this sequence after trimming, based on identity and the taxonomic thresholds proposed by Yarza et al. (2014), with 94.5% sequence similarity for genus level. To gain species information, ESVs are mapped to sequences from type strains extracted from the SILVA database, and species names are adopted if the identity is >98.7% and the type strain genus matches that of the closest reference in the complete database. (3) ESVs are also clustered by greedy clustering at different identities, corresponding to the thresholds proposed by Yarza et al. (2014), to generate a stable de novo taxonomy. (4) The de novo taxonomy is curated with taxon names for ESVs that represent, e.g., genomes from strains published in peer-reviewed literature. (5) Finally, a comprehensive taxonomy is obtained by filling gaps in the SILVA-based taxonomy with the de novo taxonomy, using provisional “midas” placeholder names (adapted from Dueholm et al., 2019).
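The threshold logic in steps (2) and (5) of the Figure 1 workflow can be illustrated with a minimal Python sketch. This is not the AutoTax implementation. The function names and the placeholder identifiers are hypothetical, and only the genus (94.5%) and species (98.7%) thresholds are stated in the caption; the remaining values are those proposed by Yarza et al. (2014). The strict comparison at every rank is likewise an assumption.

```python
# Percent-identity thresholds per rank. The genus (94.5) and species (98.7)
# values are given in the caption; the remaining values are those proposed by
# Yarza et al. (2014) and are not stated in this text.
RANK_THRESHOLDS = [
    ("species", 98.7),
    ("genus", 94.5),
    ("family", 86.5),
    ("order", 82.0),
    ("class", 78.5),
    ("phylum", 75.0),
]
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def lowest_supported_rank(identity_pct):
    """Return the lowest rank whose threshold the identity exceeds, else None.

    Assumption: a strict '>' comparison is used at every rank, mirroring the
    '>98.7%' wording used for species in the caption.
    """
    for rank, threshold in RANK_THRESHOLDS:
        if identity_pct > threshold:
            return rank
    return None


def fill_gaps(reference_taxonomy, denovo_taxonomy):
    """Keep reference names where present; otherwise use de novo placeholders."""
    return {
        rank: reference_taxonomy.get(rank) or denovo_taxonomy[rank]
        for rank in RANKS
    }


# Hypothetical example: the reference hit supports names down to family only.
reference = {
    "phylum": "Proteobacteria",
    "class": "Gammaproteobacteria",
    "order": "Betaproteobacteriales",
    "family": "Burkholderiaceae",
}
# Placeholder identifiers below are invented for illustration.
denovo = {
    "phylum": "midas_p_example",
    "class": "midas_c_example",
    "order": "midas_o_example",
    "family": "midas_f_example",
    "genus": "midas_g_example",
    "species": "midas_s_example",
}
merged = fill_gaps(reference, denovo)
```

In this sketch, `merged` keeps the four reference names and takes the genus and species from the de novo placeholders, which is the gap-filling behaviour described in step (5).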


Figure 2. Alpha diversity estimates of activated sludge community composition using ASV-level taxa in individual plants, calculated using the Simpson index of diversity (1-D). Data represent 712 samples from 20 Danish full-scale WWTPs collected from 2006 to 2018 (with 17-51 samples per plant).
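For readers unfamiliar with the index used in Figure 2, the following Python sketch computes the Simpson index of diversity (1-D) from the read counts of one sample. It assumes the common definition D = sum(p_i^2), where p_i is the relative abundance of taxon i. Whether the analysis behind the figure applied a finite-sample correction is not stated here, and the function name is hypothetical.

```python
def simpson_diversity(counts):
    """Simpson index of diversity, 1 - D, with D = sum(p_i ** 2).

    `counts` is an iterable of read counts, one per taxon, for one sample.
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Four equally abundant taxa: D = 4 * 0.25**2 = 0.25, so 1 - D = 0.75.
even_sample = simpson_diversity([10, 10, 10, 10])
# A single taxon holds all reads: D = 1, so 1 - D = 0.
single_taxon = simpson_diversity([40])
```

Values close to 1 indicate a diverse, even community, while values close to 0 indicate dominance by a few taxa.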


Figure 3. Redundancy analysis (RDA) plots of activated sludge communities, constrained to visualize the variance among 712 samples explained by WWTP location. Axes show the percentage of total variance described by each component (the first and second values give the relative contribution of each axis to the total variance in the data and to the variance in the constrained space only, respectively). Data represent samples from 20 Danish full-scale WWTPs collected from 2006 to 2018.


Figure 4 a-c. a) Cumulative read abundance of genus- and species-level taxa found in activated sludge, plotted in rank order. The dashed line denotes 80% of the reads, which is assumed to cover the abundant taxa. b) Venn diagram showing abundant (top 80% of the reads) genera and species shared between BNR and EBPR plants. c) Core community (observation frequency vs abundance frequency) in Danish BNR and EBPR plants for genus- and species-level taxa. Abundance cut-off values represent the lowest abundance of a taxon belonging to the top 80% of the reads in the given group (genus/species). Darker points indicate multiple observations at a given position. Colored points designate core taxa. Data represent 111 samples from BNR plants and 601 samples from EBPR plants.
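The "top 80% of the reads" criterion in Figure 4 can be sketched in Python as follows. This is an illustration only: the function name and the taxon labels are hypothetical, and including the taxon whose reads cross the 80% line is an assumption, since the caption does not state how that boundary is handled.

```python
def abundant_taxa(read_counts, fraction=0.8):
    """Return taxa, in rank order, that make up the top `fraction` of reads.

    `read_counts` maps a taxon name to its total read count. Taxa are added
    in order of decreasing abundance until the cumulative reads reach the
    requested fraction of the total (the crossing taxon is included).
    """
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0
    for taxon, reads in ranked:
        if cumulative >= fraction * total:
            break
        selected.append(taxon)
        cumulative += reads
    return selected


# Hypothetical counts: A and B cover 75% of reads, so C is needed to pass 80%.
example_counts = {"taxon_A": 50, "taxon_B": 25, "taxon_C": 15, "taxon_D": 10}
abundant = abundant_taxa(example_counts)
```

Under these assumptions, the abundance cut-off described in the caption corresponds to the read abundance of the last taxon in the returned list.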


Figure 5. Boxplot showing the occurrence of the top 50 most abundant species in Danish BNR and EBPR WWTPs.


Figure 6. Occurrence of abundant PAO species in 20 full-scale activated sludge WWTPs. Data represent average read abundance values for each plant, based on samples collected 2–4 times per year from 2006 to 2018. Individual PAO genera are indicated by different colors.


Figure 7. MiDAS Field Guide example of Tetrasphaera genus entry.
