RESEARCH ARTICLE

Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

Zhili He (a,b,c,d), Ping Zhang (c,d), Linwei Wu (c,d,e,f), Andrea M. Rocha (g,h), Qichao Tu (c,d), Zhou Shi (c,d), Bo Wu (a,b,c,d), Yujia Qin (c,d), Jianjun Wang (c,d), Qingyun Yan (a,b,c,d), Daniel Curtis (c,d), Daliang Ning (c,d,e), Joy D. Van Nostrand (c,d), Liyou Wu (c,d), Yunfeng Yang (f), Dwayne A. Elias (g), David B. Watson (g), Michael W. W. Adams (i), Matthew W. Fields (j), Eric J. Alm (k), Terry C. Hazen (g,h), Paul D. Adams (l,m), Adam P. Arkin (l,m), Jizhong Zhou (c,d,e,f,l)

a Environmental Microbiomics Research Center, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou, China
b Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou, China
c Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
d Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, USA
e School of Civil Engineering and Environmental Sciences, University of Oklahoma, Norman, Oklahoma, USA
f School of Environment, Tsinghua University, Beijing, China
g Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
h Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Tennessee, USA
i Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, USA
j Department of Microbiology and Immunology, Montana State University, Bozeman, Montana, USA
k Biological Engineering Department, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
l Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, Berkeley, California, USA
m Department of Bioengineering, University of California, Berkeley, California, USA

Received 10 January 2018 Accepted 17 January 2018 Published 20 February 2018

Citation He Z, Zhang P, Wu L, Rocha AM, Tu Q, Shi Z, Wu B, Qin Y, Wang J, Yan Q, Curtis D, Ning D, Van Nostrand JD, Wu L, Yang Y, Elias DA, Watson DB, Adams MWW, Fields MW, Alm EJ, Hazen TC, Adams PD, Arkin AP, Zhou J. 2018. Microbial functional gene diversity predicts groundwater contamination and ecosystem functioning. mBio 9:e02435-17. https://doi.org/10.1128/mBio.02435-17.

Editor Jennifer Martiny, University of California, Irvine

Copyright © 2018 He et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

Address correspondence to Zhili He, hezhili@mail.sysu.edu.cn, or Jizhong Zhou, jzhou@ou.edu.

This article is a direct contribution from a Fellow of the American Academy of Microbiology. Solicited external reviewers: Tamar Barkay, Rutgers, The State University of New Jersey; Wen-Tso Liu, University of Illinois at Urbana Champaign.

January/February 2018 Volume 9 Issue 1 e02435-17 mbio.asm.org

ABSTRACT Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P < 0.05) as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning.

IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. The results indicated that the overall
functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as uranium or nitrate increased. These key microbial functional genes could be used to successfully predict environmental contamination and ecosystem functioning. This study represents a significant advance in using functional gene markers to predict the spatial distribution of environmental contaminants and ecosystem functioning toward predictive microbial ecology, which is an ultimate goal of microbial ecology.

KEYWORDS groundwater microbiome, random forest, ecosystem functioning, environmental contamination, metagenomics, microbial functional gene

Anthropogenic activities have impacted Earth's biosphere through climate change; contamination of air, water, and soil environments; introduction of invasive species; depletion of natural resources; and alterations of biogeochemical cycling (1, 2). These activities have reduced biodiversity, destabilized ecosystem functions, such as carbon (C) and nitrogen (N) cycles, and threatened human health (3–7). A recent study showed that several distinct factors, such as concentrations of sulfate, iron, and dissolved CH4 and H2, might control the composition of groundwater microbiomes and that the microbial functional diversity (FD) could explain groundwater chemistry in a pristine aquifer (8). However, the ecological consequences and mechanisms of environmental contamination in the biodiversity of microbial communities and ecosystem functioning remain largely unclear. Even more challenging is to establish linkages between microbial biodiversity and ecosystem functioning.

It is generally believed that FD is better than taxonomic diversity (TD) and/or phylogenetic diversity (PD) for predicting ecosystem functioning (9–12). For example, a recent study across a gradient of sites from the subarctic to the tropics showed that a reduction of decomposer FD consistently decreased the rate of litter decomposition and carbon and nitrogen cycling (13). However, how to select molecular functional predictors (e.g., functional genes) remains a challenging question (11). Functional gene arrays (e.g., GeoChip) target key genes involved in geochemical cycles, bioremediation, stress responses, and other environmental processes and have been widely used to functionally profile microbial communities (14–19). Therefore, GeoChip is an ideal tool to examine the impacts of environmental contaminants on groundwater microbiomes.

The Oak Ridge Integrated Field Research Challenge (OR-IFRC) experimental site, located in Bear Creek Valley, Oak Ridge, TN, is a legacy site for the early development of enriched uranium (U) under the Manhattan Project. At this site, numerous studies have been conducted to examine the impact of contaminants on biological communities and ecosystem functioning (20–29). For example, a metagenome analysis of FW106, a highly contaminated well, showed that high relative levels of abundance of key genes encoding geochemical resistance functions were required for microbial survival in the presence of known environmental contaminants at the site (20). Also, key functional groups have been isolated and identified from the OR-IFRC site, including sulfate-reducing bacteria (SRB), nitrate-reducing bacteria (NRB), and metal-reducing bacteria (MRB), like Anaeromyxobacter, Clostridium, Desulfovibrio, Desulfitobacterium, Geobacter, Hyphomicrobium, Intrasporangium, Pseudomonas, and Rhodanobacter species (20, 23, 25–29). Recently, groundwater from 93 noncontaminated and contaminated wells along the Bear Creek Valley at the OR-IFRC site was sampled. Those wells had a wide range of environmental gradients and associated ecosystem data (22), thus making it possible to use microbial community data for predicting groundwater contamination. The results showed that 16S rRNA gene-sequencing analysis of groundwater microbiomes could accurately identify environmental contaminants (e.g., uranium or nitrate) at the OR-IFRC site (22). However, taxonomic information alone may not be enough to reflect the functional aspects of microbial communities or ecosystems, as not all members of a taxon may carry certain functional genes, making it difficult to predict geochemical properties, especially ecosystem functioning. The following important questions remain to be addressed. (i) How does the functional diversity of groundwater microbiomes change across a range of environmental gradients (e.g., pH, uranium, and nitrate)? (ii) What specific functional genes/populations are stimulated under high concentrations of uranium and nitrate? (iii) Is it possible to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning using microbial functional genes?

In this study, we hypothesized the following: (i) FD would decrease with increased environmental contamination (e.g., uranium or nitrate) or a significant change in environmental conditions (e.g., pH); (ii) under conditions of uranium and nitrate contamination, the abundance of some key functional genes/populations (e.g., dsrA and cytochrome genes for uranium reduction or nirK and napA for nitrate reduction) would increase, while the rest would decrease or remain unchanged; and (iii) the relationship between FD and environmental contamination or ecosystem functioning would be predictable based on key microbial functional genes. To test those hypotheses, we used a new version of a functional-gene microarray (GeoChip 5.0) to analyze groundwater microbiomes from 69 wells at the OR-IFRC site. GeoChip is able to quantitatively detect known microbial functions but generally does not target unknown functions from known or unknown microbial groups. Our results indicate that the overall FD decreased as uranium (but not nitrate) concentrations increased or at low or high pH; however, some specific functional genes/populations were stimulated in response to uranium and nitrate contamination. Such microbial functional genes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on groundwater microbiomes and demonstrates the predictive power of microbial functional genes for environmental contamination and ecosystem functioning.

RESULTS

Geochemical properties and ecosystem function indicators. A total of 38 environmental variables were measured, including pH, contaminant (e.g., uranium and nitrate) concentrations, dissolved gases (e.g., CO2, CH4, N2O, and H2S) as ecosystem function indicators, dissolved C and N, and direct cell counts, which were largely used in this study (see Table S1 in the supplemental material). The 69 wells had wide ranges of uranium, nitrate, and pH levels, with uranium at 0 to 553 mg/liter (average, 15 mg/liter), nitrate at 0 to 11648 mg NO3-N (nitrate as nitrogen)/liter (average, 641 mg NO3-N/liter), and pHs of 3 to 10.5 (average pH, 6.9). Furthermore, wells with high concentrations of uranium (e.g., 3 mg/liter) also had high concentrations of nitrate (1516 to 11648 mg NO3-N/liter) and low pHs (3.0 to 5.2). Also, dissolved gases varied greatly, with CO2 at 0 to 29739 mg/liter (average, 476 mg/liter), N2O at 0 to 12 mg/liter (average, 0.1 mg/liter), CH4 at 0 to 0.6 mg/liter (average, close to 0 mg/liter), and H2S at 0 to 42 mg/liter (average, 0.1 mg/liter). The amounts of bacterial biomass in groundwater samples ranged from 3.5 × 10^2 to 1.8 × 10^6 cells/ml (average, 1.2 × 10^5 cells/ml), while the levels of dissolved organic carbon (DOC) were 0.2 to 1282 mg/liter (average, 78 mg/liter), and the levels of dissolved inorganic carbon (DIC) were 94 to 1792 mg/liter (average, 583 mg/liter). Such large ranges of environmental gradients provide an advantage in testing the relationships between functional gene diversity and environmental contamination, as well as ecosystem functioning.

The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations. For this study, we defined functional richness as the number of functional genes detected by GeoChip 5.0 and functional diversity as the Shannon diversity index. The overall levels of functional gene richness and diversity decreased significantly (P < 0.05) as uranium concentrations increased. The functional diversity was highest at a neutral pH, not under low- or high-pH conditions, but it was not significantly (P > 0.05) impacted by nitrate concentrations in groundwater (Fig. 1; Fig. S1). We further examined the relationships between the levels of abundance of key gene families and the environmental gradients. For example, the abundances of sulfur (S) cycling genes (e.g., dsrA and sqr) and cytochrome and hydrogenase genes decreased significantly (P < 0.05) with increasing uranium concentrations (Fig. 2A to D). However, the abundances of denitrification (e.g., nirK and nosZ), dissimilatory N reduction (e.g., napA), and assimilatory N reduction (e.g., nasA) genes did not decrease significantly (r = 0.125 to 0.210, P > 0.05) with increasing nitrate concentrations (Fig. 2E to H). Further analysis of other key N cycling genes showed significantly (P < 0.05) decreased abundances with increased uranium or at low or high pHs, but no significant (P > 0.05) correlations were observed between N cycling gene abundances and nitrate concentrations (Table S2). In addition, the effects of uranium and pH on microbial biomass (measured by direct cell count) were not significant (P > 0.05), nor was there a significant correlation (P > 0.05) between biomass and functional richness, but it appeared that microbial biomass increased significantly (P < 0.001) with increased nitrate concentrations, suggesting that nitrate consumers (e.g., nitrate reducers) may be dominant in the environment (Fig. S2). Further analysis showed that the abundance of ~95% of genes detected by GeoChip 5.0 decreased, while only about 5% of them increased, indicating that most of the functional genes were inhibited or remained unchanged as uranium and nitrate concentrations increased.
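The two community-level measures defined in this section, richness as the number of detected genes and diversity as the Shannon index, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes each well is represented by a list of nonnegative probe signal intensities with 0 meaning "not detected", and the function names and example values are hypothetical.

```python
import math


def functional_richness(intensities):
    """Count the probes (functional genes) with a positive signal in one sample."""
    return sum(1 for x in intensities if x > 0)


def shannon_diversity(intensities):
    """Shannon index H' = -sum(p_i * ln(p_i)) over the detected probes."""
    detected = [x for x in intensities if x > 0]
    total = sum(detected)
    if total == 0:
        return 0.0  # nothing detected, so diversity is defined as zero here
    return -sum((x / total) * math.log(x / total) for x in detected)


# Hypothetical signal intensities for one well (0 = probe not detected).
well = [4.0, 0.0, 2.0, 2.0, 0.0]
print(functional_richness(well))          # 3
print(round(shannon_diversity(well), 4))  # 1.0397
```

With equal intensities across n detected probes the index reduces to ln(n), which is a quick sanity check on any implementation.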

FIG 1 Relationships between the overall functional richness and concentrations of uranium (A) and nitrate (B), as well as pH (C), in groundwater. Uranium and nitrate concentrations were first log transformed, and then linear regressions were performed for functional richness and uranium or nitrate concentrations. Nonlinear regression was used for functional richness and pH.

FIG 2 Linear relationships between the levels of abundance of specific functional gene families and log-transformed uranium (A to D) or nitrate (E to H) concentrations in groundwater, including data for dsrA, encoding the alpha subunit of sulfite reductase for dissimilatory sulfite reduction (A); sqr, encoding sulfide-quinone reductase (B); cytochrome genes from well-known organisms, e.g., Geobacter, Anaeromyxobacter, Dechloromonas, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter, Pseudomonas, Enterobacter, and Ochrobactrum (C); hydrogenase genes from well-known organisms, e.g., Geobacter, Desulfovibrio, Desulfurobacterium, Desulfobacterium, and Rhodobacter (D); nirK, encoding nitrite reductase for denitrification (E); nosZ, encoding nitrous oxide reductase for denitrification (F); napA, encoding nitrate reductase for dissimilatory nitrate reduction (G); and nasA, encoding nitrate reductase for assimilatory nitrate reduction (H).

Key functional populations stimulated in response to a uranium gradient. Although the richness and diversity of functional genes generally decreased as uranium concentrations increased in groundwater, some specific populations of certain functional guilds did increase significantly (P < 0.05) (Fig. 3A and B; Table S3). For example, the abundance of 43 dsrA-bearing populations (~5.8% of total dsrA detected by GeoChip 5), mostly uncultured SRB with a few sequenced species (e.g., Halorhodospira halophila, Desulfobulbus propionicus, Pelodictyon luteolum, and Vibrio rotiferianus), increased significantly (P < 0.05) (Table S3). In particular, five abundant dsrA probes/gene variants (gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858) derived from uncultured SRB were identified as being significantly (P < 0.05) increased as uranium increased (Fig. 3A). Increased levels of abundance of 21 cytochrome (~4.6%) and 6 hydrogenase (~7.3%) gene variants were also observed, specifically from well-known microorganisms, like Geobacter, Dechloromonas, Enterobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, Ochrobactrum, and Anaeromyxobacter (Table S3). Also, five abundant cytochrome genes (gi70733596, gi393759946, gi157375053, gi394728887, and gi254982574) were significantly (P < 0.05) increased as uranium concentrations increased in groundwater (Fig. 3B). These stimulated populations could play important roles in uranium bioremediation at this site.
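The regressions of gene abundance on log-transformed contaminant concentration reported here can be sketched with an ordinary least-squares fit. The sketch below rests on assumptions not stated in the text: the transform is taken to be log10(x + 1) so that wells with a concentration of 0 remain defined, and the data are invented for illustration.

```python
import math


def log_transform(concentrations, offset=1.0):
    """Assumed transform log10(x + offset); the offset keeps zero values defined."""
    return [math.log10(c + offset) for c in concentrations]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical uranium concentrations and abundances of one gene variant.
uranium = [0.0, 9.0, 99.0, 999.0]
abundance = [1.0, 1.5, 2.0, 2.5]
slope, intercept = linear_fit(log_transform(uranium), abundance)
print(round(slope, 3), round(intercept, 3))  # 0.5 1.0
```

A positive slope corresponds to a stimulated population in the sense used above; whether it is significant would still require a test on the correlation, which this sketch does not perform.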

FIG 3 Significantly (P < 0.05) positive correlations between the levels of abundance of stimulated populations and log-transformed uranium (A and B) or nitrate (C and D) concentrations, including data for dsrA gene variants gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858, derived from uncultured sulfate-reducing bacteria (A); cytochrome genes gi70733596 from Pseudomonas fluorescens, gi393759946 from Alcaligenes faecalis, gi157375053 from Shewanella sediminis, gi394728887 from Enterobacter sp., and gi254982574 from Geobacter sp. (B); nirK gene variants gi116204223 from Chaetomium globosum, gi256723237 from Nectria haematococca, and gi46409951, gi73762878, and gi50541845 from uncultured denitrifying bacteria (C); and napA gene variants gi219549420 from Vibrio parahaemolyticus, gi257458839 from Campylobacter gracilis, gi157913465 from Dinoroseobacter shibae, and gi157285650 and gi169793654 from uncultured nitrate-reducing bacteria (D).

Key functional populations stimulated in response to a nitrate gradient. We also found that the abundance of many specific functional genes/populations involved in N cycling increased significantly (P < 0.05) as nitrate increased (Fig. 3C and D; Table S4). For example, the abundance of 13 nirK-bearing (4.9%) populations increased significantly (P < 0.05), with most being uncultured bacteria and a few sequenced microbes (e.g., Chaetomium, Arthroderma, Nectria, and Pseudomonas); the abundance of 9 napA (6.0%) gene variants for dissimilatory N reduction derived from Beggiatoa, Vibrio, Campylobacter, and Dinoroseobacter species, as well as uncultured NRB, also increased significantly (P < 0.05) as nitrate increased (Table S4). Five abundant nirK gene variants (gi116204223, gi256723237, gi46409951, gi73762878, and gi50541845) (Fig. 3C) and five abundant napA gene variants (gi157285650, gi219549420, gi169793654, gi257458839, and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination by the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the functional genes detected that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB estimate of the error rate was 24.64% for all three functional gene families and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same results were observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which dramatically decreased the OOB estimate of error rate from 28.99% to 11.59% (Table 1). These results indicated that microbial functional genes were able to successfully predict groundwater uranium contamination.

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

                                                                                No. of wells predicted/no. of wells defined
Contaminant  Predictor(a)                                   OOB error rate (%)  Background wells(b)  Contaminated wells(c)
Uranium      All S cycling and metal-related genes          28.99               47/47                2/22
             All dsrA, cytochrome, and hydrogenase genes    24.64               47/47                5/22
             All dsrA genes                                 24.64               47/47                5/22
             All cytochrome genes                           26.09               46/47                5/22
             All hydrogenase genes                          28.99               41/47                8/22
             Key dsrA, cytochrome, and hydrogenase genes    27.54               45/47                5/22
             Key dsrA genes                                 24.64               45/47                7/22
             Key cytochrome genes                           39.13               38/47                4/22
             Key hydrogenase genes                          42.03               33/47                7/22
             AUC-RF selection                               11.59               47/47                14/22
Nitrate      All N cycling genes                            36.23               39/44                5/25
             All nifH, amoA, narG, nasA, and napA genes     34.78               40/44                5/25
             All nifH genes                                 33.33               41/44                5/25
             All amoA genes                                 27.54               41/44                9/25
             All narG genes                                 36.23               40/44                4/25
             All nasA genes                                 36.23               37/44                7/25
             All napA genes                                 34.78               41/44                4/25
             Key nifH, amoA, narG, nasA, and napA genes     30.43               40/44                8/25
             Key nifH genes                                 27.54               41/44                9/25
             Key amoA genes                                 28.99               39/44                10/25
             Key narG genes                                 37.68               37/44                6/25
             Key nasA genes                                 40.58               32/44                9/25
             Key napA genes                                 40.58               32/44                9/25
             AUC-RF selection                               15.94               42/44                16/25

a Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
b In background wells, the concentrations of uranium or nitrate were 30 μg/liter or below or 10 mg/liter or below, respectively.
c In contaminated wells, the concentrations of uranium or nitrate were higher than 30 μg/liter or 10 mg/liter, respectively.
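The OOB error rates in Table 1 appear to be the share of all 69 wells that were misclassified, which can be checked from the per-class counts. The sketch below makes that assumption explicit; the function name is hypothetical, and the counts are read from the uranium and nitrate rows of the table.

```python
def oob_error_rate(bg_correct, bg_total, cont_correct, cont_total):
    """Percentage of wells misclassified, given correct counts for each class."""
    wrong = (bg_total - bg_correct) + (cont_total - cont_correct)
    return 100.0 * wrong / (bg_total + cont_total)


# Uranium: 47 background and 22 contaminated wells.
print(round(oob_error_rate(47, 47, 2, 22), 2))   # 28.99 (all S cycling and metal-related genes)
print(round(oob_error_rate(47, 47, 14, 22), 2))  # 11.59 (AUC-RF selection)

# Nitrate: 44 background and 25 contaminated wells.
print(round(oob_error_rate(42, 44, 16, 25), 2))  # 15.94 (AUC-RF selection)
```

Read this way, the table shows that most of the error comes from contaminated wells predicted as background, and that the AUC-RF selection mainly improves the detection of contaminated wells (from 2 to 14 of 22 for uranium).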

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction, and the error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement from the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
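AUC-RF ranks candidate predictor sets by the area under the receiver operating characteristic curve. The sketch below illustrates only that accuracy measure, not the selection procedure itself: it computes the AUC as the probability that a randomly chosen contaminated well receives a higher score than a randomly chosen background well. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical wells: 1 = contaminated, 0 = background; scores are model votes.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so unlike the raw error rate it is not dominated by the larger class.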

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident, and among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).
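Predictor sets are compared here by the correlation between predicted and observed N2O concentrations. A minimal sketch of the Pearson correlation coefficient r follows; the values are hypothetical and in arbitrary units, and the accompanying P value is not computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observed and predicted N2O values for four wells.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(observed, predicted), 3))  # 0.8
```

Under this criterion, a predictor set whose predictions give a higher r against the observations (with a significant P value) is preferred.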

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g.,

He et al.

January/February 2018 Volume 9 Issue 1 e02435-17 mbio.asm.org 8

uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36) as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2-, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of the dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-μm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at -80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 μg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software, Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).
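As an illustration of step (i), the sketch below computes three commonly used summaries from one sample's probe signals: richness (number of detected probes), the Shannon index H', and Pielou's evenness J = H'/ln(richness). The text cites reference 16 without naming the indexes, so the choice of Shannon and Pielou here is an assumption, and the function name and example values are hypothetical.

```python
import math

def diversity_indices(signals):
    """Return (richness, Shannon H', Pielou J) for one sample.

    `signals` holds normalized probe intensities; zero means "not detected".
    """
    detected = [s for s in signals if s > 0]
    richness = len(detected)
    total = sum(detected)
    # Shannon index over the relative signal of each detected probe.
    shannon = -sum((s / total) * math.log(s / total) for s in detected) if richness else 0.0
    # Pielou's evenness is undefined for fewer than two probes; report 0.0.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical samples: an even community and one dominated by a single probe.
even_sample = [2.0, 2.0, 2.0, 2.0, 0.0]
skewed_sample = [9.0, 0.5, 0.3, 0.2, 0.0]
for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    s, h, j = diversity_indices(sample)
    print(f"{name}: richness={s}, H'={h:.3f}, J={j:.3f}")
```

Both hypothetical samples have the same richness, so the difference between them shows up only in H' and J, which is why richness and diversity are reported together in the text.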

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning, and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
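To make the two AUC reference points concrete, the sketch below computes the AUC directly from its rank-based definition: the probability that a randomly chosen contaminated well receives a higher score than a randomly chosen uncontaminated well, with ties counted as one-half. This is not the AUCRF package's implementation, only the underlying quantity. The function name and the scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
print(auc([0.9, 0.8, 0.2, 0.1], labels))  # every positive outranks every negative
print(auc([0.5, 0.5, 0.5, 0.5], labels))  # uninformative scores
```

In AUC-RF as described above, this kind of score is evaluated for candidate feature sets and the set with the highest AUC is retained.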

(ii) Modeling. The random forest models were constructed using the R package "randomForest," as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled the mtry (number of features randomly sampled as candidates at each split) of the features and chose the best split among those features. By default, mtry equals one-third of the number of features for regression and the square root of the number of features for classification. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
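The bootstrap, random feature sampling, majority vote, and OOB steps can be sketched in a few lines. The example below is a simplified illustration and not the randomForest package used in this study: each "tree" is a single-split decision stump rather than an unpruned tree, so sampling mtry features once per stump stands in for sampling at each node. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (first encountered wins ties)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) with the fewest training errors."""
    best = None
    for j in feature_ids:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # every candidate feature is constant in this bootstrap sample
        label = majority(y)
        return (feature_ids[0], 0.0, label, label)
    return best[1:]

def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def oob_error(X, y, n_trees=200, mtry=None, seed=0):
    """Grow n_trees stumps on bootstrap samples and return the OOB error rate."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    if mtry is None:
        mtry = max(1, int(p ** 0.5))  # square-root rule, as for classification forests
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # sample n rows with replacement
        in_bag = set(boot)
        feats = rng.sample(range(p), mtry)            # random candidate features
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], feats)
        for i in range(n):
            if i not in in_bag:                       # only OOB rows vote for this tree
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 3], [0.3, 8], [0.4, 1], [0.9, 4], [1.0, 7], [1.1, 2], [1.2, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
err = oob_error(X, y, n_trees=200, mtry=2)
print(f"OOB error rate: {err:.2%}")
```

Because every sample is scored only by trees that did not see it during fitting, the OOB error behaves like a built-in cross-validation estimate, which is why no independent test set was required.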

SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56 Beck D Foster JA 2014 Machine learning techniques accurately classifymicrobial communities by bacterial vaginosis characteristics PLoS One9e87830 httpsdoiorg101371journalpone0087830

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

He et al. mBio. January/February 2018, Volume 9, Issue 1, e02435-17. mbio.asm.org

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D’haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
  • DISCUSSION
    • Conclusions
  • MATERIALS AND METHODS
    • Site description and sampling
    • Physical, geochemical, and microbiological measurements
    • Groundwater biomass collection, DNA extraction, and template preparation
    • GeoChip hybridization and data preprocessing
    • GeoChip data analysis
    • Random forest for predicting environmental contamination and ecosystem functioning
      • (i) Feature selection
      • (ii) Modeling
      • (iii) Error rate estimation
  • SUPPLEMENTAL MATERIAL
  • ACKNOWLEDGMENTS
  • REFERENCES

Functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as uranium or nitrate increased. These key microbial functional genes could be used to successfully predict environmental contamination and ecosystem functioning. This study represents a significant advance in using functional gene markers to predict the spatial distribution of environmental contaminants and ecosystem functioning toward predictive microbial ecology, which is an ultimate goal of microbial ecology.

KEYWORDS groundwater microbiome, random forest, ecosystem functioning, environmental contamination, metagenomics, microbial functional gene

Anthropogenic activities have impacted Earth’s biosphere through climate change, contamination of air, water, and soil environments, introduction of invasive species, depletion of natural resources, and alterations of biogeochemical cycling (1, 2). These activities have reduced biodiversity, destabilized ecosystem functions, such as carbon (C) and nitrogen (N) cycles, and threatened human health (3–7). A recent study showed that several distinct factors, such as concentrations of sulfate, iron, and dissolved CH4 and H2, might control the composition of groundwater microbiomes and that the microbial functional diversity (FD) could explain groundwater chemistry in a pristine aquifer (8). However, the ecological consequences and mechanisms of environmental contamination for the biodiversity of microbial communities and ecosystem functioning remain largely unclear. Even more challenging is to establish linkages between microbial biodiversity and ecosystem functioning.

It is generally believed that FD is better than taxonomic diversity (TD) and/or phylogenetic diversity (PD) for predicting ecosystem functioning (9–12). For example, a recent study across a gradient of sites from the subarctic to the tropics showed that a reduction of decomposer FD consistently decreased the rate of litter decomposition and carbon and nitrogen cycling (13). However, how to select molecular functional predictors (e.g., functional genes) remains a challenging question (11). Functional gene arrays (e.g., GeoChip) target key genes involved in geochemical cycles, bioremediation, stress responses, and other environmental processes and have been widely used to functionally profile microbial communities (14–19). Therefore, GeoChip is an ideal tool to examine the impacts of environmental contaminants on groundwater microbiomes.

The Oak Ridge Integrated Field Research Challenge (OR-IFRC) experimental site, located in Bear Creek Valley, Oak Ridge, TN, is a legacy site for the early development of enriched uranium (U) under the Manhattan Project. At this site, numerous studies have been conducted to examine the impact of contaminants on biological communities and ecosystem functioning (20–29). For example, a metagenome analysis of FW106, a highly contaminated well, showed that high relative levels of abundance of key genes encoding geochemical resistance functions were required for microbial survival in the presence of known environmental contaminants at the site (20). Also, key functional groups have been isolated and identified from the OR-IFRC site, including sulfate-reducing bacteria (SRB), nitrate-reducing bacteria (NRB), and metal-reducing bacteria (MRB), like Anaeromyxobacter, Clostridium, Desulfovibrio, Desulfitobacterium, Geobacter, Hyphomicrobium, Intrasporangium, Pseudomonas, and Rhodanobacter species (20, 23, 25–29). Recently, groundwater from 93 noncontaminated and contaminated wells along the Bear Creek Valley at the OR-IFRC site was sampled. Those wells had a wide range of environmental gradients and associated ecosystem data (22), thus making it possible to use microbial community data for predicting groundwater contamination. The results showed that 16S rRNA gene-sequencing analysis of groundwater microbiomes could accurately identify environmental contaminants (e.g., uranium or nitrate) at the OR-IFRC site (22). However, taxonomic information alone may not be enough to reflect the functional aspects of microbial communities or ecosystems, as not all members of a taxon may carry certain functional genes, making it difficult to predict geochemical properties, especially ecosystem functioning. The following important questions remain to be addressed. (i) How does the functional diversity of groundwater microbiomes change across a range of environmental gradients (e.g., pH, uranium, and nitrate)? (ii) What specific functional genes/populations are stimulated under high concentrations of uranium and nitrate? (iii) Is it possible to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning using microbial functional genes?

In this study, we hypothesized the following: (i) FD would decrease with increased environmental contamination (e.g., uranium or nitrate) or a significant change in environmental conditions (e.g., pH); (ii) under conditions of uranium and nitrate contamination, the abundance of some key functional genes/populations (e.g., dsrA and cytochrome genes for uranium reduction or nirK and napA for nitrate reduction) would increase, while the rest would decrease or remain unchanged; and (iii) the relationship between FD and environmental contamination or ecosystem functioning would be predictable based on key microbial functional genes. To test those hypotheses, we used a new version of a functional-gene microarray (GeoChip 5.0) to analyze groundwater microbiomes from 69 wells at the OR-IFRC site. GeoChip is able to quantitatively detect known microbial functions but generally does not target unknown functions from known or unknown microbial groups. Our results indicate that the overall FD decreased as uranium (but not nitrate) concentrations increased or at low or high pH; however, some specific functional genes/populations were stimulated in response to uranium and nitrate contamination. Such microbial functional genes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on groundwater microbiomes and demonstrates the predictive power of microbial functional genes for environmental contamination and ecosystem functioning.

RESULTS

Geochemical properties and ecosystem function indicators. A total of 38 environmental variables were measured, including pH, contaminant (e.g., uranium and nitrate) concentrations, dissolved gases (e.g., CO2, CH4, N2O, and H2S) as ecosystem function indicators, dissolved C and N, and direct cell counts, which were largely used in this study (see Table S1 in the supplemental material). The 69 wells had wide ranges of uranium, nitrate, and pH levels, with uranium at 0 to 553 mg/liter (average, 15 mg/liter), nitrate at 0 to 11648 mg NO3-N (nitrate as nitrogen)/liter (average, 641 mg NO3-N/liter), and pHs of 3 to 10.5 (average pH, 6.9). Furthermore, wells with high concentrations of uranium (e.g., 3 mg/liter) also had high concentrations of nitrate (1516 to 11648 mg NO3-N/liter) and low pHs (3.0 to 5.2). Also, dissolved gases varied greatly, with CO2 at 0 to 29739 mg/liter (average, 476 mg/liter), N2O at 0 to 12 mg/liter (average, 0.1 mg/liter), CH4 at 0 to 0.6 mg/liter (average close to 0 mg/liter), and H2S at 0 to 42 mg/liter (average, 0.1 mg/liter). The amounts of bacterial biomass in groundwater samples ranged from 3.5 × 10^2 to 1.8 × 10^6 cells/ml (average, 1.2 × 10^5 cells/ml), while the levels of dissolved organic carbon (DOC) were 0.2 to 1282 mg/liter (average, 78 mg/liter), and the levels of dissolved inorganic carbon (DIC) were 94 to 1792 mg/liter (average, 583 mg/liter). Such large ranges of environmental gradients provide an advantage in testing the relationships between functional gene diversity and environmental contamination, as well as ecosystem functioning.

The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations. For this study, we defined functional richness as the number of functional genes detected by GeoChip 5.0 and functional diversity as the Shannon diversity index. The overall levels of functional gene richness and diversity decreased significantly (P < 0.05) as uranium concentrations increased. Functional diversity was highest at neutral pH rather than under low- or high-pH conditions, but it was not significantly (P > 0.05) impacted by nitrate concentrations in groundwater (Fig. 1; Fig. S1). We further examined how the abundances of key gene families changed along the environmental gradients. For example, the abundances of sulfur (S) cycling genes (e.g., dsrA and sqr) and cytochrome and hydrogenase genes decreased significantly (P < 0.05) with increasing uranium concentrations (Fig. 2A to D). However, the abundances of denitrification (e.g., nirK and nosZ), dissimilatory N reduction (e.g., napA), and assimilatory N reduction (e.g., nasA) genes did not decrease significantly (|r| = 0.125 to 0.210, P > 0.05) with increasing nitrate concentrations (Fig. 2E to H). Further analysis of other key N cycling genes showed significantly (P < 0.05) decreased abundances with increased uranium or at low or high pHs, but no significant (P > 0.05) correlations were observed between N cycling gene abundances and nitrate concentrations (Table S2). In addition, the effects of uranium and pH on microbial biomass (measured by direct cell count) were not significant (P > 0.05), nor was there a significant correlation (P > 0.05) between biomass and functional richness. Microbial biomass did, however, increase significantly (P < 0.001) with increased nitrate concentrations, suggesting that nitrate consumers (e.g., nitrate reducers) may be dominant in the environment (Fig. S2). Further analysis showed that the abundance of ~95% of genes detected by GeoChip 5.0 decreased or remained unchanged, while only about 5% of them increased, indicating that most of the functional genes were inhibited or unaffected as uranium and nitrate concentrations increased.
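The two summary statistics defined above can be illustrated with a minimal sketch. It assumes that each well is represented by a vector of normalized probe signal intensities in which 0 means "not detected", and that the Shannon index is computed on those intensities as relative abundances; the function names and the example values are hypothetical and are not taken from the study.

```python
import math

def functional_richness(intensities):
    """Number of probes (functional genes) with a detected, positive signal."""
    return sum(1 for x in intensities if x > 0)

def shannon_index(intensities):
    """Shannon index H' = -sum(p_i * ln p_i) over detected probes."""
    detected = [x for x in intensities if x > 0]
    total = sum(detected)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in detected)

# Hypothetical normalized signal intensities for one well (0 = not detected).
well = [4.0, 0.0, 2.0, 2.0, 0.0]
richness = functional_richness(well)   # three probes detected
diversity = shannon_index(well)        # relative abundances 0.5, 0.25, 0.25
print(richness, round(diversity, 4))
```

With relative abundances of 0.5, 0.25, and 0.25, the index equals 1.5 × ln 2, so a well with fewer or more uneven detected genes scores lower on both measures.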

FIG 1 Relationships between the overall functional richness and concentrations of uranium (A) and nitrate (B), as well as pH (C), in groundwater. Uranium and nitrate concentrations were first log transformed, and then linear regressions were performed for functional richness and uranium or nitrate concentrations. Nonlinear regression was used for functional richness and pH.

FIG 2 Linear relationships between the levels of abundance of specific functional gene families and log-transformed uranium (A to D) or nitrate (E to H) concentrations in groundwater, including data for dsrA, encoding the alpha subunit of sulfite reductase for dissimilatory sulfite reduction (A); sqr, encoding sulfide-quinone reductase (B); cytochrome genes from well-known organisms, e.g., Geobacter, Anaeromyxobacter, Dechloromonas, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter, Pseudomonas, Enterobacter, and Ochrobactrum (C); hydrogenase genes from well-known organisms, e.g., Geobacter, Desulfovibrio, Desulfurobacterium, Desulfobacterium, and Rhodobacter (D); nirK, encoding nitrite reductase for denitrification (E); nosZ, encoding nitrous oxide reductase for denitrification (F); napA, encoding nitrate reductase for dissimilatory nitrate reduction (G); and nasA, encoding nitrate reductase for assimilatory nitrate reduction (H).

Key functional populations stimulated in response to a uranium gradient. Although the richness and diversity of functional genes generally decreased as uranium concentrations increased in groundwater, some specific populations of certain functional guilds did increase significantly (P < 0.05) (Fig. 3A and B; Table S3). For example, the abundance of 43 dsrA-bearing populations (~5.8% of total dsrA detected by GeoChip 5), mostly uncultured SRB with a few sequenced species (e.g., Halorhodospira halophila, Desulfobulbus propionicus, Pelodictyon luteolum, and Vibrio rotiferianus), increased significantly (P < 0.05) (Table S3). In particular, five abundant dsrA probes/gene variants (gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858) derived from uncultured SRB were identified as being significantly (P < 0.05) increased as uranium increased (Fig. 3A). Increased levels of abundance of 21 cytochrome (~4.6%) and 6 hydrogenase (~7.3%) gene variants were also observed, specifically from well-known microorganisms like Geobacter, Dechloromonas, Enterobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, Ochrobactrum, and Anaeromyxobacter (Table S3). Also, five abundant cytochrome genes (gi70733596, gi393759946, gi157375053, gi394728887, and gi254982574) were significantly (P < 0.05) increased as uranium concentrations increased in groundwater (Fig. 3B). These stimulated populations could play important roles in uranium bioremediation at this site.
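The figure legends describe regressing richness or gene abundance on log-transformed contaminant concentrations. A minimal sketch of that kind of fit follows. The base-10 logarithm, the 0.001 offset used to keep zero concentrations finite, the helper name, and all numeric values are assumptions made for illustration; the paper states only that concentrations were log transformed before linear regression.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical values for five wells, not data from the study.
uranium_mg_per_liter = [0.0, 0.009, 0.099, 0.999, 9.999]
gene_richness = [50000, 48000, 46000, 44000, 42000]

# Assumed transform: log10(x + 0.001) so that a concentration of 0 stays finite.
log_uranium = [math.log10(u + 0.001) for u in uranium_mg_per_liter]
slope, intercept, r = linear_fit(log_uranium, gene_richness)
print(round(slope, 1), round(intercept, 1), round(r, 3))
```

A negative slope with r close to -1 corresponds to the pattern reported for overall richness versus uranium, while the stimulated gene variants in Fig. 3 would show a positive slope.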

Key functional populations stimulated in response to a nitrate gradient. We also found that the abundance of many specific functional genes/populations involved in N cycling increased significantly (P < 0.05) as nitrate increased (Fig. 3C and D; Table S4). For example, the abundance of 13 nirK-bearing (4.9%) populations increased significantly (P < 0.05), with most being uncultured bacteria and a few sequenced microbes (e.g., Chaetomium, Arthroderma, Nectria, and Pseudomonas); the abundance of 9 napA (6.0%) gene variants for dissimilatory N reduction derived from Beggiatoa, Vibrio, Campylobacter, and Dinoroseobacter species, as well as uncultured NRB, also increased significantly (P < 0.05) as nitrate increased (Table S4). Five abundant nirK gene variants (gi116204223, gi256723237, gi46409951, gi73762878, and gi50541845) (Fig. 3C) and five abundant napA gene variants (gi157285650, gi219549420, gi169793654, gi257458839, and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

FIG 3 Significantly (P < 0.05) positive correlations between the levels of abundance of stimulated populations and log-transformed uranium (A and B) or nitrate (C and D) concentrations, including data for dsrA gene variants gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858, derived from uncultured sulfate-reducing bacteria (A); cytochrome genes gi70733596 from Pseudomonas fluorescens, gi393759946 from Alcaligenes faecalis, gi157375053 from Shewanella sediminis, gi394728887 from Enterobacter sp., and gi254982574 from Geobacter sp. (B); nirK gene variants gi116204223 from Chaetomium globosum, gi256723237 from Nectria haematococca, and gi46409951, gi73762878, and gi50541845 from uncultured denitrifying bacteria (C); and napA gene variants gi219549420 from Vibrio parahaemolyticus, gi257458839 from Campylobacter gracilis, gi157913465 from Dinoroseobacter shibae, and gi157285650 and gi169793654 from uncultured nitrate-reducing bacteria (D).

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination by the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the functional genes detected that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB estimate of the error rate was 24.64% for all three functional gene families and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same error rate (24.64%) was observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which decreased the OOB estimate of the error rate from 28.99% to 11.59% (Table 1). These results indicated that microbial functional genes were able to successfully predict groundwater uranium contamination.

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

                                                                               No. of wells predicted/no. of wells defined
Contaminant  Predictor(a)                                  OOB error rate (%)  Background wells(b)  Contaminated wells(c)
Uranium      All S cycling and metal-related genes         28.99               47/47                2/22
             All dsrA, cytochrome, and hydrogenase genes   24.64               47/47                5/22
             All dsrA genes                                24.64               47/47                5/22
             All cytochrome genes                          26.09               46/47                5/22
             All hydrogenase genes                         28.99               41/47                8/22
             Key dsrA, cytochrome, and hydrogenase genes   27.54               45/47                5/22
             Key dsrA genes                                24.64               45/47                7/22
             Key cytochrome genes                          39.13               38/47                4/22
             Key hydrogenase genes                         42.03               33/47                7/22
             AUC-RF selection                              11.59               47/47                14/22
Nitrate      All N cycling genes                           36.23               39/44                5/25
             All nifH, amoA, narG, nasA, and napA genes    34.78               40/44                5/25
             All nifH genes                                33.33               41/44                5/25
             All amoA genes                                27.54               41/44                9/25
             All narG genes                                36.23               40/44                4/25
             All nasA genes                                36.23               37/44                7/25
             All napA genes                                34.78               41/44                4/25
             Key nifH, amoA, narG, nasA, and napA genes    30.43               40/44                8/25
             Key nifH genes                                27.54               41/44                9/25
             Key amoA genes                                28.99               39/44                10/25
             Key narG genes                                37.68               37/44                6/25
             Key nasA genes                                40.58               32/44                9/25
             Key napA genes                                40.58               32/44                9/25
             AUC-RF selection                              15.94               42/44                16/25

(a) Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
(b) In background wells, the concentrations of uranium or nitrate were 30 µg/liter or below or 10 mg/liter or below, respectively.
(c) In contaminated wells, the concentrations of uranium or nitrate were higher than 30 µg/liter or 10 mg/liter, respectively.
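The OOB error rates in Table 1 can be cross-checked against the per-class counts in the same table: each reported rate equals the number of misclassified wells (background plus contaminated) divided by all 69 wells. The short sketch below reproduces four rows; the function name and the dictionary layout are illustrative choices, while the counts are copied from Table 1.

```python
def oob_error_percent(bg_correct, bg_total, cont_correct, cont_total):
    """Overall error rate (%) from correctly classified wells in each class."""
    misclassified = (bg_total - bg_correct) + (cont_total - cont_correct)
    return round(100 * misclassified / (bg_total + cont_total), 2)

# (background predicted, background defined, contaminated predicted, contaminated defined)
table1_rows = {
    "uranium, all dsrA genes": (47, 47, 5, 22),
    "uranium, AUC-RF selection": (47, 47, 14, 22),
    "nitrate, all amoA genes": (41, 44, 9, 25),
    "nitrate, AUC-RF selection": (42, 44, 16, 25),
}
rates = {name: oob_error_percent(*counts) for name, counts in table1_rows.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The same arithmetic shows where the remaining error sits: even after AUC-RF selection, all of the uranium misclassifications (8 of 69 wells) are contaminated wells predicted as background.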

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction. The error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78% for the nifH, amoA, narG, nasA, and napA gene families, respectively, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement from the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
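The out-of-bag (OOB) error reported for these models comes from the bootstrap step of random forest: each tree is fit on a resampled set of wells, and the wells left out of that resample are used to test it. The sketch below illustrates the idea with a deliberately simplified stand-in, bagged one-split trees ("stumps") with random feature subsets, on synthetic data. It is not the software or the settings used in the study, and every name and value in it is hypothetical.

```python
import random

def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            for polarity in (0, 1):
                # polarity 1: predict class 1 when value > t; polarity 0: the reverse
                errors = sum(((row[j] > t) == bool(polarity)) != bool(label)
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    j, t, polarity = stump
    return int((row[j] > t) == bool(polarity))

def oob_error(X, y, n_trees=200, seed=0):
    """Fraction of samples misclassified by majority vote of trees that never saw them."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    votes = [[0, 0] for _ in range(n)]  # OOB votes per sample for class 0 and class 1
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of wells
        feats = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
        stump = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag], feats)
        for i in set(range(n)) - set(in_bag):          # wells left out of this bag
            votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    wrong = sum((votes[i][1] > votes[i][0]) != bool(y[i]) for i in scored)
    return wrong / len(scored)

# Synthetic example: 40 "wells" x 4 "gene" signals; only feature 0 tracks the label.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]  # 0 = background, 1 = contaminated
signals = [[label * 2.0 + data_rng.random(), data_rng.random(),
            data_rng.random(), data_rng.random()] for label in labels]
error = oob_error(signals, labels)
print(f"OOB error rate: {100 * error:.2f}%")
```

Because every well is scored only by trees that did not train on it, the OOB estimate plays the role of a cross-validated error without a separate held-out set, which matters with only 69 wells.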

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident. Among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g., uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36), as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes, as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2−, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5% to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS
More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software, Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).
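
As a rough illustration of step (iii), the following Python sketch fits an ordinary least-squares line between one environmental variable and each probe's signal profile and reports the slope, intercept, and R2. It is not the authors' R code: the function name `linear_fit`, the probe names, and the toy values are hypothetical, and P values are omitted because they require a t distribution that the Python standard library does not provide.

```python
from math import fsum

def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxx = fsum((xi - mean_x) ** 2 for xi in x)
    syy = fsum((yi - mean_y) ** 2 for yi in y)
    sxy = fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2

# Hypothetical toy data: one environmental variable across five wells and
# the normalized signal intensities of two made-up probes in the same wells.
log_uranium = [0.0, 1.0, 2.0, 3.0, 4.0]
probes = {
    "probe_up": [1.0, 3.0, 5.0, 7.0, 9.0],    # rises with the variable
    "probe_flat": [2.0, 2.1, 1.9, 2.0, 2.0],  # essentially unrelated
}

# One regression per probe, as in step (iii) of the text.
fits = {name: linear_fit(log_uranium, signal) for name, signal in probes.items()}
for name, (slope, intercept, r2) in fits.items():
    print(f"{name}: slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

In a screen of this kind, probes would then be ranked or filtered by slope sign and significance; the filtering threshold is a choice of the analyst and is not shown here.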

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning, and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
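
To make the AUC scale concrete, the sketch below computes the AUC as the probability that a randomly chosen contaminated sample receives a higher score than a randomly chosen uncontaminated one, with ties counted as one half. This illustrates only the metric; it does not reproduce the feature elimination procedure of the AUCRF package, and the function name `roc_auc` and the toy labels and scores are assumptions made for the example.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) samples.

    Equivalent to the area under the ROC curve; ties contribute 0.5.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four wells (1 = contaminated).
labels = [0, 0, 1, 1]
perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])       # every positive outranks every negative
partial = roc_auc(labels, [0.1, 0.4, 0.35, 0.8])      # one positive/negative pair is misordered
uninformative = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])  # constant scores carry no information

print(perfect, partial, uninformative)
```

Comparing candidate feature sets then amounts to computing this quantity for the predictions of each set and keeping the set with the largest value.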

(ii) Modeling. The random forest models were constructed using the R package "randomForest" as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry of the features (mtry being the number of features randomly sampled as candidates at each split) and chose the best split among those features. By default, mtry equals one-third the number of all features for regression (and the square root of the number of features for classification). Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
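
The three steps can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the randomForest implementation: each "tree" is a single split (a stump) rather than an unpruned tree, so the mtry candidate features are drawn once per tree, which coincides with once per node only because a stump has one node. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Best single split over the candidate features, judged by training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            left = [yi for row, yi in zip(X, y) if row[f] <= lo]
            right = [yi for row, yi in zip(X, y) if row[f] > lo]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, f, (lo + hi) / 2, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        lab = majority(y)
        return (feature_ids[0], float("inf"), lab, lab)
    return best[1:]

def predict_stump(stump, row):
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab

def fit_forest(X, y, n_trees, mtry, rng):
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # step 1: bootstrap sample
        feats = rng.sample(range(p), mtry)           # step 2: mtry candidate features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Step 3: aggregate by majority vote (regression would average instead)."""
    return majority([predict_stump(s, row) for s in forest])

# Hypothetical two-feature data in which both features separate the classes.
X = [(0, 2), (1, 0), (2, 1), (10, 12), (11, 10), (12, 11)]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, mtry=1, rng=random.Random(0))
print(predict_forest(forest, (1, 1)), predict_forest(forest, (11, 11)))
```

Passing an explicit `random.Random` instance keeps the bootstrap draws reproducible without touching the global random state.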

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
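
The OOB bookkeeping can be sketched independently of the tree learner. In the Python example below, a toy nearest-class-mean classifier stands in for the tree (an assumption made purely to keep the example short), each sample collects votes only from models whose bootstrap sample did not contain it, and the OOB error is the fraction of samples whose aggregated OOB vote is wrong. Function names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_centroids(xs, ys):
    """Toy base learner (stand-in for a tree): mean of x for each class."""
    sums, counts = defaultdict(float), Counter(ys)
    for x, label in zip(xs, ys):
        sums[label] += x
    return {label: sums[label] / counts[label] for label in counts}

def predict_centroid(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def oob_error(xs, ys, n_models, rng):
    n = len(xs)
    votes = [Counter() for _ in range(n)]  # OOB votes collected per sample
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        model = fit_centroids([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # sample i is out of bag for this model
                votes[i][predict_centroid(model, xs[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no sample was ever out of bag; use more models")
    wrong = sum(votes[i].most_common(1)[0][0] != ys[i] for i in scored)
    return wrong / len(scored)

# Hypothetical, well-separated one-dimensional data.
xs = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
err = oob_error(xs, ys, n_models=200, rng=random.Random(1))
print(f"OOB error estimate: {err:.3f}")
```

Because each bootstrap sample omits roughly one-third of the data on average, every sample is typically scored by many models that never saw it, which is why no separate test set is needed.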


SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.
FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS
This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES
1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.
2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.
3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.
4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.
5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.
6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.
7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.
8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.
9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.
10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.
11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.
12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.
14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.
15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.
16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.
17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.
18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.
19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.
20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.
21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.
22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.
24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.
25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.
26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.
27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.
28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.
29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.
30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.
31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.
32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.
33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.
34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.
35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.
36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.
37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.
38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.
39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.
40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.
41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.
42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.
43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.
44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.
45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.
46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.
47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.
48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.
50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.
51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.
52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.
53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.
54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.
55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.
56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.
57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.
59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.
60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.
61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.
62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.
63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.
64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.
66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.



diversity of groundwater microbiomes change across a range of environmental gradients (e.g., pH, uranium, and nitrate)? (ii) What specific functional genes/populations are stimulated under high concentrations of uranium and nitrate? (iii) Is it possible to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning using microbial functional genes?

In this study, we hypothesized the following: (i) FD would decrease with increased environmental contamination (e.g., uranium or nitrate) or a significant change in environmental conditions (e.g., pH); (ii) under conditions of uranium and nitrate contamination, the abundance of some key functional genes/populations (e.g., dsrA and cytochrome genes for uranium reduction or nirK and napA for nitrate reduction) would increase, while the rest would decrease or remain unchanged; and (iii) the relationship between FD and environmental contamination or ecosystem functioning would be predictable based on key microbial functional genes. To test those hypotheses, we used a new version of a functional-gene microarray (GeoChip 5.0) to analyze groundwater microbiomes from 69 wells at the OR-IFRC site. GeoChip is able to quantitatively detect known microbial functions but generally does not target unknown functions from known or unknown microbial groups. Our results indicate that the overall FD decreased as uranium (but not nitrate) concentrations increased or at low or high pH; however, some specific functional genes/populations were stimulated in response to uranium and nitrate contamination. Such microbial functional genes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on groundwater microbiomes and demonstrates the predictive power of microbial functional genes for environmental contamination and ecosystem functioning.

RESULTS

Geochemical properties and ecosystem function indicators. A total of 38 environmental variables were measured, including pH, contaminant (e.g., uranium and nitrate) concentrations, dissolved gases (e.g., CO2, CH4, N2O, and H2S) as ecosystem function indicators, dissolved C and N, and direct cell counts, which were largely used in this study (see Table S1 in the supplemental material). The 69 wells had wide ranges of uranium, nitrate, and pH levels, with uranium at 0 to 553 mg/liter (average, 15 mg/liter), nitrate at 0 to 11648 mg NO3-N (nitrate as nitrogen)/liter (average, 641 mg NO3-N/liter), and pHs of 3 to 10.5 (average pH, 6.9). Furthermore, wells with high concentrations of uranium (e.g., 3 mg/liter) also had high concentrations of nitrate (1516 to 11648 mg NO3-N/liter) and low pHs (3.0 to 5.2). Also, dissolved gases varied greatly, with CO2 at 0 to 29739 mg/liter (average, 476 mg/liter), N2O at 0 to 12 mg/liter (average, 0.1 mg/liter), CH4 at 0 to 0.6 mg/liter (average, close to 0 mg/liter), and H2S at 0 to 42 mg/liter (average, 0.1 mg/liter). The amounts of bacterial biomass in groundwater samples ranged from 3.5 × 10^2 to 1.8 × 10^6 cells/ml (average, 1.2 × 10^5 cells/ml), while the levels of dissolved organic carbon (DOC) were 0.2 to 1282 mg/liter (average, 78 mg/liter) and the levels of dissolved inorganic carbon (DIC) were 94 to 1792 mg/liter (average, 583 mg/liter). Such large ranges of environmental gradients provide an advantage in testing the relationships between functional gene diversity and environmental contamination as well as ecosystem functioning.

The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations. For this study, we defined functional richness as the number of functional genes detected by GeoChip 5.0 and functional diversity as the Shannon diversity index. The overall levels of functional gene richness and diversity decreased significantly (P < 0.05) as uranium concentrations increased. The functional diversity was highest at a neutral pH, not under low- or high-pH conditions, but it was not significantly (P > 0.05) impacted by nitrate concentrations in groundwater (Fig. 1; Fig. S1). We further examined the relationships between the levels of abundance of key gene families along the environmental gradients. For example, the abundances of sulfur (S) cycling genes (e.g., dsrA and sqr) and cytochrome and hydrogenase genes decreased significantly (P < 0.05) with increasing uranium concentrations (Fig. 2A to D). However, the abundances of denitrification (e.g., nirK and nosZ), dissimilatory N reduction (e.g., napA), and assimilatory N reduction (e.g., nasA) genes did not decrease significantly (r = 0.125 to 0.210, P > 0.05) with increasing nitrate concentrations (Fig. 2E to H). Further analysis of other key N cycling genes showed significantly (P < 0.05) decreased abundances with increased uranium or at low or high pHs, but no significant (P > 0.05) correlations were observed between N cycling gene abundances and nitrate concentrations (Table S2). In addition, the effects of uranium and pH on microbial biomass (measured by direct cell count) were not significant (P > 0.05), nor was there a significant correlation (P > 0.05) between biomass and functional richness, but it appeared that microbial biomass increased significantly (P < 0.001) with increased nitrate concentrations, suggesting that nitrate consumers (e.g., nitrate reducers) may be dominant in the environment (Fig. S2). Further analysis showed that the abundance of ~95% of genes detected by GeoChip 5.0 decreased, while only about 5% of them increased, indicating that most of the functional genes were inhibited or remained unchanged as uranium and nitrate concentrations increased.
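The two summary measures defined in this section can be illustrated with a minimal sketch. It assumes that a probe counts as detected when its normalized signal is positive and that the Shannon index uses the natural logarithm; neither detail is stated here, and the function names and the example signals are hypothetical.

```python
import math

def functional_richness(intensities):
    """Number of gene probes with a positive (detected) signal."""
    return sum(1 for x in intensities if x > 0)

def shannon_index(intensities):
    """Shannon index H' = -sum(p_i * ln p_i) over detected probes."""
    detected = [x for x in intensities if x > 0]
    total = sum(detected)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in detected)

# Hypothetical normalized signal intensities for one well (0 = not detected).
well = [4.0, 4.0, 4.0, 4.0, 0.0, 0.0]
print(functional_richness(well))       # 4
print(round(shannon_index(well), 4))   # ln(4) = 1.3863
```

With four equally abundant probes the index equals ln(4), its maximum for a richness of 4; uneven signals give lower values.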

Key functional populations stimulated in response to a uranium gradient. Although the richness and diversity of functional genes generally decreased as uranium concentrations increased in groundwater, some specific populations of certain functional guilds did increase significantly (P < 0.05) (Fig. 3A and B; Table S3). For example, the abundance of 43 dsrA-bearing populations (~5.8% of total dsrA detected by GeoChip 5), mostly uncultured SRB with a few sequenced species (e.g., Halorhodospira halophila, Desulfobulbus propionicus, Pelodictyon luteolum, and Vibrio rotiferianus), increased significantly (P < 0.05) (Table S3). In particular, five abundant dsrA probes/gene variants (gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858) derived from uncultured SRB were identified as being significantly (P < 0.05) increased as uranium increased (Fig. 3A). Increased levels of abundance of 21 cytochrome (~4.6%) and 6 hydrogenase (~7.3%) gene variants were also observed, specifically from well-known microorganisms like Geobacter, Dechloromonas, Enterobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, Ochrobactrum, and Anaeromyxobacter (Table S3). Also, five abundant cytochrome genes (gi70733596, gi393759946, gi157375053, gi394728887, and gi254982574) were significantly (P < 0.05) increased as uranium concentrations increased in groundwater (Fig. 3B). These stimulated populations could play important roles in uranium bioremediation at this site.

FIG 1 Relationships between the overall functional richness and concentrations of uranium (A) and nitrate (B), as well as pH (C), in groundwater. Uranium and nitrate concentrations were first log transformed, and then linear regressions were performed for functional richness and uranium or nitrate concentrations. Nonlinear regression was used for functional richness and pH.

FIG 2 Linear relationships between the levels of abundance of specific functional gene families and log-transformed uranium (A to D) or nitrate (E to H) concentrations in groundwater, including data for dsrA, encoding the alpha subunit of sulfite reductase for dissimilatory sulfite reduction (A); sqr, encoding sulfide-quinone reductase (B); cytochrome genes from well-known organisms, e.g., Geobacter, Anaeromyxobacter, Dechloromonas, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter, Pseudomonas, Enterobacter, and Ochrobactrum (C); hydrogenase genes from well-known organisms, e.g., Geobacter, Desulfovibrio, Desulfurobacterium, Desulfobacterium, and Rhodobacter (D); nirK, encoding nitrite reductase for denitrification (E); nosZ, encoding nitrous oxide reductase for denitrification (F); napA, encoding nitrate reductase for dissimilatory nitrate reduction (G); and nasA, encoding nitrate reductase for assimilatory nitrate reduction (H).
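The regressions against log-transformed contaminant concentrations (Fig. 1 to 3) can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: because some wells have a concentration of 0, the sketch uses log10(concentration + 1), an offset chosen here only so that the transform is defined. The function name and the data are hypothetical, and no P value is computed.

```python
import math

def log_linear_fit(conc, abundance, offset=1.0):
    """Least-squares fit of abundance against log10(conc + offset).

    Returns (slope, intercept, pearson_r). The offset is an assumption
    made so that wells with a concentration of 0 can be included.
    """
    x = [math.log10(c + offset) for c in conc]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(abundance) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in abundance)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, abundance))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data: concentrations chosen so that log10(c + 1) = 0, 1, 2, 3.
conc = [0.0, 9.0, 99.0, 999.0]
abundance = [1.0, 3.0, 5.0, 7.0]
slope, intercept, r = log_linear_fit(conc, abundance)
print(slope, intercept, r)
```

A positive slope with r close to 1 corresponds to a gene variant whose abundance rises with the contaminant; deciding whether that rise is significant would need a test statistic on top of this fit.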

Key functional populations stimulated in response to a nitrate gradient. We also found that the abundance of many specific functional genes/populations involved in N cycling increased significantly (P < 0.05) as nitrate increased (Fig. 3C and D; Table S4). For example, the abundance of 13 nirK-bearing (4.9%) populations increased significantly (P < 0.05), with most being uncultured bacteria and a few sequenced microbes (e.g., Chaetomium, Arthroderma, Nectria, and Pseudomonas); the abundance of 9 napA (6.0%) gene variants for dissimilatory N reduction derived from Beggiatoa, Vibrio, Campylobacter, and Dinoroseobacter species, as well as uncultured NRB, also increased significantly (P < 0.05) as nitrate increased (Table S4). Five abundant nirK gene variants (gi116204223, gi256723237, gi46409951, gi73762878, and gi50541845) (Fig. 3C) and five abundant napA gene variants (gi157285650, gi219549420, gi169793654, gi257458839, and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

FIG 3 Significantly (P < 0.05) positive correlations between the levels of abundance of stimulated populations and log-transformed uranium (A and B) or nitrate (C and D) concentrations, including data for dsrA gene variants gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858, derived from uncultured sulfate-reducing bacteria (A); cytochrome genes gi70733596 from Pseudomonas fluorescens, gi393759946 from Alcaligenes faecalis, gi157375053 from Shewanella sediminis, gi394728887 from Enterobacter sp., and gi254982574 from Geobacter sp. (B); nirK gene variants gi116204223 from Chaetomium globosum, gi256723237 from Nectria haematococca, and gi46409951, gi73762878, and gi50541845 from uncultured denitrifying bacteria (C); and napA gene variants gi219549420 from Vibrio parahaemolyticus, gi257458839 from Campylobacter gracilis, gi157913465 from Dinoroseobacter shibae, and gi157285650 and gi169793654 from uncultured nitrate-reducing bacteria (D).

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination by the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the functional genes detected that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB estimates of the error rate were 24.64% for all three functional gene families and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same results were observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which dramatically decreased the OOB estimate of error rate from 28.99% to 11.59% (Table 1). These results indicated that microbial functional genes were able to successfully predict groundwater uranium contamination.

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

                                                                       No. of wells predicted/no. of wells defined
Contaminant  Predictor(a)                                  OOB error   Background    Contaminated
                                                           rate (%)    wells(b)      wells(c)
Uranium      All S cycling and metal-related genes         28.99       47/47         2/22
             All dsrA, cytochrome, and hydrogenase genes   24.64       47/47         5/22
             All dsrA genes                                24.64       47/47         5/22
             All cytochrome genes                          26.09       46/47         5/22
             All hydrogenase genes                         28.99       41/47         8/22
             Key dsrA, cytochrome, and hydrogenase genes   27.54       45/47         5/22
             Key dsrA genes                                24.64       45/47         7/22
             Key cytochrome genes                          39.13       38/47         4/22
             Key hydrogenase genes                         42.03       33/47         7/22
             AUC-RF selection                              11.59       47/47         14/22
Nitrate      All N cycling genes                           36.23       39/44         5/25
             All nifH, amoA, narG, nasA, and napA genes    34.78       40/44         5/25
             All nifH genes                                33.33       41/44         5/25
             All amoA genes                                27.54       41/44         9/25
             All narG genes                                36.23       40/44         4/25
             All nasA genes                                36.23       37/44         7/25
             All napA genes                                34.78       41/44         4/25
             Key nifH, amoA, narG, nasA, and napA genes    30.43       40/44         8/25
             Key nifH genes                                27.54       41/44         9/25
             Key amoA genes                                28.99       39/44         10/25
             Key narG genes                                37.68       37/44         6/25
             Key nasA genes                                40.58       32/44         9/25
             Key napA genes                                40.58       32/44         9/25
             AUC-RF selection                              15.94       42/44         16/25

a Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
b In background wells, the concentrations of uranium or nitrate were 30 µg/liter or below or 10 mg/liter or below, respectively.
c In contaminated wells, the concentrations of uranium or nitrate were higher than 30 µg/liter or 10 mg/liter, respectively.
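How an out-of-bag (OOB) error estimate arises can be shown with a deliberately small stand-in for random forest. The sketch below bags one-split decision stumps, each fitted on a bootstrap sample and a random subset of features, and scores every well only with the stumps whose bootstrap sample did not contain it. It is not the random forest implementation used in this study, and every name and data value in it is hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Return (feature, threshold, label_above) minimizing training errors."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        # Candidate thresholds: -inf (constant prediction) plus midpoints.
        thresholds = [float("-inf")] + [
            (a + b) / 2 for a, b in zip(values, values[1:])
        ]
        for t in thresholds:
            for label_above in (0, 1):
                errors = sum(
                    (label_above if row[j] > t else 1 - label_above) != yi
                    for row, yi in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, label_above)
    return best[1:]

def predict_stump(stump, row):
    j, t, label_above = stump
    return label_above if row[j] > t else 1 - label_above

def oob_error(X, y, n_trees=200, n_features=1, seed=0):
    """OOB error of a bagged ensemble of stumps with majority voting."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        stump = fit_stump([X[i] for i in bag], [y[i] for i in bag], feats)
        for i in set(range(n)) - set(bag):                # wells left out of the bag
            votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    wrong = sum((1 if votes[i][1] > votes[i][0] else 0) != y[i] for i in scored)
    return wrong / len(scored)

# Hypothetical abundances of two gene probes in 20 wells; label 1 = contaminated.
X = [[float(i), float(2 * i)] for i in range(10)]
X += [[float(i), float(2 * i)] for i in range(20, 30)]
y = [0] * 10 + [1] * 10
print(oob_error(X, y))
```

Because each well is scored only by models that never saw it, the OOB error serves as a built-in estimate of prediction error without a separate test set.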

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction, and the error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement from the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
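The error rates quoted for uranium and nitrate can be cross-checked against the per-class well counts in Table 1. The sketch assumes that the "predicted/defined" counts are the correctly classified wells out of the wells in each class, a reading that reproduces the reported rates; the function name is hypothetical.

```python
def oob_error_rate(background_correct, background_total,
                   contaminated_correct, contaminated_total):
    """Overall misclassification rate (%) from per-class correct counts."""
    total = background_total + contaminated_total
    wrong = total - background_correct - contaminated_correct
    return 100.0 * wrong / total

# Counts as listed in Table 1 (correctly predicted wells / defined wells).
print(round(oob_error_rate(47, 47, 2, 22), 2))    # uranium, all S cycling and metal-related genes: 28.99
print(round(oob_error_rate(47, 47, 14, 22), 2))   # uranium, AUC-RF selection: 11.59
print(round(oob_error_rate(40, 44, 5, 25), 2))    # nitrate, five gene families combined: 34.78
print(round(oob_error_rate(42, 44, 16, 25), 2))   # nitrate, AUC-RF selection: 15.94
```

The AUC-RF rows show where the gain comes from: the number of contaminated wells correctly identified rises from 2 to 14 of 22 for uranium and to 16 of 25 for nitrate, while background wells remain almost fully recovered.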

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident, and among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g., uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36) as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2-, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.
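AUC-RF ranks candidate predictor sets by the area under the receiver operating characteristic curve (AUC). The sketch below shows only how an AUC can be computed from classifier scores, using the equivalent pairwise-ranking definition; it does not reproduce the AUC-RF selection procedure itself, and the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (contaminated, background) pairs ranked correctly.

    Ties between a contaminated and a background score count as 0.5.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one well in each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four wells; label 1 = contaminated.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))   # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Unlike a raw error rate, the measure does not depend on a single decision threshold, which is relevant when background wells outnumber contaminated wells, as they do here.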

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS
More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 25 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities as previously described (16), (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA), and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R² and P values using R (64).
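The analyses listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' workflow (which used SigmaPlot and R): the Shannon index and Pielou evenness are assumed here as representative diversity and evenness indexes, since the specific indexes are defined only in reference 16; the function names and toy data are hypothetical; and P values are omitted because they require a t distribution that the Python standard library does not provide.

```python
import math

def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over detected genes (abundance > 0)."""
    total = sum(a for a in abundances if a > 0)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

def pielou_evenness(abundances):
    """Pielou evenness J' = H' / ln(S), where S is the number of detected genes."""
    richness = sum(1 for a in abundances if a > 0)
    return shannon_index(abundances) / math.log(richness) if richness > 1 else 0.0

def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical toy data: normalized signal of one probe in five wells and the
# uranium concentration (arbitrary units) measured in the same wells.
probe_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
uranium = [0.01, 0.1, 1.0, 10.0, 100.0]
log_uranium = [math.log10(u) for u in uranium]  # log transform before regression

# For this perfectly linear toy example the fitted slope and R^2 are both 1
# (up to floating-point rounding).
slope, intercept, r2 = linear_fit(log_uranium, probe_signal)

# A community in which all detected genes are equally abundant has evenness 1.
even_community = [5.0, 5.0, 5.0, 5.0]
evenness = pielou_evenness(even_community)
```

Applying `linear_fit` to every probe in turn gives one slope and one R² per probe, and the sign of the slope separates probes that increase along a contaminant gradient from those that decrease.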

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
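To make the two reference values of the AUC concrete, the following Python sketch computes the AUC directly from its probabilistic definition: the chance that a randomly chosen contaminated well receives a higher score than a randomly chosen uncontaminated well, with ties counted as one half. It is an illustration only and does not reproduce the AUCRF package; the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels: 1 = contaminated well, 0 = background well.
labels = [0, 0, 0, 1, 1, 1]

# Scores that rank every contaminated well above every background well.
perfect_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
auc_perfect = roc_auc(labels, perfect_scores)  # 1.0

# Scores that carry no information (all identical) behave like a random guess.
uninformative_scores = [0.5] * 6
auc_random = roc_auc(labels, uninformative_scores)  # 0.5
```

In a feature selection setting, this quantity would be computed for each candidate feature set, and the set with the highest value retained.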

(ii) Modeling. The random forest models were constructed using the R package "randomForest," as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry of the features (mtry is the number of features randomly sampled as candidates at each split) and chose the best split among those features. By default, mtry equals one-third of the number of features for regression and the square root of the number of features for classification. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
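The three steps summarized above can be sketched in Python as follows. This is a simplified illustration rather than the randomForest implementation: tree growing itself is left out, all function names are hypothetical, and the default mtry rule shown (square root of the number of features for classification, one-third for regression) follows the documented defaults of the R randomForest package.

```python
import math
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Step 1: draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def default_mtry(n_features, task):
    """Default number of candidate features per split, by task type."""
    if task == "classification":
        return max(int(math.sqrt(n_features)), 1)
    return max(n_features // 3, 1)

def candidate_features(n_features, mtry, rng):
    """Step 2: pick the mtry features considered at one node (without replacement)."""
    return rng.sample(range(n_features), mtry)

def aggregate(tree_predictions, task):
    """Step 3: majority vote for classification, mean for regression."""
    if task == "classification":
        return Counter(tree_predictions).most_common(1)[0][0]
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
in_bag = bootstrap_indices(69, rng)              # 69 wells, sampled with replacement
mtry = default_mtry(2361, "classification")      # 2,361 candidate probes
node_features = candidate_features(2361, mtry, rng)
vote = aggregate(["contaminated", "background", "contaminated"], "classification")
mean_prediction = aggregate([1.0, 2.0, 3.0], "regression")
```

Restricting each split to a random subset of features is what decorrelates the trees, which in turn is why averaging their predictions reduces variance.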

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
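The OOB procedure described above can be sketched in Python as shown below. This is a minimal illustration under stated assumptions: a one-feature nearest-neighbor rule stands in for the unpruned trees of a real random forest, and the function names and toy data are hypothetical. Each sample is predicted only by models whose bootstrap sample did not contain it, and the per-sample votes are then aggregated by majority.

```python
import random
from collections import Counter, defaultdict

def oob_error_rate(features, labels, fit, n_trees=200, seed=0):
    """OOB error: fraction of samples whose aggregated out-of-bag vote is wrong."""
    rng = random.Random(seed)
    n = len(labels)
    oob_votes = defaultdict(list)
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        predict = fit([features[i] for i in in_bag], [labels[i] for i in in_bag])
        for i in set(range(n)) - set(in_bag):          # samples left out of this bag
            oob_votes[i].append(predict(features[i]))
    if not oob_votes:
        raise ValueError("no sample was ever out of bag; increase n_trees")
    wrong = sum(
        1 for i, votes in oob_votes.items()
        if Counter(votes).most_common(1)[0][0] != labels[i]
    )
    return wrong / len(oob_votes)

def fit_nearest_neighbor(train_x, train_y):
    """Stand-in base learner (assumption): predict the label of the closest training value."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[nearest]
    return predict

# Hypothetical toy data: one probe signal per well and the well's status.
signal = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
status = ["background"] * 5 + ["contaminated"] * 5

error = oob_error_rate(signal, status, fit_nearest_neighbor)
```

Because every sample is scored only by models that never saw it, the OOB estimate plays the role of a held-out test set, which is useful when, as here, only 69 samples are available.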

SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.
FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS
This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES
1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.
2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.
3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.
4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.
5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.
6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.
7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.
8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.
9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.
10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.
11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.
12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.
13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.
14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.
15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.
16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.
17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.
18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.
19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.
20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.
21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.
22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.
24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.
25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.
26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.
27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.
28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.
29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.
30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.
31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.
32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.
33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.
34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.
35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.
36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.
37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.
38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.
39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.
40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.
41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.
42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.
43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.
44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.
45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.
46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.
47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.
48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.
49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.
50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.
51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.
52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.
53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.
54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.
55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.
56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.
57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.
59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.
60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.
61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.
62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.
63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.
64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.
66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.

  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
  • DISCUSSION
    • Conclusions
  • MATERIALS AND METHODS
    • Site description and sampling
    • Physical, geochemical, and microbiological measurements
    • Groundwater biomass collection, DNA extraction, and template preparation
    • GeoChip hybridization and data preprocessing
    • GeoChip data analysis
    • Random forest for predicting environmental contamination and ecosystem functioning
      • (i) Feature selection
      • (ii) Modeling
      • (iii) Error rate estimation
  • SUPPLEMENTAL MATERIAL
  • ACKNOWLEDGMENTS
  • REFERENCES
hydrogenase genes decreased significantly (P < 0.05) with increasing uranium concentrations (Fig. 2A to D). However, the abundances of denitrification (e.g., nirK and nosZ), dissimilatory N reduction (e.g., napA), and assimilatory N reduction (e.g., nasA) genes did not decrease significantly (r = 0.125 to 0.210, P > 0.05) with increasing nitrate concentrations (Fig. 2E to H). Further analysis of other key N cycling genes showed significantly (P < 0.05) decreased abundances with increased uranium or at low or high pHs, but no significant (P > 0.05) correlations were observed between N cycling gene abundances and nitrate concentrations (Table S2). In addition, the effects of uranium and pH on microbial biomass (measured by direct cell count) were not significant (P > 0.05), nor was there a significant correlation (P > 0.05) between biomass and functional richness, but it appeared that microbial biomass increased significantly (P < 0.001) with increased nitrate concentrations, suggesting that nitrate consumers (e.g., nitrate reducers) may be dominant in the environment (Fig. S2). Further analysis showed that the abundance of ~95% of genes detected by GeoChip 5.0 decreased, while only about 5% of them increased, indicating that most of the functional genes were inhibited or remained unchanged as uranium and nitrate concentrations increased.

Key functional populations stimulated in response to a uranium gradient. Although the richness and diversity of functional genes generally decreased as uranium concentrations increased in groundwater, some specific populations of certain functional guilds did increase significantly (P < 0.05) (Fig. 3A and B; Table S3). For example, the abundance of 43 dsrA-bearing populations (~5.8% of total dsrA detected by GeoChip 5), mostly uncultured SRB with a few sequenced species (e.g., Halorhodospira halophila, Desulfobulbus propionicus, Pelodictyon luteolum, and Vibrio rotiferianus), increased significantly (P < 0.05) (Table S3). In particular, five abundant dsrA probes/gene variants (gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858) derived from uncultured SRB were identified as being significantly (P < 0.05) increased as uranium increased (Fig. 3A). Increased levels of abundance of 21 cytochrome (~4.6%) and 6 hydrogenase (~7.3%) gene variants were also observed, specifically from well-known microorganisms like Geobacter, Dechloromonas, Enterobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, Ochrobactrum, and Anaeromyxobacter (Table S3). Also, five abundant cytochrome genes (gi70733596, gi393759946, gi157375053, gi394728887, and gi254982574) were significantly (P < 0.05) increased as uranium concentrations increased in groundwater (Fig. 3B). These stimulated populations could play important roles in uranium bioremediation at this site.

FIG 1 Relationships between the overall functional richness and concentrations of uranium (A) and nitrate (B), as well as pH (C), in groundwater. Uranium and nitrate concentrations were first log transformed, and then linear regressions were performed for functional richness and uranium or nitrate concentrations. Nonlinear regression was used for functional richness and pH.

FIG 2 Linear relationships between the levels of abundance of specific functional gene families and log-transformed uranium (A to D) or nitrate (E to H) concentrations in groundwater, including data for dsrA, encoding the alpha subunit of sulfite reductase for dissimilatory sulfite reduction (A); sqr, encoding sulfide-quinone reductase (B); cytochrome genes from well-known organisms, e.g., Geobacter, Anaeromyxobacter, Dechloromonas, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter, Pseudomonas, Enterobacter, and Ochrobactrum (C); hydrogenase genes from well-known organisms, e.g., Geobacter, Desulfovibrio, Desulfurobacterium, Desulfobacterium, and Rhodobacter (D); nirK, encoding nitrite reductase for denitrification (E); nosZ, encoding nitrous oxide reductase for denitrification (F); napA, encoding nitrate reductase for dissimilatory nitrate reduction (G); and nasA, encoding nitrate reductase for assimilatory nitrate reduction (H).

Key functional populations stimulated in response to a nitrate gradient Wealso found that the abundance of many specific functional genespopulations involvedin N cycling increased significantly (P 005) as nitrate increased (Fig 3C and DTable S4) For example the abundance of 13 nirK-bearing (49) populations increasedsignificantly (P 005) with most being uncultured bacteria and a few sequencedmicrobes (eg Chaetomium Arthroderma Nectria and Pseudomonas) the abundance of9 napA (60) gene variants for dissimilatory N reduction derived from BeggiatoaVibrio Campylobacter and Dinoroseobacter species as well as uncultured NRB alsoincreased significantly (P 005) as nitrate increased (Table S4) Five abundant nirKgene variants (gi116204223 gi256723237 gi46409951 gi73762878 and gi50541845)(Fig 3C) and five abundant napA gene variants (gi157285650 gi219549420 gi169793654

FIG 3 Significantly (P 005) positive correlations between the levels of abundance of stimulatedpopulations and log-transformed uranium (A and B) or nitrate (C and D) concentrations including datafor dsrA gene variants gi237846130 gi46308012 gi46307974 gi37726843 and gi46307858 derived fromuncultured sulfate-reducing bacteria (A) cytochrome genes gi70733596 from Pseudomonas fluorescensgi393759946 from Alcaligenes faecalis gi157375053 from Shewanella sediminis gi394728887 from En-terobacter sp and gi254982574 from Geobacter sp (B) nirK gene variants gi116204223 from Chaetomiumglobosum gi256723237 from Nectria haematococca and gi46409951 gi73762878 and gi50541845 fromuncultured denitrifying bacteria (C) and napA gene variants gi219549420 from Vibrio parahaemolyticusgi257458839 from Campylobacter gracilis gi157913465 from Dinoroseobacter shibae and gi157285650and gi169793654 from uncultured nitrate-reducing bacteria (D)

He et al.

January/February 2018 Volume 9 Issue 1 e02435-17 mbio.asm.org 6


gi257458839, and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination from the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the detected functional genes that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB estimate of the error rate was 24.64% for all three functional gene families combined and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same results were observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which dramatically decreased the OOB estimate of the error rate from 28.99% to 11.59% (Table 1). These results indicated that

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

The two well columns give the no. of wells predicted/no. of wells defined.

Contaminant  Predictor(a)                                    OOB error rate (%)  Background wells(b)  Contaminated wells(c)
Uranium      All S cycling and metal-related genes           28.99               47/47                2/22
Uranium      All dsrA, cytochrome, and hydrogenase genes     24.64               47/47                5/22
Uranium      All dsrA genes                                  24.64               47/47                5/22
Uranium      All cytochrome genes                            26.09               46/47                5/22
Uranium      All hydrogenase genes                           28.99               41/47                8/22
Uranium      Key dsrA, cytochrome, and hydrogenase genes     27.54               45/47                5/22
Uranium      Key dsrA genes                                  24.64               45/47                7/22
Uranium      Key cytochrome genes                            39.13               38/47                4/22
Uranium      Key hydrogenase genes                           42.03               33/47                7/22
Uranium      AUC-RF selection                                11.59               47/47                14/22
Nitrate      All N cycling genes                             36.23               39/44                5/25
Nitrate      All nifH, amoA, narG, nasA, and napA genes      34.78               40/44                5/25
Nitrate      All nifH genes                                  33.33               41/44                5/25
Nitrate      All amoA genes                                  27.54               41/44                9/25
Nitrate      All narG genes                                  36.23               40/44                4/25
Nitrate      All nasA genes                                  36.23               37/44                7/25
Nitrate      All napA genes                                  34.78               41/44                4/25
Nitrate      Key nifH, amoA, narG, nasA, and napA genes      30.43               40/44                8/25
Nitrate      Key nifH genes                                  27.54               41/44                9/25
Nitrate      Key amoA genes                                  28.99               39/44                10/25
Nitrate      Key narG genes                                  37.68               37/44                6/25
Nitrate      Key nasA genes                                  40.58               32/44                9/25
Nitrate      Key napA genes                                  40.58               32/44                9/25
Nitrate      AUC-RF selection                                15.94               42/44                16/25

(a) Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
(b) In background wells, the concentrations of uranium or nitrate were 30 µg/liter or below or 10 mg/liter or below, respectively.
(c) In contaminated wells, the concentrations of uranium or nitrate were higher than 30 µg/liter or 10 mg/liter, respectively.


microbial functional genes were able to successfully predict groundwater uranium contamination.
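The error rates reported in Table 1 can be cross-checked from the per-class counts in the same table. The following is a minimal sketch in Python, not the authors' code: the function names `label_well` and `error_rate_percent` are hypothetical, the threshold comes from the Table 1 footnotes, and the counts are copied from the uranium rows of Table 1.

```python
# Threshold from the Table 1 footnotes: a well is "background" for uranium
# if its concentration is 30 ug/liter or below, otherwise "contaminated".
URANIUM_LIMIT_UG_PER_LITER = 30.0


def label_well(concentration, limit):
    """Return the class label of a well (hypothetical helper)."""
    return "contaminated" if concentration > limit else "background"


def error_rate_percent(bg_correct, bg_total, ct_correct, ct_total):
    """Overall misclassification rate (%) from per-class correct counts."""
    total = bg_total + ct_total
    wrong = total - (bg_correct + ct_correct)
    return 100.0 * wrong / total


# Counts copied from Table 1: wells predicted correctly / wells defined.
all_s_and_metal = error_rate_percent(47, 47, 2, 22)   # all S cycling and metal-related genes
key_dsra = error_rate_percent(45, 47, 7, 22)          # key dsrA genes
auc_rf = error_rate_percent(47, 47, 14, 22)           # AUC-RF selection

print(round(all_s_and_metal, 2), round(key_dsra, 2), round(auc_rf, 2))
```

Under these assumptions, each rate is simply the number of misclassified wells divided by the 69 wells analyzed, which is why very different per-class patterns (for example, 47/47 and 5/22 versus 45/47 and 7/22) can give the same overall error rate.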

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction. The error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for the individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement over the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of the error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring, based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident, and among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater, based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).
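The comparison of predictor sets above rests on the correlation between predicted and observed N2O concentrations. As a rough illustration, assuming a plain Pearson correlation and its t statistic (the helper names and the numeric values below are hypothetical, not data from this study), the calculation can be sketched as:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r != 0; needs |r| < 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical observed vs. predicted N2O values, for illustration only.
observed = [0.2, 0.5, 1.1, 1.8, 2.4]
predicted = [0.4, 0.6, 0.9, 1.5, 2.6]
r = pearson_r(observed, predicted)
print(round(r, 3), round(t_statistic(r, len(observed)), 3))
```

The P value would then be read from a t distribution with n - 2 degrees of freedom; that lookup is left out here because it needs a statistics library beyond the Python standard library.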

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g.,


uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36) as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below the detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.


deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2-, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of the dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis, that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with


inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively


sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ, Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at -80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 25 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software, Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R² and P values using R (64).
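The text does not name the specific diversity and evenness indexes used. As an illustration only, and assuming the widely used Shannon index and Pielou evenness (an assumption, not a statement of the authors' choice), such indexes could be computed from the probe signal intensities of one well as follows; the function names and the intensity values are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i) over detected probes."""
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    return -sum((a / total) * math.log(a / total) for a in positive)


def pielou_evenness(abundances):
    """Pielou evenness J = H' / ln(S), where S is the number of detected probes."""
    richness = sum(1 for a in abundances if a > 0)
    if richness < 2:
        return 0.0  # evenness is not meaningful for fewer than two probes
    return shannon_index(abundances) / math.log(richness)


# Hypothetical normalized signal intensities for one well (0 = not detected).
intensities = [12.0, 3.5, 0.0, 7.2, 1.3]
print(round(shannon_index(intensities), 3), round(pielou_evenness(intensities), 3))
```

In this sketch, functional richness corresponds to the count of probes with a positive signal, so a well that loses probes under contamination shows both lower richness and, typically, a lower Shannon index.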

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
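To make the AUC scale concrete, the sketch below computes the AUC directly from its probabilistic definition: the chance that a randomly chosen contaminated well receives a higher score than a randomly chosen background well, with ties counted as one half. It is a stand-alone Python illustration with a hypothetical function name and hypothetical scores; it does not reproduce the feature elimination procedure of the AUCRF package.

```python
def auc(scores, labels):
    """Area under the ROC curve; labels are 1 (contaminated) or 0 (background)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical vote fractions for five wells, for illustration only.
example_scores = [0.9, 0.7, 0.6, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0]
print(auc(example_scores, example_labels))
```

A classifier that scores every contaminated well above every background well reaches 1.0 under this definition, while identical scores for all wells give 0.5, matching the interpretation in the text.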

(ii) Modeling. The random forest models were constructed using the R package "randomForest" as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry of the features (mtry being the number of features randomly sampled as candidates at each split) and chose the best split among those features. By default, mtry equals the square root of the number of features for classification and one-third of the number of features for regression. Third, new data were predicted by aggregating the predictions of the n trees (i.e., majority votes for classification and averages for regression).
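Two pieces of this summary are easy to state in code: the default size of the random feature subset and the way per-tree predictions are combined. In the R randomForest package the default mtry depends on the task, floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression, where p is the number of features. The Python sketch below mirrors those defaults and the aggregation rule; the function names are hypothetical, and tie handling in the vote is simplified (the first label seen wins).

```python
import math
from collections import Counter


def default_mtry(n_features, task):
    """Default number of candidate features per split, by task."""
    if task == "classification":
        return max(1, math.floor(math.sqrt(n_features)))
    if task == "regression":
        return max(1, n_features // 3)
    raise ValueError("task must be 'classification' or 'regression'")


def aggregate(tree_predictions, task):
    """Combine per-tree predictions: majority vote or average."""
    if task == "classification":
        return Counter(tree_predictions).most_common(1)[0][0]
    if task == "regression":
        return sum(tree_predictions) / len(tree_predictions)
    raise ValueError("task must be 'classification' or 'regression'")


# Feature counts taken from the text: 2,361 uranium-related probes, 5,273 N cycling genes.
print(default_mtry(2361, "classification"), default_mtry(5273, "regression"))
print(aggregate(["contaminated", "background", "contaminated"], "classification"))
```

Sampling only mtry features at each split is what decorrelates the trees, so averaging or voting across them reduces variance more than it would for identical trees.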

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of the error rate.
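The OOB idea can be sketched without fitting any trees: draw a bootstrap sample, record which wells were left out, and score each well only with votes from trees that never saw it. The Python sketch below uses hypothetical function names and a fixed random seed chosen for illustration; the per-tree votes in a real analysis would come from the fitted forest.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag(n_samples, in_bag):
    """Indices that were never drawn into the bootstrap sample."""
    drawn = set(in_bag)
    return [i for i in range(n_samples) if i not in drawn]


def oob_error_rate(true_labels, oob_votes):
    """Fraction of samples whose majority OOB vote differs from the truth.

    oob_votes[i] holds the labels predicted for sample i by the trees that
    did not see sample i; samples with no OOB vote are skipped.
    """
    wrong = 0
    scored = 0
    for truth, votes in zip(true_labels, oob_votes):
        if not votes:
            continue
        scored += 1
        if Counter(votes).most_common(1)[0][0] != truth:
            wrong += 1
    if scored == 0:
        raise ValueError("no sample has any OOB vote")
    return wrong / scored


rng = random.Random(0)   # fixed seed, chosen arbitrarily for this sketch
n_wells = 69             # number of wells analyzed in this study
fractions = []
for _ in range(500):
    in_bag = bootstrap_sample(n_wells, rng)
    fractions.append(len(out_of_bag(n_wells, in_bag)) / n_wells)
mean_oob_fraction = sum(fractions) / len(fractions)
print(round(mean_oob_fraction, 2))
```

On average a little over one-third of the samples are left out of each bootstrap draw, so every well is scored by a sizeable share of the trees even though no separate test set is held back.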


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann

Microbial Genes Predict Ecosystem Functioning reg

JanuaryFebruary 2018 Volume 9 Issue 1 e02435-17 mbioasmorg 13

on August 26 2020 by guest

httpmbioasm

orgD

ownloaded from

SM Joyner DC Elias DA Bailey KL Hurt RA Preheim SP Sanders MCYang J Mueller MA Brooks S Watson DB Zhang P He Z Dubinsky EAAdams PD Arkin AP Fields MW Zhou J Alm EJ Hazen TC 2015 Naturalbacterial communities serve as quantitative geochemical biosensorsmBio 6e00326-15 httpsdoiorg101128mBio00326-15

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.



tional guilds did increase significantly (P < 0.05) (Fig. 3A and B; Table S3). For example, the abundance of 43 dsrA-bearing populations (~5.8% of the total dsrA detected by GeoChip 5), mostly uncultured SRB with a few sequenced species (e.g., Halorhodospira halophila, Desulfobulbus propionicus, Pelodictyon luteolum, and Vibrio rotiferianus), increased significantly (P < 0.05) (Table S3). In particular, five abundant dsrA probes/gene variants (gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858) derived from uncultured SRB increased significantly (P < 0.05) as uranium increased (Fig. 3A). Increased abundances of 21 cytochrome (~4.6%) and 6 hydrogenase (~7.3%) gene variants were also observed, specifically from well-known microorganisms like Geobacter, Dechloromonas, Enterobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, Ochrobactrum, and Anaeromyxobacter (Table S3). Also, five abundant cytochrome genes (gi70733596, gi393759946, gi157375053, gi394728887, and gi254982574) increased significantly (P < 0.05) as uranium concentrations increased in groundwater (Fig. 3B). These stimulated populations could play important roles in uranium bioremediation at this site.

FIG 2 Linear relationships between the levels of abundance of specific functional gene families and log-transformed uranium (A to D) or nitrate (E to H) concentrations in groundwater, including data for dsrA, encoding the alpha subunit of sulfite reductase for dissimilatory sulfite reduction (A); sqr, encoding sulfide-quinone reductase (B); cytochrome genes from well-known organisms, e.g., Geobacter, Anaeromyxobacter, Dechloromonas, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter, Pseudomonas, Enterobacter, and Ochrobactrum (C); hydrogenase genes from well-known organisms, e.g., Geobacter, Desulfovibrio, Desulfurobacterium, Desulfobacterium, and Rhodobacter (D); nirK, encoding nitrite reductase for denitrification (E); nosZ, encoding nitrous oxide reductase for denitrification (F); napA, encoding nitrate reductase for dissimilatory nitrate reduction (G); and nasA, encoding nitrate reductase for assimilatory nitrate reduction (H).
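The linear relationships reported in Fig. 2 and 3 relate gene abundance to log-transformed contaminant concentration. The following is a minimal sketch of such a fit, assuming ordinary least squares on log10-transformed concentrations; the function name and all numbers are hypothetical illustrations, not data or code from this study.

```python
import math


def fit_log_linear(concentrations, abundances):
    """Least-squares fit of abundance against log10(concentration).

    Returns (slope, intercept). Concentrations must be strictly positive.
    """
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(abundances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, abundances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical wells: uranium concentration and summed signal of one gene family.
uranium = [0.1, 1.0, 10.0, 100.0]
gene_signal = [2.0, 3.0, 4.0, 5.0]

slope, intercept = fit_log_linear(uranium, gene_signal)
print(f"slope per tenfold increase: {slope:.2f}, intercept: {intercept:.2f}")
```

A positive slope here means that abundance rises by that amount for each tenfold increase in concentration, which is how a "stimulated" population would appear under this assumed model.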

Key functional populations stimulated in response to a nitrate gradient. We also found that the abundance of many specific functional genes/populations involved in N cycling increased significantly (P < 0.05) as nitrate increased (Fig. 3C and D; Table S4). For example, the abundance of 13 nirK-bearing (4.9%) populations increased significantly (P < 0.05), with most being uncultured bacteria and a few sequenced microbes (e.g., Chaetomium, Arthroderma, Nectria, and Pseudomonas); the abundance of 9 napA (6.0%) gene variants for dissimilatory N reduction derived from Beggiatoa, Vibrio, Campylobacter, and Dinoroseobacter species, as well as uncultured NRB, also increased significantly (P < 0.05) as nitrate increased (Table S4). Five abundant nirK gene variants (gi116204223, gi256723237, gi46409951, gi73762878, and gi50541845) (Fig. 3C) and five abundant napA gene variants (gi157285650, gi219549420, gi169793654, gi257458839, and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

FIG 3 Significant (P < 0.05) positive correlations between the levels of abundance of stimulated populations and log-transformed uranium (A and B) or nitrate (C and D) concentrations, including data for dsrA gene variants gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858, derived from uncultured sulfate-reducing bacteria (A); cytochrome genes gi70733596 from Pseudomonas fluorescens, gi393759946 from Alcaligenes faecalis, gi157375053 from Shewanella sediminis, gi394728887 from Enterobacter sp., and gi254982574 from Geobacter sp. (B); nirK gene variants gi116204223 from Chaetomium globosum, gi256723237 from Nectria haematococca, and gi46409951, gi73762878, and gi50541845 from uncultured denitrifying bacteria (C); and napA gene variants gi219549420 from Vibrio parahaemolyticus, gi257458839 from Campylobacter gracilis, gi157913465 from Dinoroseobacter shibae, and gi157285650 and gi169793654 from uncultured nitrate-reducing bacteria (D).

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination by the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the functional genes detected that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB error estimate was 24.64% for all three functional gene families combined and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same error rate was observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which dramatically decreased the OOB estimate of the error rate from 28.99% to 11.59% (Table 1). These results indicated that microbial functional genes were able to successfully predict groundwater uranium contamination.

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

Contaminant  Predictor^a                                    OOB error rate (%)  No. of wells predicted/no. of wells defined
                                                                                Background wells^b  Contaminated wells^c
Uranium      All S cycling and metal-related genes          28.99               47/47               2/22
             All dsrA, cytochrome, and hydrogenase genes    24.64               47/47               5/22
             All dsrA genes                                 24.64               47/47               5/22
             All cytochrome genes                           26.09               46/47               5/22
             All hydrogenase genes                          28.99               41/47               8/22
             Key dsrA, cytochrome, and hydrogenase genes    27.54               45/47               5/22
             Key dsrA genes                                 24.64               45/47               7/22
             Key cytochrome genes                           39.13               38/47               4/22
             Key hydrogenase genes                          42.03               33/47               7/22
             AUC-RF selection                               11.59               47/47               14/22

Nitrate      All N cycling genes                            36.23               39/44               5/25
             All nifH, amoA, narG, nasA, and napA genes     34.78               40/44               5/25
             All nifH genes                                 33.33               41/44               5/25
             All amoA genes                                 27.54               41/44               9/25
             All narG genes                                 36.23               40/44               4/25
             All nasA genes                                 36.23               37/44               7/25
             All napA genes                                 34.78               41/44               4/25
             Key nifH, amoA, narG, nasA, and napA genes     30.43               40/44               8/25
             Key nifH genes                                 27.54               41/44               9/25
             Key amoA genes                                 28.99               39/44               10/25
             Key narG genes                                 37.68               37/44               6/25
             Key nasA genes                                 40.58               32/44               9/25
             Key napA genes                                 40.58               32/44               9/25
             AUC-RF selection                               15.94               42/44               16/25

^a Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
^b In background wells, the concentrations of uranium or nitrate were 30 µg/liter or below or 10 mg/liter or below, respectively.
^c In contaminated wells, the concentrations of uranium or nitrate were higher than 30 µg/liter or 10 mg/liter, respectively.
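The OOB error rates in Table 1 can be checked against the per-class well counts. The sketch below assumes that each count in Table 1 is the number of wells classified correctly out of the number of wells defined in that class; the function name is hypothetical, and the arithmetic is offered only as a consistency check, not as the authors' code.

```python
def oob_error_percent(bg_correct, bg_total, cont_correct, cont_total):
    """Percentage of wells misclassified, pooled over both classes."""
    misclassified = (bg_total - bg_correct) + (cont_total - cont_correct)
    return 100.0 * misclassified / (bg_total + cont_total)


# Counts copied from Table 1 (background wells, then contaminated wells).
uranium_all_s_cycling = oob_error_percent(47, 47, 2, 22)
uranium_auc_rf = oob_error_percent(47, 47, 14, 22)
nitrate_auc_rf = oob_error_percent(42, 44, 16, 25)

# prints: 28.99 11.59 15.94
print(f"{uranium_all_s_cycling:.2f} {uranium_auc_rf:.2f} {nitrate_auc_rf:.2f}")
```

Under this reading, all eight errors of the AUC-RF uranium model are contaminated wells classified as background (14 of 22 contaminated wells recovered, 47 of 47 background wells recovered).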

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction. The error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for the individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement over the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of the error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
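AUC-RF (31) uses the area under the receiver operating characteristic curve as the accuracy criterion when selecting predictors. The sketch below shows only how an AUC value can be computed from class labels and classifier scores, using the pairwise-comparison (Mann-Whitney) form of the statistic; it is not the AUC-RF selection procedure itself, and the function name, labels, and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for contaminated wells, 0 for background wells.
    scores: classifier score per well (higher means more likely contaminated).
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five wells (three background, two contaminated).
example_labels = [0, 0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80, 0.30]
example_auc = auc(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 1.0 means every contaminated well scores above every background well, and 0.5 corresponds to random ranking. Unlike a raw error rate, the value does not depend on a single classification threshold.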

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident. Among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater, based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful, and better than 16S rRNA genes, for predicting ecosystem functions (e.g., N2O concentrations in groundwater).
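The comparison of predictor sets above rests on the correlation between predicted and observed concentrations. The following is a minimal sketch of the Pearson correlation coefficient r for such a comparison; the function name and the numbers are hypothetical and do not come from this study, and the P value of the regression is not computed here.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical observed and model-predicted N2O concentrations for five wells.
observed_n2o = [0.5, 1.0, 2.0, 4.0, 8.0]
predicted_n2o = [0.8, 1.1, 1.9, 3.5, 6.9]
r = pearson_r(observed_n2o, predicted_n2o)
print(f"r = {r:.3f}")
```

With many wells at or below detection, as described for CH4 and CO2, one series is nearly constant and r becomes unstable or undefined, which the function reports as an error rather than returning a misleading value.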

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which had only a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g., uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36) as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below the detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2-, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) at this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS
More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-μm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 25 μg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).
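The analyses listed above can be illustrated with a small sketch. The text does not say which diversity and evenness indexes were computed, so the Shannon index and Pielou evenness below are assumptions chosen because they are common choices, and the authors ran their regressions in SigmaPlot and R rather than Python. All data values and function names are hypothetical, and the P value calculation is omitted for brevity.

```python
import math

def shannon_index(signals):
    """Shannon index H' = -sum(p_i * ln p_i) over probes with positive signal."""
    positive = [s for s in signals if s > 0]
    total = sum(positive)
    return -sum((s / total) * math.log(s / total) for s in positive)

def pielou_evenness(signals):
    """Pielou evenness J = H' / ln(S), where S is the number of detected probes."""
    richness = sum(1 for s in signals if s > 0)
    if richness < 2:
        return 0.0
    return shannon_index(signals) / math.log(richness)

def linear_fit(x, y):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

# Hypothetical data: four equally abundant probes in one well, and
# one probe's normalized signal across five wells along a uranium gradient.
well_signals = [2.0, 2.0, 2.0, 2.0]
uranium_mg_per_l = [0.001, 0.01, 0.1, 1.0, 10.0]
log_uranium = [math.log10(u) for u in uranium_mg_per_l]
probe_signal = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept, r2 = linear_fit(log_uranium, probe_signal)
print(shannon_index(well_signals), pielou_evenness(well_signals))
print(slope, intercept, r2)
```

With four equally abundant probes the Shannon index equals ln 4 and the evenness is 1, which is a quick sanity check on the formulas. A positive slope against log-transformed concentration is what marks a probe as stimulated in the sense used in this study.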

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning, and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the measure of predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
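To make the AUC scale concrete, the following minimal sketch computes the AUC as the probability that a randomly chosen contaminated well receives a higher score than a randomly chosen uncontaminated one. It covers only the AUC calculation, not the feature selection procedure of the AUCRF package, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical wells: 1 = contaminated, 0 = uncontaminated.
labels = [1, 1, 1, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]        # classes fully separated
uninformative_scores = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # no ranking information
mixed_scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]          # one pair out of order

print(roc_auc(labels, perfect_scores))        # 1.0
print(roc_auc(labels, uninformative_scores))  # 0.5
print(roc_auc(labels, mixed_scores))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a data set of 69 wells; a rank-based formula would be preferable for large data sets.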

(ii) Modeling. The random forest models were constructed using the R package "randomForest" as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry (the number of features randomly sampled as candidates at each split) of the features and chose the best split among those features. By default, mtry equals the square root of the number of features for classification and one-third of the number of features for regression. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
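The three steps summarized above (bootstrap sampling, random feature subsets, and vote aggregation) can be sketched in a few lines. This is not the randomForest implementation: to stay short, each "tree" here is a single-split decision stump rather than an unpruned tree, and the mtry features are drawn once per stump. All names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Choose the (feature, threshold) among feature_ids with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: predict the majority class everywhere
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees, mtry, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # step 1: bootstrap sample with replacement
        feature_ids = rng.sample(range(p), mtry)    # step 2: mtry randomly chosen candidate features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_stump(stump, row) for stump in forest)  # step 3: majority vote
    return votes.most_common(1)[0][0]

# Hypothetical signal intensities of two probes in eight wells.
rows = [[1.0, 1.5], [2.0, 1.0], [3.0, 2.5], [4.0, 2.0],
        [6.0, 7.0], [7.0, 6.5], [8.0, 8.0], [9.0, 7.5]]
labels = ["uncontaminated"] * 4 + ["contaminated"] * 4

forest = fit_forest(rows, labels, n_trees=25, mtry=1, seed=42)
print(predict_forest(forest, [0.5, 0.5]))
print(predict_forest(forest, [9.5, 9.0]))
```

For regression, the vote in `predict_forest` would be replaced by the average of the per-tree predictions.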

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
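The OOB bookkeeping can be sketched as follows. A bootstrap sample of size n leaves each observation out with probability (1 − 1/n)^n, which approaches e^−1 (about 0.368), so every sample is out-of-bag for roughly a third of the models. To keep the example short, a nearest-class-mean classifier on one feature stands in for a tree; that substitution, the function names, and the data are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_centroids(values, labels):
    """Base learner (a stand-in for a tree): per-class mean of a single feature."""
    sums, counts = Counter(), Counter()
    for v, y in zip(values, labels):
        sums[y] += v
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}

def predict_centroid(centroids, value):
    """Predict the class whose mean is closest to the value."""
    return min(centroids, key=lambda y: abs(value - centroids[y]))

def oob_error(values, labels, n_models=200, seed=0):
    rng = random.Random(seed)
    n = len(values)
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_models):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        model = fit_centroids([values[i] for i in in_bag], [labels[i] for i in in_bag])
        for i in set(range(n)) - set(in_bag):          # samples this model never saw
            oob_votes[i][predict_centroid(model, values[i])] += 1
    # Aggregate: each sample is scored by the majority vote of its OOB predictions.
    scored = [(votes.most_common(1)[0][0], y) for votes, y in zip(oob_votes, labels) if votes]
    wrong = sum(pred != y for pred, y in scored)
    return wrong / len(scored)

# Hypothetical, well-separated signal intensities for eight wells.
values = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
labels = ["uncontaminated"] * 4 + ["contaminated"] * 4

error = oob_error(values, labels, n_models=200, seed=1)
print(error)
```

Because every prediction counted here comes from a model that did not train on that sample, the aggregated error behaves like a cross-validation estimate without a separate test set.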


SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.
FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS
This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES
1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


gi157375053, gi394728887, and gi254982574) were significantly (P < 0.05) increased as uranium concentrations increased in groundwater (Fig. 3B). These stimulated populations could play important roles in uranium bioremediation at this site.

Key functional populations stimulated in response to a nitrate gradient. We also found that the abundance of many specific functional genes/populations involved in N cycling increased significantly (P < 0.05) as nitrate increased (Fig. 3C and D; Table S4). For example, the abundance of 13 nirK-bearing (4.9%) populations increased significantly (P < 0.05), with most being uncultured bacteria and a few sequenced microbes (e.g., Chaetomium, Arthroderma, Nectria, and Pseudomonas); the abundance of 9 napA (6.0%) gene variants for dissimilatory N reduction, derived from Beggiatoa, Vibrio, Campylobacter, and Dinoroseobacter species as well as uncultured NRB, also increased significantly (P < 0.05) as nitrate increased (Table S4). Five abundant nirK gene variants (gi116204223, gi256723237, gi46409951, gi73762878, and gi50541845) (Fig. 3C) and five abundant napA gene variants (gi157285650, gi219549420, gi169793654, gi257458839, and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

FIG 3 Significantly (P < 0.05) positive correlations between the levels of abundance of stimulated populations and log-transformed uranium (A and B) or nitrate (C and D) concentrations, including data for dsrA gene variants gi237846130, gi46308012, gi46307974, gi37726843, and gi46307858, derived from uncultured sulfate-reducing bacteria (A); cytochrome genes gi70733596 from Pseudomonas fluorescens, gi393759946 from Alcaligenes faecalis, gi157375053 from Shewanella sediminis, gi394728887 from Enterobacter sp., and gi254982574 from Geobacter sp. (B); nirK gene variants gi116204223 from Chaetomium globosum, gi256723237 from Nectria haematococca, and gi46409951, gi73762878, and gi50541845 from uncultured denitrifying bacteria (C); and napA gene variants gi219549420 from Vibrio parahaemolyticus, gi257458839 from Campylobacter gracilis, gi157913465 from Dinoroseobacter shibae, and gi157285650 and gi169793654 from uncultured nitrate-reducing bacteria (D).
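The relationship between a probe's signal and a log-transformed contaminant concentration can be illustrated with an ordinary least-squares fit. The following is a minimal sketch, not the authors' analysis: the concentrations and signal intensities are hypothetical values, the base-10 logarithm is an assumption (the text says only "log-transformed"), and no P value is computed.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical nitrate concentrations (mg/liter) and normalized probe signals.
nitrate_mg_per_liter = [0.5, 2.0, 10.0, 150.0, 3000.0]
probe_signal = [0.8, 1.1, 1.9, 2.7, 4.2]

# Assumption: base-10 log transformation of the concentrations.
log_nitrate = [math.log10(c) for c in nitrate_mg_per_liter]
slope, intercept, r_squared = linear_fit(log_nitrate, probe_signal)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r_squared:.3f}")
```

A positive slope in such a fit corresponds to a population whose abundance rises along the contamination gradient.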

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination by the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the functional genes detected that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB error estimate was 24.64% for all three functional gene families and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same results were observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which dramatically decreased the OOB estimate of error rate from 28.99% to 11.59% (Table 1). These results indicated that microbial functional genes were able to successfully predict groundwater uranium contamination.

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

(The last two columns give the no. of wells predicted/no. of wells defined.)

Contaminant  Predictor^a                                    OOB error rate (%)  Background wells^b  Contaminated wells^c
Uranium      All S cycling and metal-related genes          28.99               47/47               2/22
Uranium      All dsrA, cytochrome, and hydrogenase genes    24.64               47/47               5/22
Uranium      All dsrA genes                                 24.64               47/47               5/22
Uranium      All cytochrome genes                           26.09               46/47               5/22
Uranium      All hydrogenase genes                          28.99               41/47               8/22
Uranium      Key dsrA, cytochrome, and hydrogenase genes    27.54               45/47               5/22
Uranium      Key dsrA genes                                 24.64               45/47               7/22
Uranium      Key cytochrome genes                           39.13               38/47               4/22
Uranium      Key hydrogenase genes                          42.03               33/47               7/22
Uranium      AUC-RF selection                               11.59               47/47               14/22
Nitrate      All N cycling genes                            36.23               39/44               5/25
Nitrate      All nifH, amoA, narG, nasA, and napA genes     34.78               40/44               5/25
Nitrate      All nifH genes                                 33.33               41/44               5/25
Nitrate      All amoA genes                                 27.54               41/44               9/25
Nitrate      All narG genes                                 36.23               40/44               4/25
Nitrate      All nasA genes                                 36.23               37/44               7/25
Nitrate      All napA genes                                 34.78               41/44               4/25
Nitrate      Key nifH, amoA, narG, nasA, and napA genes     30.43               40/44               8/25
Nitrate      Key nifH genes                                 27.54               41/44               9/25
Nitrate      Key amoA genes                                 28.99               39/44               10/25
Nitrate      Key narG genes                                 37.68               37/44               6/25
Nitrate      Key nasA genes                                 40.58               32/44               9/25
Nitrate      Key napA genes                                 40.58               32/44               9/25
Nitrate      AUC-RF selection                               15.94               42/44               16/25

^a Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
^b In background wells, the concentrations of uranium or nitrate were 30 µg/liter or below or 10 mg/liter or below, respectively.
^c In contaminated wells, the concentrations of uranium or nitrate were higher than 30 µg/liter or 10 mg/liter, respectively.
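The OOB error rates reported for uranium in Table 1 are consistent with the per-class counts in the same table: each rate equals the number of misclassified wells divided by the 69 wells. The short check below is only an arithmetic sketch using three rows of Table 1; the function name is illustrative.

```python
N_BACKGROUND = 47    # uranium background wells (Table 1)
N_CONTAMINATED = 22  # uranium contaminated wells (Table 1)

def oob_error_percent(bg_correct, ct_correct,
                      n_bg=N_BACKGROUND, n_ct=N_CONTAMINATED):
    """Percentage of wells misclassified, from per-class correct counts."""
    misclassified = (n_bg - bg_correct) + (n_ct - ct_correct)
    return 100.0 * misclassified / (n_bg + n_ct)

# (background wells correct, contaminated wells correct) from Table 1.
uranium_rows = {
    "All S cycling and metal-related genes": (47, 2),
    "All dsrA genes": (47, 5),
    "AUC-RF selection": (47, 14),
}

for predictor, (bg, ct) in uranium_rows.items():
    print(f"{predictor}: {oob_error_percent(bg, ct):.2f}%")
```

The improvement from AUC-RF selection comes entirely from the contaminated class in this table: background wells were already classified correctly (47/47), while correctly identified contaminated wells rose from 2 to 14 of 22.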

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction, and the error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement from the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
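Because the overall error rate pools both classes, it can be informative to split the nitrate results in Table 1 into the fraction of background wells and the fraction of contaminated wells that were correctly classified. The sketch below does only that arithmetic; the terms "specificity" and "sensitivity" are labels added here for illustration and are not used in the original analysis.

```python
def per_class_rates(bg_correct, n_bg, ct_correct, n_ct):
    """Return (specificity, sensitivity) treating 'contaminated' as the positive class."""
    specificity = bg_correct / n_bg   # background wells correctly classified
    sensitivity = ct_correct / n_ct   # contaminated wells correctly classified
    return specificity, sensitivity

# Nitrate rows from Table 1: 44 background wells, 25 contaminated wells.
all_n_cycling = per_class_rates(39, 44, 5, 25)
auc_rf = per_class_rates(42, 44, 16, 25)

print(f"All N cycling genes: specificity={all_n_cycling[0]:.2f}, "
      f"sensitivity={all_n_cycling[1]:.2f}")
print(f"AUC-RF selection:    specificity={auc_rf[0]:.2f}, "
      f"sensitivity={auc_rf[1]:.2f}")
```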

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident, and among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g., uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36) as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2−, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software, Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R² and P values using R (64).
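Item (i) above does not name the specific indexes used. As an illustration only, the sketch below computes richness, the Shannon index, and Pielou's evenness from a vector of probe signal intensities; the choice of these three measures and the example intensities are assumptions, not details taken from the study.

```python
import math

def richness(abundances):
    """Number of probes (genes) with a positive signal."""
    return sum(1 for a in abundances if a > 0)

def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over probes with positive signal."""
    total = sum(a for a in abundances if a > 0)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

def pielou_evenness(abundances):
    """Pielou's evenness J = H' / ln(S); defined as 0 when fewer than two probes are detected."""
    s = richness(abundances)
    return shannon_index(abundances) / math.log(s) if s > 1 else 0.0

# Hypothetical normalized signal intensities for one well (0 = not detected).
well_signals = [3.2, 0.0, 1.5, 4.1, 0.0, 2.2]
print(richness(well_signals),
      round(shannon_index(well_signals), 3),
      round(pielou_evenness(well_signals), 3))
```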

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
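The interpretation of AUC values given above can be illustrated with a small, self-contained calculation. The sketch below computes the AUC as the probability that a randomly chosen contaminated well receives a higher score than a randomly chosen background well (ties count one-half). It shows only the AUC itself, not the feature selection procedure of the AUCRF package, and the scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 = contaminated (positive class), 0 = background.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores (e.g., fraction of trees voting "contaminated").
labels = [1, 1, 1, 0, 0, 0]
well_separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
uninformative = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

print(auc(well_separated, labels))  # perfect ranking
print(auc(uninformative, labels))   # no better than a random guess
```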

(ii) Modeling. The random forest models were constructed using the R package "randomForest" as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry of the features (mtry being the number of features randomly sampled as candidates at each split) and chose the best split among those features. By default, mtry equals one-third of the number of features for regression (and the square root of the number of features for classification). Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
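The three steps summarized above (bootstrap sampling, random feature subsets, and vote aggregation) can be sketched in a few lines. The code below is a deliberately simplified illustration, not the randomForest implementation: each "tree" is a single split (a stump) rather than an unpruned tree, the default mtry of one-third of the features is used for simplicity, and the data and function names are hypothetical.

```python
import random
from collections import Counter

def best_stump(X, y, features):
    """Best single split over the candidate features (fewest training errors)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # threshold does not actually split the data
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample with mtry random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, p // 3)  # simplification: one-third of the features
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        candidates = rng.sample(range(p), mtry)      # random feature subset
        forest.append(best_stump([X[i] for i in bag], [y[i] for i in bag], candidates))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: three probe signals per well; 1 = contaminated, 0 = background.
wells = [[0.1, 5, 1], [0.2, 3, 0], [0.3, 4, 1], [0.9, 5, 0], [1.0, 3, 1], [1.1, 4, 0]]
status = [0, 0, 0, 1, 1, 1]
forest = fit_forest(wells, status, n_trees=25, mtry=2, seed=1)
print(predict(forest, [0.95, 4, 1]))
```

For regression, the same structure applies with the majority vote replaced by the average of the per-tree predictions.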

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
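The OOB procedure described above can be sketched as follows. To keep the example short, each "tree" is replaced by a hypothetical one-feature threshold classifier (the midpoint between the two class means); only the bookkeeping of in-bag and out-of-bag samples and the aggregation of OOB votes is meant to mirror the description. The data are made up.

```python
import random
from collections import Counter

def fit_threshold(x, y):
    """Hypothetical stand-in for a tree: midpoint between the two class means."""
    zeros = [v for v, label in zip(x, y) if label == 0]
    ones = [v for v, label in zip(x, y) if label == 1]
    if not zeros or not ones:  # bootstrap sample contains a single class
        return ("constant", y[0])
    return ("threshold", (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2)

def predict_threshold(model, value):
    kind, param = model
    if kind == "constant":
        return param
    return 1 if value > param else 0

def oob_error(x, y, n_trees=200, seed=0):
    """Aggregate votes from models that did not see each sample; return the error rate."""
    rng = random.Random(seed)
    n = len(x)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        model = fit_threshold([x[i] for i in bag], [y[i] for i in bag])
        for i in range(n):
            if i not in in_bag:                     # sample i is out-of-bag here
                votes[i][predict_threshold(model, x[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)

def expected_oob_fraction(n):
    """Probability that a given sample is left out of one bootstrap sample of size n."""
    return (1 - 1 / n) ** n

# Hypothetical, well-separated signal values: 0 = background, 1 = contaminated.
signal = list(range(10)) + list(range(100, 110))
status = [0] * 10 + [1] * 10
print(oob_error(signal, status))
print(round(expected_oob_fraction(69), 3))
```

With 69 wells, a little over one-third of the wells are expected to be out-of-bag for any given tree, which is what makes an error estimate possible without a separate test set.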


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
  • DISCUSSION
    • Conclusions
  • MATERIALS AND METHODS
    • Site description and sampling
    • Physical, geochemical, and microbiological measurements
    • Groundwater biomass collection, DNA extraction, and template preparation
    • GeoChip hybridization and data preprocessing
    • GeoChip data analysis
    • Random forest for predicting environmental contamination and ecosystem functioning
      • (i) Feature selection
      • (ii) Modeling
      • (iii) Error rate estimation
  • SUPPLEMENTAL MATERIAL
  • ACKNOWLEDGMENTS
  • REFERENCES

gi257458839 and gi157913465) increased significantly (P < 0.05) as nitrate increased (Fig. 3D). In addition, populations stimulated by high concentrations of nitrate were observed for other N cycling genes, such as amoA, nifH, narG, nirS, norB, nasA, nosZ, and nrfA (Table S4). These stimulated populations are expected to play important roles in bioremediation of this nitrate-contaminated site.

Prediction of uranium contamination in groundwater using microbial functional genes. As significant relationships were observed between functional richness/diversity and/or populations and uranium concentrations in groundwater, we attempted to predict groundwater contamination from the presence of microbial functional genes using random forest, a machine learning method (30). First, we selected a total of 2,361 of the functional genes detected that could predict uranium contamination on the basis of being involved in S cycling and electron transfer (e.g., dsrA, dsrB, sir, cytochrome, hydrogenase, and cytochrome P-450 genes). The out-of-bag (OOB) estimate of the error rate for classification of uranium contamination was 28.99%. Second, we selected a subset of 1,521 specific functional genes from the first set of 2,361 genes for predicting uranium contamination, including 892 dsrA, 536 cytochrome, and 93 hydrogenase genes. The OOB estimate of the error rate was 24.64% for all three functional gene families combined and 24.64%, 26.09%, and 28.99% for dsrA, cytochrome, and hydrogenase genes, respectively, indicating that the best predictor for uranium contamination was dsrA or a combination of all three gene families, each with an error rate of 24.64%. Third, we used the significantly changed populations bearing the best predictor, dsrA (Table S3), and the same error rate was observed for uranium contamination prediction (Table 1). To further improve our prediction, we used the area under the receiver operating characteristic curve as the predictive accuracy for random forest (AUC-RF) (31) to automatically select 50 predictors (Table S5) from the initial 2,361 functional probes related to uranium reduction, which dramatically decreased the OOB estimate of the error rate from 28.99% to 11.59% (Table 1). These results indicated that microbial functional genes were able to successfully predict groundwater uranium contamination.

TABLE 1 Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

| Contaminant | Predictor^a | OOB error rate (%) | Background wells^b (no. predicted/no. defined) | Contaminated wells^c (no. predicted/no. defined) |
|---|---|---|---|---|
| Uranium | All S cycling and metal-related genes | 28.99 | 47/47 | 2/22 |
| Uranium | All dsrA, cytochrome, and hydrogenase genes | 24.64 | 47/47 | 5/22 |
| Uranium | All dsrA genes | 24.64 | 47/47 | 5/22 |
| Uranium | All cytochrome genes | 26.09 | 46/47 | 5/22 |
| Uranium | All hydrogenase genes | 28.99 | 41/47 | 8/22 |
| Uranium | Key dsrA, cytochrome, and hydrogenase genes | 27.54 | 45/47 | 5/22 |
| Uranium | Key dsrA genes | 24.64 | 45/47 | 7/22 |
| Uranium | Key cytochrome genes | 39.13 | 38/47 | 4/22 |
| Uranium | Key hydrogenase genes | 42.03 | 33/47 | 7/22 |
| Uranium | AUC-RF selection | 11.59 | 47/47 | 14/22 |
| Nitrate | All N cycling genes | 36.23 | 39/44 | 5/25 |
| Nitrate | All nifH, amoA, narG, nasA, and napA genes | 34.78 | 40/44 | 5/25 |
| Nitrate | All nifH genes | 33.33 | 41/44 | 5/25 |
| Nitrate | All amoA genes | 27.54 | 41/44 | 9/25 |
| Nitrate | All narG genes | 36.23 | 40/44 | 4/25 |
| Nitrate | All nasA genes | 36.23 | 37/44 | 7/25 |
| Nitrate | All napA genes | 34.78 | 41/44 | 4/25 |
| Nitrate | Key nifH, amoA, narG, nasA, and napA genes | 30.43 | 40/44 | 8/25 |
| Nitrate | Key nifH genes | 27.54 | 41/44 | 9/25 |
| Nitrate | Key amoA genes | 28.99 | 39/44 | 10/25 |
| Nitrate | Key narG genes | 37.68 | 37/44 | 6/25 |
| Nitrate | Key nasA genes | 40.58 | 32/44 | 9/25 |
| Nitrate | Key napA genes | 40.58 | 32/44 | 9/25 |
| Nitrate | AUC-RF selection | 15.94 | 42/44 | 16/25 |

^a Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.
^b In background wells, the concentrations of uranium or nitrate were 30 µg/liter or below or 10 mg/liter or below, respectively.
^c In contaminated wells, the concentrations of uranium or nitrate were higher than 30 µg/liter or 10 mg/liter, respectively.
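The OOB error rates in Table 1 can be recovered from the per-class counts of correctly predicted wells, which offers a quick arithmetic check on the table. The Python sketch below is illustrative only and is not the analysis code used in this study; the function name `oob_error_rate` and the dictionary layout are assumptions introduced here, and only three uranium rows are shown.

```python
# Counts copied from the uranium rows of Table 1.
# Each tuple: (background correct, background total,
#              contaminated correct, contaminated total)
TABLE1_URANIUM = {
    "All S cycling and metal-related genes": (47, 47, 2, 22),
    "All dsrA genes": (47, 47, 5, 22),
    "AUC-RF selection": (47, 47, 14, 22),
}


def oob_error_rate(bg_correct, bg_total, ct_correct, ct_total):
    """Return the percentage of wells misclassified across both classes."""
    misclassified = (bg_total - bg_correct) + (ct_total - ct_correct)
    return 100.0 * misclassified / (bg_total + ct_total)


for predictor, counts in TABLE1_URANIUM.items():
    print(f"{predictor}: {oob_error_rate(*counts):.2f}%")
```

With 47 background and 22 contaminated wells, a model that labels nearly every well as background still reaches an error rate near 29%, so the gain from AUC-RF selection comes almost entirely from recovering contaminated wells (14 of 22 instead of 2 of 22).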

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction. The error rates were 34.78% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for the individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement over the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of the error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
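AUC-RF scores candidate predictor sets by the area under the ROC curve rather than by the raw error rate. The Python sketch below shows only that scoring criterion, not the predictor-selection procedure itself, assuming each well has an OOB vote fraction for the contaminated class. The function name `roc_auc` and the six scores are hypothetical and are not data from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a contaminated well, 0 for a background well.
    scores: classifier scores, e.g., OOB vote fractions for "contaminated".
    The AUC equals the probability that a randomly chosen contaminated
    well scores higher than a randomly chosen background well; ties
    count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one well of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical vote fractions for three background and three contaminated wells.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

Because the AUC depends only on how wells are ranked, it is less sensitive than the error rate to the unequal numbers of background and contaminated wells.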

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident, and among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater, based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).
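The comparison among gene sets rests on the correlation between predicted and observed N2O concentrations. A minimal Python sketch of that comparison follows, assuming paired per-well values are available; the function names and the four data points are hypothetical, and the P value itself (from a t distribution with n - 2 degrees of freedom) is left to a statistics package.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(predicted)
    if n != len(observed) or n < 3:
        raise ValueError("need two equal-length series with at least 3 wells")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical predicted and observed N2O values for four wells.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(predicted, observed)
print(f"r = {r:.3f}, t = {t_statistic(r, len(predicted)):.3f}")
```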

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g., uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36), as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes, as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2−, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of the dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.
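One simple way to see how much collinearity a set of probes carries is to flag pairs whose signal profiles across wells are almost perfectly correlated. The Python sketch below does this for a toy data set. It is not the AUC-RF procedure used in this study; the probe names, the intensities, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(profiles, threshold=0.9):
    """Return (probe_a, probe_b, r) for every pair with |r| >= threshold.

    profiles maps a probe name to its signal intensities across wells,
    listed in the same well order for every probe.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(profiles.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical signal intensities for three probes across five wells.
profiles = {
    "dsrA_probe_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "dsrA_probe_2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "cytochrome_probe_1": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(collinear_pairs(profiles))
```

In this toy example only the two dsrA probes are flagged. Keeping one representative from each flagged pair is a crude alternative to importance-based selection and would need to be validated against OOB error before being relied on.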

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-μm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 μg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software, Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R² and P values using R (64).
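As an illustration of step (iii), the per-probe regression can be sketched in a few lines of Python. This is a minimal sketch and not the authors' R code: the probe names, the signal values, and the `linear_fit` helper are hypothetical, and the P value calculation is omitted because it requires a t distribution that is not in the Python standard library.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data: one environmental variable measured in five wells and
# the normalized signal intensity of two probes across the same wells.
log_uranium = [0.0, 1.0, 2.0, 3.0, 4.0]
probes = {
    "probe_A": [1.0, 3.0, 5.0, 7.0, 9.0],  # increases linearly with uranium
    "probe_B": [2.0, 1.0, 2.0, 1.0, 2.0],  # no linear trend
}

# One regression per probe, as in step (iii).
fits = {name: linear_fit(log_uranium, signal) for name, signal in probes.items()}
for name, (slope, intercept, r2) in fits.items():
    print(f"{name}: slope={slope:.3f}, R2={r2:.3f}")
```

Probes with a positive slope and a high R² would be candidates for the "key" genes that increase with contamination; in practice the P values would also need to be computed and assessed.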

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning, and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
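The meaning of the two reference AUC values (0.5 and 1.0) can be checked with a small Python sketch. It computes the AUC as the probability that a randomly chosen contaminated well receives a higher score than a randomly chosen uncontaminated well. The scores and labels are hypothetical, and the sketch shows only the AUC calculation, not the feature elimination procedure of the AUCRF package.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels: 1 = contaminated well, 0 = uncontaminated well.
labels = [1, 1, 1, 0, 0, 0]

# Every contaminated well scores higher than every uncontaminated well.
perfect = auc([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], labels)

# All wells receive the same score, so the classifier carries no information.
uninformative = auc([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], labels)

print(perfect, uninformative)
```

A feature set whose random forest scores give an AUC near 1.0 separates the two classes almost perfectly, which is why AUC-RF retains the feature set with the highest AUC.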

(ii) Modeling. The random forest models were constructed using the R package "randomForest" as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled the mtry (number of features randomly sampled as candidates at each split) of the features and chose the best split among those features. By default, mtry equals one-third the number of all features for regression and the square root of the number of features for classification. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
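The three steps above can be sketched in plain Python for the classification case. This is a simplified illustration under stated assumptions, not the randomForest implementation: each "tree" is a single-split stump rather than an unpruned tree, so the mtry features are sampled once per tree, and the data, labels, and function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Best single split (fewest training errors) among the candidate features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in feature_ids:
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold between adjacent values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        label = majority(y)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]

def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def fit_forest(X, y, n_trees=25, mtry=None, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # classification default: sqrt(p)
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(len(X)) for _ in range(len(X))]  # step 1: bootstrap
        feats = rng.sample(range(p), mtry)                    # step 2: mtry features
        forest.append(fit_stump([X[i] for i in bag], [y[i] for i in bag], feats))
    return forest

def predict_forest(forest, row):
    """Step 3: aggregate the trees by majority vote."""
    return majority([predict_stump(s, row) for s in forest])

# Hypothetical signal intensities of four genes in eight wells.
X = [[1, 10, 0.1, 5], [2, 11, 0.2, 6], [3, 12, 0.3, 7], [4, 13, 0.4, 8],
     [11, 30, 0.9, 15], [12, 31, 1.0, 16], [13, 32, 1.1, 17], [14, 33, 1.2, 18]]
y = ["low"] * 4 + ["high"] * 4

forest = fit_forest(X, y)
print(predict_forest(forest, [2.5, 11.5, 0.25, 6.5]))
print(predict_forest(forest, [12.5, 31.5, 1.05, 16.5]))
```

For regression, the same structure would apply with the majority vote replaced by the mean of the tree predictions and the mtry default changed to one-third of the features.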

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
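The OOB procedure can be illustrated with a short Python sketch. To keep it self-contained, the tree is replaced by a toy one-feature classifier (a threshold halfway between the two class means), which is an assumption made for brevity; the data and function names are hypothetical. The bookkeeping is the part that mirrors the description above: each sample is predicted only by models whose bootstrap sample did not contain it, and the votes are then aggregated.

```python
import random
from collections import Counter, defaultdict

def fit_threshold(values, labels):
    """Toy classifier: cut halfway between the two class means."""
    means = {}
    for c in sorted(set(labels)):
        vals = [v for v, lab in zip(values, labels) if lab == c]
        means[c] = sum(vals) / len(vals)
    if len(means) == 1:  # bootstrap sample contained a single class
        only = next(iter(means))
        return lambda v: only
    lo, hi = sorted(means, key=means.get)
    cut = (means[lo] + means[hi]) / 2
    return lambda v: lo if v <= cut else hi

def oob_error(values, labels, n_models=200, seed=1):
    rng = random.Random(seed)
    n = len(values)
    votes = defaultdict(Counter)  # sample index -> OOB votes per class
    for _ in range(n_models):
        bag = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        model = fit_threshold([values[i] for i in bag],
                              [labels[i] for i in bag])
        for i in set(range(n)) - set(bag):               # out-of-bag samples
            votes[i][model(values[i])] += 1
    # Aggregate the OOB votes per sample and compare with the true label.
    wrong = sum(votes[i].most_common(1)[0][0] != labels[i] for i in votes)
    return wrong / len(votes)

# Hypothetical, well-separated signal intensities for ten wells.
values = [1, 2, 3, 4, 5, 21, 22, 23, 24, 25]
labels = ["low"] * 5 + ["high"] * 5

error = oob_error(values, labels)
print(error)
```

Because each bootstrap sample leaves out roughly one-third of the wells, every well accumulates many OOB votes, so the estimate does not require a separate test set.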


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.
2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.
3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.
4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.
5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.
6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.
7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.
8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.
9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity-ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.
10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.
11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.
12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.
13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.
14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.
15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.
16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.
17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.
18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.
19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.
20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.
21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.
24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.
25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.
26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.
27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.
28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.
29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.
30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.
31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.
32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.
33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.
34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.
35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.
36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.
37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.
38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.
39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.
40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.
41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.
42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.
43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.
44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.
45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.
46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.
47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.
48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.
49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.
50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.
51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.
52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.
53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.
54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.
55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.
56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.
59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.
60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.
61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.
62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.
63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.
64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.
66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


microbial functional genes were able to successfully predict groundwater uranium contamination.

Prediction of nitrate contamination in groundwater using microbial functional genes. Similarly, we predicted nitrate contamination in groundwater. First, we selected a total of 5,273 functional genes involved in N cycling and showed that the error rate for nitrate contamination prediction was 36.23%. Second, we selected a subset of 2,239 specific functional genes from that first set that were involved in N fixation (1,044 nifH genes), nitrification (173 amoA genes), denitrification (705 narG genes), and assimilatory (134 nasA genes) and dissimilatory (183 napA genes) N reduction, and the error rates were 34.79% for all the gene families selected and 33.33%, 27.54%, 36.23%, 36.23%, and 34.78%, respectively, for individual functional gene families, indicating that the best predictor for nitrate contamination was amoA, with an error rate of 27.54%. Third, we used the best predictor, amoA, and the significantly changed populations bearing it for the same prediction, and the error rate for nitrate contamination prediction was 28.99% (Table 1), which was not an improvement from the previous test. To reduce the collinearity, we again used AUC-RF (31) to automatically select 54 predictors (Table S6) from the original 5,273 N cycling genes. This substantially improved our prediction, decreasing the OOB estimate of error rate to 15.94% (Table 1). These results indicated that microbial functional genes were able to accurately predict nitrate contamination in groundwater.
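As a reading aid, an OOB classification error rate is a count of misclassified wells divided by the number of wells classified. The following Python sketch converts several of the rates above into implied counts, assuming (this is not stated in the text) that all 69 wells analyzed by GeoChip entered each classification; the variable and function names are hypothetical.

```python
N_WELLS = 69  # assumption: every well with GeoChip data was classified

# A selection of the error rates (%) reported for nitrate prediction.
reported_rates = [36.23, 34.78, 33.33, 28.99, 27.54, 15.94]

def implied_misclassified(rate_percent, n=N_WELLS):
    """Nearest whole number of misclassified wells for a given error rate."""
    return round(rate_percent / 100 * n)

counts = {rate: implied_misclassified(rate) for rate in reported_rates}

# Check that each implied count reproduces the reported rate to two decimals.
consistent = all(round(100 * counts[rate] / N_WELLS, 2) == rate
                 for rate in reported_rates)

for rate, wrong in counts.items():
    print(f"{rate:.2f}% corresponds to {wrong} of {N_WELLS} wells")
print(consistent)
```

Under this assumption, the improvement from all N cycling genes to the 54 AUC-RF predictors corresponds to a drop from 25 to 11 misclassified wells.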

Prediction of ecosystem functioning using microbial functional genes. We also attempted to select specific microbial functional genes, as well as 16S rRNA genes (for a comparison), to predict ecosystem functions that may be occurring based on the concentrations of dissolved gases (e.g., CO2, CH4, and N2O) in the groundwater (Table S1). No significant correlations were observed either between the predicted CH4 concentration and the observed CH4 concentration or between the predicted CO2 concentration and the observed CO2 concentration (data not shown). However, when 16S rRNA genes, N cycling genes, all norB or nosZ genes, key norB or nosZ genes, all norB plus nosZ genes, or key norB plus nosZ genes were used to predict N2O concentrations in groundwater, significant correlations between the predicted N2O concentration and the observed N2O concentration were evident, and among those sets of genes or combinations of genes, key norB plus nosZ genes or key nosZ genes were the best predictors for N2O concentrations in groundwater, based on the r and P values of linear regressions (Fig. 4). The results suggest that microbial functional genes are potentially useful and better than 16S rRNA genes for predicting ecosystem functions (e.g., N2O concentrations in groundwater).

DISCUSSION

Understanding the impacts of contaminants on biological communities and predicting the effects of those communities on ecosystem functioning are important topics in ecology and environmental management. In this study, we surveyed the functional diversity and composition of groundwater microbial communities and their linkages with environmental contamination or ecosystem functioning at the OR-IFRC experimental site. Our results showed that the overall functional diversity/richness of groundwater microbiomes decreased as uranium (but not nitrate) concentrations increased or at low or high pHs. However, some specific functional genes/populations involved in uranium and/or nitrate reduction and denitrification were stimulated, and these functional genes could be used to predict environmental contamination (e.g., uranium or nitrate) and ecosystem functioning. In addition, unlike previous studies, which only had a limited number of samples/wells, this study analyzed 69 microbial communities from a large range of environmental gradients (e.g., uranium, nitrate, and pH), providing a more robust picture of the impact of human activities on biodiversity. The experimental results from this study generally support our hypotheses (with the exception of the relationship between nitrate and functional diversity).

Our first hypothesis was that the overall functional diversity/richness of groundwater microbiomes would decrease with an increase in environmental contamination (e.g., uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36), as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genesincluding 16S rRNA genes (A) all N cycling genes (B) all norB and nosZ genes (C) key (significantlyincreaseddecreased) norB and nosZ genes (D) all norB genes (E) all nosZ genes (F) key norB genes (G)and key nosZ genes (H) All norB and nosZ key genes are listed in Table S4 in the supplemental material

Microbial Genes Predict Ecosystem Functioning reg

JanuaryFebruary 2018 Volume 9 Issue 1 e02435-17 mbioasmorg 9

on August 26 2020 by guest

httpmbioasm

orgD

ownloaded from

deposited on the cell surface which could directly or indirectly inhibit specific keyfunctional genesenzymes as well as associated pathways (41) resulting in a decreasein functional richnessdiversity Third low pHs might reduce intracellular pH anddisrupt the chemiosmotic gradient (42) impairing cellular metabolism Fourth highconcentrations of uranium and nitrate and low pHs coexist in some wells (eg FW-021FW-106 FW-126 and FW-410) which may cause additive impacts further reducing theoverall functional diversityrichness These possibilities may lead to a decreased func-tional richnessdiversity of groundwater microbial communities However the func-tional richnessdiversity of certain specific gene families did not decrease significantlyas nitrate concentrations increased One possible explanation is that most microbes(eg nitrate reducers) might use nitrate or related N compounds (eg NO2

NO N2Oor NH4

) as electron donorsacceptors and sources of energy and assimilatory N sothat they were able to cope with such high nitrate concentrations Indeed a previousstudy indicated that elevated nitrate could stimulate microorganisms especially thosewith diverse metabolic capabilities (43) Therefore our results generally support thehypothesis that the overall functional richnessdiversity of groundwater microbialcommunities decreases as uranium concentrations increase or under extreme pHconditions in groundwater

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-μm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 μg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).
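The quantities named in this paragraph can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: the diversity and evenness indexes are taken to be the Shannon index and Pielou's evenness (common choices, not confirmed here), the function names are hypothetical, and the P value calculation is omitted because it needs a t distribution that the Python standard library does not provide. The analyses in this study were run in SigmaPlot and R, not with this code.

```python
import math

def shannon_index(intensities):
    """Shannon diversity H' = -sum(p_i * ln p_i) over detected probes.

    Assumption: probes with zero signal count as undetected.
    """
    positive = [x for x in intensities if x > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in positive)

def pielou_evenness(intensities):
    """Pielou's evenness J = H' / ln(S), where S is the number of detected probes."""
    richness = sum(1 for x in intensities if x > 0)
    if richness < 2:
        return 0.0
    return shannon_index(intensities) / math.log(richness)

def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

In use, `linear_fit` would be called once per probe, with `x` holding an environmental variable (for example, uranium concentration per well) and `y` holding that probe's normalized signal across the same wells.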

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
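The AUC interpretation given above (0.5 for a random guess, 1.0 for perfect classification) can be checked with a short Python sketch. It computes the AUC as the fraction of (contaminated, uncontaminated) pairs in which the contaminated sample receives the higher score, counting ties as one half. The function name and the 0/1 label coding are illustrative assumptions; this is not the AUCRF package, which additionally performs the feature elimination step.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g., contaminated), 0 otherwise.
    scores: classifier scores; higher means more likely positive.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

A feature set whose scores rank every contaminated well above every uncontaminated well gives 1.0, and a set whose scores carry no information gives a value near 0.5.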

(ii) Modeling. The random forest models were constructed using the R package "randomForest," as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry (the number of features randomly sampled as candidates at each split) of the features and chose the best split among those features. By default in randomForest, mtry equals the square root of the number of features for classification and one-third of the number of features for regression. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
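The bootstrap, random feature sampling, and OOB steps described above can be sketched in plain Python. This is a deliberately simplified illustration, not the randomForest implementation: each tree is a depth-1 tree (a stump), so the mtry candidate features are sampled once per tree rather than at every node; only classification is covered; and all function names and defaults are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Best single-threshold split among the candidate features (depth-1 tree)."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(yi != left_label for yi in left)
                      + sum(yi != right_label for yi in right))
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    if best is None:  # no split possible: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, 0.0, majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label

def bagged_stumps_oob(X, y, n_trees=200, mtry=None, seed=0):
    """Grow n_trees stumps on bootstrap samples and return (forest, OOB error)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    if mtry is None:
        mtry = max(1, int(p ** 0.5))  # sqrt(p), the usual classification default
    votes = [Counter() for _ in range(n)]
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(idx)
        feats = rng.sample(range(p), mtry)           # random candidate features
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        forest.append(stump)
        for i in range(n):
            if i not in in_bag:                      # OOB samples for this tree
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [(i, v.most_common(1)[0][0]) for i, v in enumerate(votes) if v]
    if not scored:
        raise ValueError("no sample was ever out of bag; increase n_trees")
    oob_error = sum(pred != y[i] for i, pred in scored) / len(scored)
    return forest, oob_error

def predict_forest(forest, row):
    """Majority vote over all trees for a new sample."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]
```

Each sample is scored only by the trees whose bootstrap sample did not contain it, which is why the OOB error can stand in for an independent test set.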


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
  • DISCUSSION
    • Conclusions
  • MATERIALS AND METHODS
    • Site description and sampling
    • Physical, geochemical, and microbiological measurements
    • Groundwater biomass collection, DNA extraction, and template preparation
    • GeoChip hybridization and data preprocessing
    • GeoChip data analysis
    • Random forest for predicting environmental contamination and ecosystem functioning
      • (i) Feature selection
      • (ii) Modeling
      • (iii) Error rate estimation
  • SUPPLEMENTAL MATERIAL
  • ACKNOWLEDGMENTS
  • REFERENCES

uranium or nitrate) or under extreme pH conditions. A previous clone library analysis of nirS and nirK genes from the same site found that novel nirK and nirS sequences were present in the contaminated groundwater and that the diversity of both gene families changed with contaminant (e.g., uranium or nitrate) concentrations (32). Also, a comparison of metagenomes from FW106 (a highly contaminated well) and FW301 (a background well) revealed that long-term exposure to low pHs and high concentrations of uranium, nitrate, and organic solvents resulted in decreased species diversity and loss of functional diversity (20, 24). Additionally, GeoChip analysis of a landfill leachate-contaminated aquifer showed that leachate from an unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbiomes as a function of groundwater pH, DOC, and concentrations of sulfate and ammonia (33). In this study, we found that the overall functional diversity of groundwater microbial communities decreased under uranium contamination or extreme pH conditions, which is consistent with previous observations in groundwater (20, 32–36) as well as in the soil environment (37–40). Several possible mechanisms might be responsible for such a reduction in the functional diversity/richness. First, most microorganisms may not have developed efficient strategies for surviving/growing in such stressed environments, so their abundances would decrease to below detection level or even to extinction (20, 24). Second, if there are no appropriate mechanisms to deal with high uranium concentrations in the environment, uranium may accumulate in or be deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2−, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

FIG 4 Random forest predictions of N2O concentrations in groundwater using different sets of genes, including 16S rRNA genes (A), all N cycling genes (B), all norB and nosZ genes (C), key (significantly increased/decreased) norB and nosZ genes (D), all norB genes (E), all nosZ genes (F), key norB genes (G), and key nosZ genes (H). All norB and nosZ key genes are listed in Table S4 in the supplemental material.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5 to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city’s obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy’s (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 25 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).
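
The index and regression calculations in steps (i) and (iii) can be illustrated with a minimal Python sketch. It assumes the Shannon index and Pielou evenness as the diversity and evenness measures (the text does not name the specific indexes) and fits one probe against one environmental variable by ordinary least squares; P values are omitted. The function names and toy data are hypothetical and are not taken from the original analysis, which was run in R and SigmaPlot.

```python
import math

def shannon_index(intensities):
    """Shannon index H' = -sum(p_i * ln p_i) over probes with positive signal."""
    positive = [x for x in intensities if x > 0]
    total = sum(positive)
    return -sum((x / total) * math.log(x / total) for x in positive)

def pielou_evenness(intensities):
    """Pielou evenness J = H' / ln(S), where S is the number of detected probes."""
    richness = sum(1 for x in intensities if x > 0)
    if richness < 2:
        return 0.0
    return shannon_index(intensities) / math.log(richness)

def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2

# Hypothetical toy data: one sample with four equally abundant probes,
# and one probe's normalized signal across five wells.
even_sample = [2.0, 2.0, 2.0, 2.0]
log_uranium = [0.0, 1.0, 2.0, 3.0, 4.0]
probe_signal = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept, r2 = linear_fit(log_uranium, probe_signal)
```

In this toy case the four probes are equally abundant, so the evenness is at its maximum, and the probe signal is an exact linear function of the environmental variable. Applying `linear_fit` to every probe in turn would give the per-probe slopes and R2 values described in step (iii).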

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
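
As an illustration of the AUC values quoted above, the following Python sketch computes the AUC as the fraction of (contaminated, uncontaminated) sample pairs that a classifier ranks correctly, counting ties as one-half. This is the standard rank-based definition, not the AUCRF package's code, and the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney statistic).

    scores: predicted score per sample (higher = more likely contaminated)
    labels: 1 for contaminated, 0 for uncontaminated
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # pair ranked correctly
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical wells: three contaminated (1) and three uncontaminated (0).
labels = [1, 1, 1, 0, 0, 0]
perfect = auc([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], labels)   # 1.0
uninformative = auc([0.5] * 6, labels)                  # 0.5
```

A classifier that gives every well the same score cannot rank any pair, which is why it lands at 0.5, the random-guess level mentioned in the text.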

(ii) Modeling. The random forest models were constructed using the R package “randomForest” as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry of the features (mtry is the number of features randomly sampled as candidates at each split) and chose the best split among those features. By default, mtry equals one-third the number of all features for regression and the square root of the number of features for classification. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
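
The three steps above can be sketched in Python as follows. This is only an outline of the resampling and aggregation logic, not the randomForest implementation: tree growing is left out, the function names are hypothetical, and mtry is set to one-third of the features as stated in the text.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Step 1: draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def candidate_features(n_features, mtry, rng):
    """Step 2: at a node, sample mtry features (without replacement) as split candidates."""
    return rng.sample(range(n_features), mtry)

def aggregate(predictions, task):
    """Step 3: majority vote for classification, mean for regression."""
    if task == "classification":
        return Counter(predictions).most_common(1)[0][0]
    return sum(predictions) / len(predictions)

rng = random.Random(0)                 # fixed seed so the sketch is reproducible
n_wells, n_features = 69, 30           # 69 wells as in this study; 30 features is arbitrary
mtry = max(1, n_features // 3)         # one-third of the features, as stated in the text

idx = bootstrap_indices(n_wells, rng)
feats = candidate_features(n_features, mtry, rng)
vote = aggregate(["high", "low", "high"], "classification")
mean_pred = aggregate([1.0, 2.0, 3.0], "regression")
```

Because each bootstrap sample is drawn with replacement, some wells appear more than once and others not at all; the wells left out are the out-of-bag data used for error estimation.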

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
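
A minimal Python sketch of the OOB estimate is shown below, assuming a toy forest of one-split trees (stumps) in place of the unpruned trees grown by randomForest. All names and data are hypothetical. Each tree votes only on the samples left out of its bootstrap sample, and the error rate is the fraction of samples whose aggregated OOB vote disagrees with the true class.

```python
import random
from collections import Counter, defaultdict

def fit_stump(X, y, rows, features):
    """Fit a one-split classifier on the bootstrap rows, searching only the candidate features.

    Returns (feature, threshold, label_if_le, label_if_gt).
    """
    best = None
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between consecutive observed values
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        majority = Counter(y[i] for i in rows).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab

def oob_error(X, y, n_trees=200, mtry=1, seed=0):
    """Aggregate out-of-bag votes over all trees and return the OOB error rate."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = defaultdict(Counter)  # sample index -> votes from trees that did not see it
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(rows)
        stump = fit_stump(X, y, rows, rng.sample(range(p), mtry))
        for i in range(n):
            if i not in in_bag:  # sample i is out-of-bag for this tree
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)

# Hypothetical, cleanly separated toy data: two features, twelve wells.
X = [[float(v), float(30 - v)] for v in (1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26)]
y = ["background"] * 6 + ["contaminated"] * 6
err = oob_error(X, y, n_trees=200, mtry=1, seed=0)
```

Because every sample is scored only by trees that never saw it during fitting, the OOB error plays the role of a held-out test error without setting data aside, which matters when only 69 wells are available.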


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth’s ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D’Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O’Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j

34 Tiago I Veriacutessimo A 2013 Microbial and functional diversity of a sub-terrestrial high pH groundwater associated to serpentinization EnvironMicrobiol 151687ndash1706 httpsdoiorg1011111462-292012034

35 Roadcap GS Sanford RA Jin Q Pardinas JR Bethke CM 2006 Extremelyalkaline (pH 12) ground water hosts diverse microbial communityGround Water 44511ndash517 httpsdoiorg101111j1745-6584200600199x

36 Meacutendez-Garciacutea C Pelaacuteez AI Mesa V Saacutenchez J Golyshina OV Ferrer M2015 Microbial diversity and metabolic networks in acid mine drain-age habitats Front Microbiol 6475 httpsdoiorg103389fmicb201500475

37 Zhalnina K Dias R de Quadros PD Davis-Richardson A Camargo FAClark IM McGrath SP Hirsch PR Triplett EW 2015 Soil pH determinesmicrobial diversity and composition in the Park Grass experiment Mi-crob Ecol 69395ndash 406 httpsdoiorg101007s00248-014-0530-2

38 Fierer N Jackson RB 2006 The diversity and biogeography of soilbacterial communities Proc Natl Acad Sci U S A 103626 ndash 631 httpsdoiorg101073pnas0507535103

39 Lauber CL Hamady M Knight R Fierer N 2009 Pyrosequencing-basedassessment of soil pH as a predictor of soil bacterial community struc-ture at the continental scale Appl Environ Microbiol 755111ndash5120httpsdoiorg101128AEM00335-09

40 Liang Y Zhao H Zhang X Zhou J Li G 2014 Contrasting microbialfunctional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites Sci Total Environ 487272ndash278 httpsdoiorg101016jscitotenv201404032

41 Antunes SC Pereira R Marques SM Castro BB Gonccedilalves F 2011 Impairedmicrobial activity caused by metal pollution a field study in a deactivateduranium mining area Sci Total Environ 410 ndash 41187ndash95 httpsdoiorg101016jscitotenv201109003

42 Bearson S Bearson B Foster JW 1997 Acid stress responses in entero-bacteria FEMS Microbiol Lett 147173ndash180 httpsdoiorg101111j1574-69681997tb10238x

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
  • DISCUSSION
    • Conclusions
  • MATERIALS AND METHODS
    • Site description and sampling
    • Physical, geochemical, and microbiological measurements
    • Groundwater biomass collection, DNA extraction, and template preparation
    • GeoChip hybridization and data preprocessing
    • GeoChip data analysis
    • Random forest for predicting environmental contamination and ecosystem functioning
      • (i) Feature selection
      • (ii) Modeling
      • (iii) Error rate estimation
  • SUPPLEMENTAL MATERIAL
  • ACKNOWLEDGMENTS
  • REFERENCES

deposited on the cell surface, which could directly or indirectly inhibit specific key functional genes/enzymes, as well as associated pathways (41), resulting in a decrease in functional richness/diversity. Third, low pHs might reduce intracellular pH and disrupt the chemiosmotic gradient (42), impairing cellular metabolism. Fourth, high concentrations of uranium and nitrate and low pHs coexist in some wells (e.g., FW-021, FW-106, FW-126, and FW-410), which may cause additive impacts, further reducing the overall functional diversity/richness. These possibilities may lead to a decreased functional richness/diversity of groundwater microbial communities. However, the functional richness/diversity of certain specific gene families did not decrease significantly as nitrate concentrations increased. One possible explanation is that most microbes (e.g., nitrate reducers) might use nitrate or related N compounds (e.g., NO2−, NO, N2O, or NH4+) as electron donors/acceptors and sources of energy and assimilatory N, so that they were able to cope with such high nitrate concentrations. Indeed, a previous study indicated that elevated nitrate could stimulate microorganisms, especially those with diverse metabolic capabilities (43). Therefore, our results generally support the hypothesis that the overall functional richness/diversity of groundwater microbial communities decreases as uranium concentrations increase or under extreme pH conditions in groundwater.

Although the overall functional diversity/richness decreased as uranium concentrations increased or remained unchanged as nitrate concentrations increased, some key functional genes/populations involved in uranium or nitrate reduction/resistance would be expected to increase under high concentrations of uranium and nitrate. The dsrA gene, encoding the alpha subunit of dissimilatory sulfite reductase, an SRB biomarker indicating the ability to reduce sulfate and heavy metals (e.g., uranium) (44–47), and cytochrome genes (48, 49) were enriched. Previous studies also indicated that some of these functional genes/populations were stimulated under conditions of high concentrations of heavy metals (e.g., uranium and chromate) in this OR-IFRC site (50–53), the Uranium Mill Tailings Remedial Action site in Rifle, CO (54), and the chromate-contaminated Hanford site (55), suggesting the important role of these functions in metal (e.g., uranium and chromate) reduction. As nitrate is an important nutrient and electron acceptor for microorganisms, adequately high concentrations of nitrate in groundwater are expected to stimulate N cycling genes and associated processes. For example, a recent study indicated that elevated nitrate could enrich functional genes involved in C, N, S, and phosphorus (P) cycling, thus leading to the potential in situ bioremediation of polybrominated diphenyl ether (PBDE)- and polycyclic aromatic hydrocarbon (PAH)-contaminated sites (43). In the current study, we found that the abundances of about 5% to 6% of dsrA, cytochrome, and N cycling genes were positively correlated with the uranium or nitrate concentrations. These genes were largely derived from SRB, NRB, and MRB, particularly those microorganisms with versatile metabolic capabilities (e.g., Rhodanobacter, Geobacter, Pseudomonas, Alcaligenes, Desulfovibrio, Desulfitobacterium, Rhodobacter, and Anaeromyxobacter). Some of these key microorganisms have been isolated from the OR-IFRC site (23, 25–29), and several key genes have been identified by shotgun metagenome sequencing (20, 24). The results generally support our second hypothesis, that key functional genes/populations involved in uranium reduction, nitrate reduction, and denitrification could be stimulated under high concentrations of uranium and nitrate. These significantly increased or decreased functional genes or populations were used to predict uranium and nitrate contamination and ecosystem functioning in this study, as they are expected to play important roles in this groundwater system.

Two recent studies compared different machine learning methods, one aimed at finding predictors of bacterial vaginosis (56) and the other at identifying environmental sensors in groundwater contamination (22), and both showed that random forest was a suitable approach for predictive analysis of microbial communities. Another study showed that 16S rRNA gene sequencing data of human fecal communities were good predictors of a city's obesity level using random forest algorithms (57). Also, 16S rRNA gene sequencing of fecal samples was used to distinguish pediatric patients with inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).
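The quantities named in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' R or SigmaPlot workflow. The text does not restate which diversity and evenness indexes were used, so the Shannon index and Pielou evenness below are assumptions chosen as common examples, and all data values are made up. P values are omitted because they require a t distribution, which the Python standard library does not provide.

```python
import math


def richness(signals):
    """Number of probes with a positive (detected) signal."""
    return sum(1 for s in signals if s > 0)


def shannon(signals):
    """Shannon index H' = -sum(p_i * ln p_i) over detected probes."""
    total = sum(s for s in signals if s > 0)
    if total == 0:
        return 0.0
    return -sum((s / total) * math.log(s / total) for s in signals if s > 0)


def pielou_evenness(signals):
    """Pielou evenness J' = H' / ln(S); returned as 0 when fewer than 2 probes are detected."""
    s = richness(signals)
    return shannon(signals) / math.log(s) if s > 1 else 0.0


def linear_fit(x, y):
    """Ordinary least squares of y on x (x must not be constant).

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical data: one probe's normalized signal in five wells and the
# log10 uranium concentration in the same wells (all values are made up).
log_uranium = [-3.0, -2.0, -1.0, 0.0, 1.0]
probe_signal = [0.5, 1.1, 1.4, 2.1, 2.4]

slope, intercept, r2 = linear_fit(log_uranium, probe_signal)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
print(f"richness={richness(probe_signal)} evenness={pielou_evenness(probe_signal):.3f}")
```

A probe with a positive slope and a small P value would count as one that increases along the contaminant gradient; in practice the fit would be repeated for every probe and every environmental variable.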

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning, and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
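To make the AUC criterion concrete, the following Python sketch computes the AUC from classifier scores by pairwise comparison and then picks the candidate feature set with the highest AUC. It does not reproduce the AUCRF package's procedure. The function names, feature-set names, and scores are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_feature_set(candidates, labels):
    """Return the name of the candidate feature set whose scores give the highest AUC."""
    return max(candidates, key=lambda name: auc(candidates[name], labels))


# Hypothetical example: 1 = contaminated well, 0 = background well.
labels = [1, 1, 1, 0, 0, 0]
# Made-up classifier scores obtained with three hypothetical feature sets.
candidates = {
    "all_N_cycling_genes": [0.9, 0.4, 0.7, 0.6, 0.2, 0.1],
    "key_genes_only": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "uninformative": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
for name, scores in candidates.items():
    print(name, round(auc(scores, labels), 3))
print("selected:", best_feature_set(candidates, labels))
```

Scores that rank every contaminated well above every background well give an AUC of 1.0, while scores that carry no information give 0.5, matching the two reference points described in the text.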

(ii) Modeling. The random forest models were constructed using the R package "randomForest," as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry of the features (mtry is the number of features randomly sampled as candidates at each split) and chose the best split among those features. By default in randomForest, mtry equals the square root of the number of features for classification and one-third of the number of features for regression. Third, new data were predicted by aggregating the predictions of the n trees (i.e., majority votes for classification and averages for regression).
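The three steps above can be sketched in a few lines of Python. This is a toy illustration rather than the randomForest package: each "tree" is a single-split stump instead of an unpruned tree, so sampling mtry features once per stump is the same as sampling them at its only node. The default mtry here is the square root of the number of features, a common choice for classification. All function names and data are hypothetical.

```python
import math
import random
from collections import Counter


def best_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest misclassifications."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        maj = Counter(labels).most_common(1)[0][0]
        return (features[0], math.inf, maj, maj)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, mtry=None, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample with mtry random features."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    mtry = mtry or max(1, int(math.sqrt(p)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # step 1: bootstrap sample
        feats = rng.sample(range(p), mtry)           # step 2: random candidate features
        forest.append(best_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Step 3: aggregate the stumps by majority vote."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: normalized signals of three genes in eight wells.
rows = [[0.1, 0.2, 3.0], [0.2, 0.1, 1.0], [0.3, 0.3, 2.0], [0.2, 0.2, 3.0],
        [0.9, 0.8, 2.0], [0.8, 0.9, 3.0], [0.7, 0.7, 1.0], [0.95, 0.8, 2.0]]
labels = ["low"] * 4 + ["high"] * 4
forest = fit_forest(rows, labels)
print(predict(forest, [0.85, 0.9, 2.0]))
```

For regression, the same structure applies with the vote replaced by the mean of the per-tree predictions.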

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
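The OOB idea can be sketched as follows. Each sample is predicted only by the models whose bootstrap sample did not contain it, and roughly 37% of samples are left out of any given bootstrap sample. To keep the example short, a toy nearest-class-mean learner on one feature stands in for a tree; that substitution, the function names, and the data are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def fit_nearest_mean(xs, ys):
    """Toy base learner standing in for a tree: predict the class with the nearest mean."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    means = {y: sum(v) / len(v) for y, v in groups.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def oob_error(xs, ys, n_trees=200, seed=0):
    """Out-of-bag error: each sample is scored only by models that never saw it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        model = fit_nearest_mean([xs[i] for i in in_bag],
                                 [ys[i] for i in in_bag])
        for i in set(range(n)) - set(in_bag):                # out-of-bag samples
            votes[i][model(xs[i])] += 1
    # Aggregate OOB votes per sample; skip samples that were never out of bag.
    scored = [(v.most_common(1)[0][0], y) for v, y in zip(votes, ys) if v]
    if not scored:
        raise ValueError("no sample was ever out of bag; increase n_trees")
    return sum(pred != y for pred, y in scored) / len(scored)


# Hypothetical data: one gene's normalized signal in eight wells.
signal = [0.1, 0.2, 0.3, 0.2, 0.9, 0.8, 0.7, 0.95]
status = ["low"] * 4 + ["high"] * 4
print("OOB error rate:", oob_error(signal, status))
```

Because every sample is scored by models that never saw it, the OOB estimate plays the role of a held-out test set without reducing the data available for training.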


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23 Akob DM Mills HJ Gihring TM Kerkhof L Stucki JW Anastaacutecio AS ChinKJ Kuumlsel K Palumbo AV Watson DB Kostka JE 2008 Functional diversityand electron donor dependence of microbial populations capable ofU(VI) reduction in radionuclide-contaminated subsurface sedimentsAppl Environ Microbiol 743159 ndash3170 httpsdoiorg101128AEM02881-07

24 Hemme CL Tu Q Shi Z Qin Y Gao W Deng Y Van Nostrand JD Wu LHe Z Chain PSG Tringe SG Fields MW Rubin EM Tiedje JM Hazen TCArkin AP Zhou J 2015 Comparative metagenomics reveals impact ofcontaminants on groundwater microbiomes Front Microbiol 61205httpsdoiorg103389fmicb201501205

25 Green SJ Prakash O Jasrotia P Overholt WA Cardenas E Hubbard DTiedje JM Watson DB Schadt CW Brooks SC Kostka JE 2012 Denitri-fying bacteria from the genus Rhodanobacter dominate bacterial com-munities in the highly contaminated subsurface of a nuclear legacywaste site Appl Environ Microbiol 781039 ndash1047 httpsdoiorg101128AEM06435-11

26 Kostka JE Green SJ Rishishwar L Prakash O Katz LS Marintildeo-Ramiacuterez LJordan IK Munk C Ivanova N Mikhailova N Watson DB Brown SDPalumbo AV Brooks SC 2012 Genome sequences for six Rhodanobacterstrains isolated from soils and the terrestrial subsurface with variabledenitrification capabilities J Bacteriol 1944461ndash 4462 httpsdoiorg101128JB00871-12

27 Bollmann A Palumbo AV Lewis K Epstein SS 2010 Isolation andphysiology of bacteria from contaminated subsurface sediments ApplEnviron Microbiol 767413ndash7419 httpsdoiorg101128AEM00376-10

28 Fields MW Yan T Rhee SK Carroll SL Jardine PM Watson DB Criddle CSZhou J 2005 Impacts on microbial communities and cultivable isolatesfrom groundwater contaminated with high levels of nitric acid-uraniumwaste FEMS Microbiol Ecol 53417ndash428 httpsdoiorg101016jfemsec200501010

29 Cardenas E Wu WM Leigh MB Carley J Carroll S Gentry T Luo J WatsonD Gu B Ginder-Vogel M Kitanidis PK Jardine PM Zhou J Criddle CSMarsh TL Tiedje JM 2010 Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as re-vealed by a combined massively parallel sequencing-indicator speciesapproach Appl Environ Microbiol 766778 ndash 6786 httpsdoiorg101128AEM01097-10

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richness/diversity/abundance, microbial biomass, and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
      • DISCUSSION
        • Conclusions
          • MATERIALS AND METHODS
            • Site description and sampling
            • Physical, geochemical, and microbiological measurements
            • Groundwater biomass collection, DNA extraction, and template preparation
            • GeoChip hybridization and data preprocessing
            • GeoChip data analysis
            • Random forest for predicting environmental contamination and ecosystem functioning
            • (i) Feature selection
            • (ii) Modeling
            • (iii) Error rate estimation
              • SUPPLEMENTAL MATERIAL
              • ACKNOWLEDGMENTS
              • REFERENCES

inflammatory bowel disease (IBD) from patients with similar symptoms (58). At the OR-IFRC site, a recent study found that 16S rRNA gene sequencing data could be used to successfully predict most (26 out of 38) of the groundwater geochemical properties, such as uranium and nitrate concentrations and pHs (22). Although all these studies used 16S rRNA genes as predictors, it is believed that functional genes may be better predictors of ecosystem functions. Currently, some challenges remain in the use of functional genes as predictors. One challenge is to determine which functional genes or sets of functional genes are appropriate choices for given functions, phenotypes (e.g., disease), or processes (e.g., CO2 production), and another challenge is to accurately identify or measure a specific phenotype or functional process.

In this study, our results indicated that uranium and nitrate contamination were accurately predicted, specifically with AUC-RF (31), and we also successfully predicted dissolved N2O in groundwater. However, several challenges still remain in predicting other ecosystem functions, such as CO2 and CH4 concentrations in groundwater. First, only a few wells had relatively high concentrations of CH4 or CO2, while most wells had undetectable concentrations of these gases in the groundwater. Such a skewed distribution of data may affect our prediction accuracy. Second, the high diversity of functional genes/populations may present multiple instances of collinearity in the community, thus compromising our predictions. Indeed, when we used AUC-RF to reduce collinearity, the prediction error rates decreased dramatically, from approximately 29% to 12% for uranium contamination and from 36% to 16% for nitrate contamination. Third, it is hard to identify the specific functional genes responsible for some general functional processes. For example, groundwater CO2 could be generated from many C decomposition pathways and other physical or chemical pathways or consumed by autotrophy and chemical reactions, making it difficult to select specific genes for predicting this functional process and thus limiting the predictive power. Fourth, the relationship between dissolved gases and functional gene abundance may be subtle. The concentrations of gases in groundwater may not accurately reflect ecosystem functioning, or functional gene abundance may not reflect actual activity. Perhaps due to these challenges, a recent study also showed that adding functional information did not improve classification accuracy (59). Therefore, to accurately predict ecosystem functioning, more studies need to be conducted to optimize methods, select appropriate functional predictors, reduce skewed sample distribution, decrease multiple incidences of collinearity, and/or increase the reliability of ecosystem functional process data.

Conclusions. Our results indicated that the overall functional richness/diversity decreased with increased uranium (but not nitrate) concentrations or at low or high pHs. Some specific functional genes/populations were stimulated under high concentrations of uranium or nitrate and could be used to successfully predict uranium and nitrate contamination and, potentially, ecosystem functioning. This study provides new insights for our understanding of the impacts of environmental contaminants on the functional richness/diversity of groundwater microbiomes and demonstrates the predictive power of microbial functional genes to identify environmental contamination and ecosystem functioning.

MATERIALS AND METHODS

More detailed descriptions of the site, sampling methods, physical, geochemical, and microbiological measurements, groundwater biomass collection, DNA extraction, and random forest analysis were provided previously (22).

Site description and sampling. The U.S. Department of Energy's (DOE) Oak Ridge Integrated Field Research Challenge (OR-IFRC) site has a 243-acre contaminated area and a 402-acre uncontaminated background area located within the Bear Creek Valley watershed in Oak Ridge, TN. This site has been contaminated with radionuclides (e.g., uranium and technetium), nitrate, sulfide, and volatile organic compounds. The major source of contamination is the former S-3 waste disposal ponds within the Y-12 national security complex, which has been continuously monitored and documented over the past several decades (25, 60). Further information regarding the plume and sources of contamination can be found at https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm.

Physical, geochemical, and microbiological measurements. In this study, 93 groundwater wells were carefully selected to cover the maximum geochemical diversity of this site without exhaustively sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-μm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 25 μg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R² and P values using R (64).
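As an illustration of analyses (i) and (iii), the following minimal Python sketch computes richness, a diversity index, and an evenness index for one sample, and fits a least-squares line between one probe's signal and an environmental variable. It rests on several assumptions: the text does not name the indexes, so the Shannon index and Pielou evenness are used as common choices; the function names and the example values are hypothetical; and the authors performed these steps in R and SigmaPlot, not with this code. P values are omitted because they require a t distribution, which a statistics library (for example, `scipy.stats.linregress`) would supply.

```python
import math

def diversity_indexes(signals):
    """Richness, Shannon index H', and Pielou evenness J' for one sample.

    `signals` holds normalized probe intensities; zeros mean "not detected".
    The choice of Shannon/Pielou is an assumption, not taken from the paper.
    """
    detected = [s for s in signals if s > 0]
    richness = len(detected)
    if richness == 0:
        return 0, 0.0, 0.0
    total = sum(detected)
    shannon = -sum((s / total) * math.log(s / total) for s in detected)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

def linear_fit(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared

# Hypothetical values: one probe's signal across five wells vs. log10 uranium.
log_uranium = [-3.0, -2.0, -1.0, 0.0, 1.0]
probe_signal = [0.2, 0.9, 1.1, 1.8, 2.4]
slope, intercept, r2 = linear_fit(log_uranium, probe_signal)
print(f"slope={slope:.3f} R2={r2:.3f}")
print(diversity_indexes([4.0, 4.0, 0.0, 4.0, 4.0]))
```

Running `linear_fit` once per probe and keeping the probes with a positive slope and a significant P value would correspond to the screening for genes that increase along a contaminant gradient.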

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
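The meaning of the AUC values quoted above can be made concrete with a short Python sketch. It computes the AUC as the fraction of (positive, negative) sample pairs that the classifier ranks correctly and then picks the candidate feature set with the highest AUC. This is only an illustration: the AUCRF package's own elimination procedure is not reproduced, and the function names, feature-set names, and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive sample (label 1)
    scores higher than a randomly chosen negative one (label 0); ties count 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_feature_set(labels, scores_by_set):
    """Return (name, auc) of the candidate feature set with the highest AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_set.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]

# Hypothetical contaminated (1) / uncontaminated (0) wells and predicted scores
# from models trained on three illustrative feature sets.
labels = [1, 1, 1, 0, 0, 0]
candidates = {
    "all_genes":   [0.9, 0.4, 0.6, 0.5, 0.3, 0.7],
    "reduced_set": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "constant":    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
print(best_feature_set(labels, candidates))
```

A scorer that cannot separate the classes (the `constant` entry) yields 0.5, and one that ranks every contaminated well above every uncontaminated well yields 1.0, matching the two reference points given in the text.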

(ii) Modeling. The random forest models were constructed using the R package “randomForest,” as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled the mtry (number of features randomly sampled as candidates at each split) of the features and chose the best split among those features. By default, mtry equals one-third the number of all features. Third, new data were predicted by aggregating the predictions of n trees (i.e., majority votes for classification and averages for regression).
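The three steps summarized above can be sketched in plain Python for the classification case. This is a teaching sketch under stated assumptions, not the randomForest package: the Gini impurity is assumed as the split criterion, all function names and the toy data are hypothetical, and the default `mtry` follows the one-third rule stated in the text (the randomForest documentation gives one-third of the features as the regression default and the square root of the number of features for classification).

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, mtry, rng):
    """Grow an unpruned tree; each node considers only `mtry` random features."""
    if len(set(labels)) == 1:
        return labels[0]                                  # pure leaf
    best = None
    for f in rng.sample(range(len(rows[0])), mtry):
        for t in sorted(set(r[f] for r in rows))[:-1]:    # every value except the max
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                      # sampled features are constant
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], mtry, rng),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], mtry, rng))

def predict_tree(tree, row):
    """Follow splits until a leaf (a bare label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def random_forest(rows, labels, n_trees=50, mtry=None, seed=0):
    """Step 1: bootstrap n_trees times. Step 2: grow one tree per bootstrap sample."""
    rng = random.Random(seed)
    mtry = mtry or max(1, len(rows[0]) // 3)              # one-third rule from the text
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]    # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Step 3: aggregate the trees by majority vote."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical toy data: three gene signals per well; 1 = contaminated, 0 = not.
rows = [[1, 2, 1], [2, 1, 2], [1, 1, 2], [2, 2, 1],
        [8, 9, 8], [9, 8, 9], [8, 8, 9], [9, 9, 8]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest(rows, labels, n_trees=50, seed=1)
print(predict_forest(forest, [1, 1, 1]), predict_forest(forest, [9, 9, 9]))
```

For regression, the same structure applies with variance reduction as the split criterion and the mean of the tree predictions in place of the majority vote.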

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
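The OOB bookkeeping described above can be sketched as follows. To keep the example short, a nearest-neighbor rule stands in for the tree grown on each bootstrap sample; that substitution, the function names, and the toy data are assumptions made for illustration. The part that mirrors the text is that each sample is predicted only by models whose bootstrap sample did not contain it, and those votes are then aggregated into a single error rate.

```python
import random
from collections import Counter, defaultdict

def nearest_neighbor_predict(train_rows, train_labels, row):
    """Stand-in base learner: label of the closest training row (squared distance)."""
    best = min(range(len(train_rows)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], row)))
    return train_labels[best]

def oob_error(rows, labels, n_iter=200, seed=0):
    """Out-of-bag error estimate for a bagged ensemble.

    Returns (error_rate, number_of_samples_scored).
    """
    rng = random.Random(seed)
    n = len(rows)
    votes = defaultdict(Counter)   # sample index -> votes from models that never saw it
    for _ in range(n_iter):
        boot = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        in_bag = set(boot)
        train_rows = [rows[i] for i in boot]
        train_labels = [labels[i] for i in boot]
        for i in range(n):
            if i not in in_bag:                           # out-of-bag this iteration
                pred = nearest_neighbor_predict(train_rows, train_labels, rows[i])
                votes[i][pred] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no sample was ever out-of-bag; increase n_iter")
    wrong = sum(1 for i in scored
                if votes[i].most_common(1)[0][0] != labels[i])
    return wrong / len(scored), len(scored)

# Hypothetical toy data: three gene signals per well; 1 = contaminated, 0 = not.
rows = [[1, 2, 1], [2, 1, 2], [1, 1, 2], [2, 2, 1],
        [8, 9, 8], [9, 8, 9], [8, 8, 9], [9, 9, 8]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
error_rate, n_scored = oob_error(rows, labels)
print(f"OOB error rate: {error_rate:.2f} over {n_scored} samples")
```

Because roughly one-third of the samples are left out of any given bootstrap sample, every sample accumulates OOB votes after enough iterations, which is why no separate test set is needed.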


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.


48 Payne RB Gentry DM Rapp-Giles BJ Casalot L Wall JD 2002 Uraniumreduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3

mutant Appl Environ Microbiol 683129 ndash3132 httpsdoiorg101128AEM6863129-31322002

49 Lovley DR Widman PK Woodward JC Phillips EJP 1993 Reduction ofuranium by cytochrome c3 of Desulfovibrio vulgaris Appl Environ Micro-biol 593572ndash3576

50 Xu M Wu WM Wu L He Z Van Nostrand JD Deng Y Luo J Carley JGinder-Vogel M Gentry TJ Gu B Watson D Jardine PM Marsh TL TiedjeJM Hazen T Criddle CS Zhou J 2010 Responses of microbial commu-nity functional structures to pilot-scale uranium in situ bioremediationISME J 41060 ndash1070 httpsdoiorg101038ismej201031

51 Zhang P Wu W-M Van Nostrand JD Deng Y He Z Gihring T Zhang GSchadt CW Watson D Jardine P Criddle CS Brooks S Marsh TL TiedjeJM Arkin AP Zhou J 2015 Dynamic succession of groundwater func-tional microbial communities in response to emulsified vegetable oilamendment during sustained in situ U(VI) reduction Appl Environ Mi-crobiol 814164 ndash 4172 httpsdoiorg101128AEM00043-15

52 Van Nostrand JD Wu L Wu W-M Huang Z Gentry TJ Deng Y Carley JCarroll S He Z Gu B Luo J Criddle CS Watson DB Jardine PM Marsh TLTiedje JM Hazen TC Zhou J 2011 Dynamics of microbial communitycomposition and function during in situ bioremediation of a uranium-contaminated aquifer Appl Environ Microbiol 773860 ndash3869 httpsdoiorg101128AEM01981-10

53 Van Nostrand JD Wu WM Wu L Deng Y Carley J Carroll S He Z Gu BLuo J Criddle CS Watson DB Jardine PM Marsh TL Tiedje JM Hazen TCZhou J 2009 GeoChip-based analysis of functional microbial commu-nities during the reoxidation of a bioreduced uranium-contaminatedaquifer Environ Microbiol 112611ndash2626 httpsdoiorg101111j1462-2920200901986x

54 Liang Y Van Nostrand JD NrsquoGuessan LA Peacock AD Deng Y Long PEResch CT Wu LY He ZL Li GH Hazen TC Lovley DR Zhou JZ 2012Microbial functional gene diversity with a shift of subsurface redoxconditions during in situ uranium reduction Appl Environ Microbiol782966 ndash2972 httpsdoiorg101128AEM06528-11

55 Zhang P Van Nostrand JD He Z Chakraborty R Deng Y Curtis D FieldsMW Hazen TC Arkin AP Zhou J 2015 A slow-release substrate stimu-lates groundwater microbial communities for long-term in situ Cr(VI)reduction Environ Sci Technol 4912922ndash12931 httpsdoiorg101021acsest5b00024

56 Beck D Foster JA 2014 Machine learning techniques accurately classifymicrobial communities by bacterial vaginosis characteristics PLoS One9e87830 httpsdoiorg101371journalpone0087830

57 Newton RJ McLellan SL Dila DK Vineis JH Morrison HG Eren AM Sogin

He et al reg

January/February 2018 Volume 9 Issue 1 e02435-17 mbio.asm.org


sampling all available wells. However, we were only able to obtain enough DNA from 69 wells for GeoChip analysis (see Table S1 in the supplemental material). Groundwater samples were collected from the OR-IFRC experimental site between November 2012 and February 2013. A variety of physical, geochemical, and microbiological properties were measured on site or in the laboratory as previously described (22); a brief summary follows. (i) Bulk water parameters, including temperature, pH, dissolved oxygen (DO), conductivity, and redox, were measured at the wellhead using an In-Situ Troll 9500 sensor (In-Situ Inc., Fort Collins, CO). (ii) Dissolved gases, including He, H2, N2, O2, CO, CO2, CH4, and N2O, were measured on an SRI 8610C gas chromatograph with argon carrier gas, using a method derived from EPA RSK-175 and USGS Reston Chlorofluorocarbon Laboratory procedures. (iii) Dissolved organic carbon (DOC) and inorganic carbon (DIC) concentrations were determined with a Shimadzu TOC-V CSH analyzer (Tokyo, Japan). (iv) Anions, including bromide, chloride, nitrate, phosphate, and sulfate, were determined using a Dionex 2100 with an AS9 column and carbonate eluent. (v) Concentrations of metals (and trace elements) in the groundwater were determined on an inductively coupled plasma-mass spectrometry (ICP-MS) instrument (Elan 6100) (61). Finally, (vi) the amounts of bacterial biomass in groundwater samples were determined using the acridine orange direct count (AODC) method (62).

Groundwater biomass collection, DNA extraction, and template preparation. Microbial biomass was collected and DNA extracted as described previously (11). Briefly, 40 liters of groundwater was filtered through 0.2-µm filters to collect biomass. Filters containing biomass were placed into 50-ml Falcon tubes, immediately stored on dry ice, transferred to the laboratory, and stored at −80°C until DNA extraction. DNA was extracted and purified using a modification of the Miller method (62).

GeoChip hybridization and data preprocessing. The GeoChip 5.0 microarray chip contains 167,044 distinct functional gene probes covering 395,894 coding sequences (CDS) from ~1,600 functional gene families involved in microbial carbon (e.g., degradation, methane metabolism, and fixation) and nitrogen (e.g., nitrification, denitrification, reduction, and fixation) cycling, electron transfer, organic remediation, secondary metabolism, stress responses, and virulence. To obtain sufficient DNA for microarray analysis, 10 ng of template DNA from each sample was amplified using whole-community genome amplification (WCGA) (63). After amplification, 2.5 µg of DNA was labeled, resuspended in hybridization buffer, and hybridized on a GeoChip 5.0 microarray chip with 10% formamide at 67°C for 24 h in an Agilent microarray hybridization oven (Agilent Technologies, Santa Clara, CA). The array was then washed, dried, and scanned at 100% laser power at wavelengths of 532 nm and 635 nm. Intensity data were collected using the Agilent Feature Extraction program. Raw intensity data were uploaded to the Functional Gene Microarray analysis pipeline (http://ieg2.ou.edu/Agilent) for preprocessing, including normalization and log transformation.

GeoChip data analysis. The preprocessed GeoChip data and environmental variables were used for further statistical analyses, including (i) diversity and evenness indexes of microbial communities, as previously described (16); (ii) linear and nonlinear regressions between measures of functional gene diversity/abundances of selected genes and geochemical properties by SigmaPlot (Systat Software Inc., San Jose, CA); and (iii) linear regressions between each probe (normalized signal intensity profile across all samples) and environmental variables and calculations of slopes and R2 and P values using R (64).

Random forest for predicting environmental contamination and ecosystem functioning. Random forest was used for classification and regression, as it does not require extensive tuning, and recent studies have demonstrated that it is a suitable tool in microbial community analysis (22, 58, 65). This method included three major steps: feature selection, modeling (classification or regression), and error rate estimation by out-of-bag (OOB) data.

(i) Feature selection. Different sets of functional genes were selected as features for predicting environmental (uranium and nitrate) contamination and ecosystem functioning (e.g., N2O), including related functional gene categories (e.g., all N cycling genes), specific functional gene families (e.g., norB or nosZ), and key functional genes that were significantly increased or decreased as contamination increased. For the classification of environmental (uranium and nitrate) contamination, we also used the receiver operating characteristic curve and the area under the curve (AUC) as the predictive accuracy for random forest (RF) and then selected the set of features with the highest AUC values, termed AUC-RF (31), thus reducing the multiple collinearity among features. An AUC of around 0.5 indicates that the classification is only as good as a random guess, while the classification is perfect if the AUC is 1.0. This was performed by using the R package AUCRF.
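The interpretation of the AUC given above can be checked with a small sketch. It computes the AUC as the probability that a randomly chosen contaminated sample receives a higher score than a randomly chosen uncontaminated one, which is equivalent to the area under the ROC curve. This is an illustration in Python, not the AUCRF package; the function name, scores, and labels are hypothetical, and the search over feature sets that AUCRF automates is not reproduced.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six wells (1 = contaminated, 0 = background).
labels = [0, 0, 0, 1, 1, 1]
perfect = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]      # every contaminated well outranks every background well
uninformative = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # scores carry no information

print(auc(perfect, labels))        # 1.0
print(auc(uninformative, labels))  # 0.5
```

Under this definition, choosing among candidate feature sets amounts to keeping the set whose classifier scores give the largest AUC.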

(ii) Modeling. The random forest models were constructed using the R package "randomForest," as described by Leo Breiman (66). The algorithm is briefly summarized below. First, bootstrap samples were drawn from the original data n times. Second, for each set of bootstrap samples, an unpruned classification or regression tree was grown, and at each node, rather than choosing the best split among all features, we randomly sampled mtry (the number of features randomly sampled as candidates at each split) of the features and chose the best split among those features. By default, mtry equals one-third of the number of features for regression; for classification, the randomForest default is the square root of the number of features. Third, new data were predicted by aggregating the predictions of the n trees (i.e., majority votes for classification and averages for regression).
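The three steps summarized above can be sketched in a few lines of Python. This is a simplified illustration, not the randomForest implementation: each "tree" is a single-split stump instead of an unpruned tree, so the mtry candidate features are drawn once per tree (a stump has only one node). All function names and data are hypothetical.

```python
import random
from collections import Counter


def best_stump(X, y, features):
    """Best single split (feature, threshold, left label, right label) among candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split in this bootstrap sample: always predict the majority class
        lab = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), lab, lab)
    return best[1:]


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # step 1: bootstrap sample (with replacement)
        feats = rng.sample(range(p), mtry)           # step 2: mtry random candidate features
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Step 3: aggregate the trees by majority vote (classification)."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical data: two gene signals per well, label 1 = contaminated.
X = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [0.9, 2.0], [1.0, 2.2], [1.1, 1.9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [0.0, 0.0]), predict(forest, [5.0, 5.0]))
```

For regression, the vote in `predict` would be replaced by the mean of the individual tree predictions.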

(iii) Error rate estimation. The estimate of the error rate was obtained without independent test data sets. At each bootstrap iteration, the data not included in the bootstrap samples, also known as out-of-bag (OOB) data, were used for prediction with the tree constructed from the bootstrap samples. Then, the error rate was calculated by aggregating the OOB predictions to obtain the OOB estimate of error rate.
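The OOB procedure can be illustrated as follows. To stay short and self-contained, the sketch replaces the tree with a hypothetical stand-in model (a nearest-class-mean classifier on a single feature); only the bookkeeping of in-bag and out-of-bag samples mirrors the description above. Names and data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def nearest_mean_classifier(xs, ys):
    """Stand-in for a tree: predict the class whose training mean is closest to x."""
    sums, counts = defaultdict(float), Counter()
    for x, label in zip(xs, ys):
        sums[label] += x
        counts[label] += 1
    means = {c: sums[c] / counts[c] for c in counts}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


def oob_error(xs, ys, n_models=200, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    votes = [Counter() for _ in range(n)]            # OOB votes collected per sample
    for _ in range(n_models):
        in_bag = [rng.randrange(n) for _ in range(n)]
        model = nearest_mean_classifier([xs[i] for i in in_bag], [ys[i] for i in in_bag])
        for i in set(range(n)) - set(in_bag):        # samples this model never saw
            votes[i][model(xs[i])] += 1
    # Aggregate each sample's OOB votes, then compare the majority vote with the truth.
    scored = [(v.most_common(1)[0][0], ys[i]) for i, v in enumerate(votes) if v]
    return sum(pred != truth for pred, truth in scored) / len(scored)


# Hypothetical, well-separated data: one signal per well, label 1 = contaminated.
xs = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(oob_error(xs, ys))
```

Because each sample is scored only by models that did not see it during fitting, the aggregated error plays the role of a test-set error without requiring a separate test set.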


SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.

Microbial Genes Predict Ecosystem Functioning reg

JanuaryFebruary 2018 Volume 9 Issue 1 e02435-17 mbioasmorg 15

on August 26 2020 by guest

httpmbioasm

orgD

ownloaded from

  • RESULTS
    • Geochemical properties and ecosystem function indicators
    • The relationships between functional richnessdiversityabundance microbial biomass and contaminant concentrations
    • Key functional populations stimulated in response to a uranium gradient
    • Key functional populations stimulated in response to a nitrate gradient
    • Prediction of uranium contamination in groundwater using microbial functional genes
    • Prediction of nitrate contamination in groundwater using microbial functional genes
    • Prediction of ecosystem functioning using microbial functional genes
      • DISCUSSION
        • Conclusions
          • MATERIALS AND METHODS
            • Site description and sampling
            • Physical geochemical and microbiological measurements
            • Groundwater biomass collection DNA extraction and template preparation
            • GeoChip hybridization and data preprocessing
            • GeoChip data analysis
            • Random forest for predicting environmental contamination and ecosystem functioning
            • (i) Feature selection
            • (ii) Modeling
            • (iii) Error rate estimation
              • SUPPLEMENTAL MATERIAL
              • ACKNOWLEDGMENTS
              • REFERENCES
Page 13: Microbial Functional Gene Diversity Predicts Groundwater ...and carbon and nitrogen cycling (13). However, how to select molecular functional predictors (e.g., functional genes) remains

SUPPLEMENTAL MATERIAL

Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02435-17.

FIG S1, TIF file, 0.7 MB.
FIG S2, TIF file, 0.5 MB.
TABLE S1, DOCX file, 0.03 MB.
TABLE S2, DOCX file, 0.01 MB.
TABLE S3, DOCX file, 0.03 MB.
TABLE S4, DOCX file, 0.1 MB.
TABLE S5, DOCX file, 0.02 MB.
TABLE S6, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

This material by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies [http://enigma.lbl.gov]), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, under contract number DE-AC02-05CH11231 and by funding from the Thousand Talents Program (grant number 38000-18821105) to Zhili He through Sun Yat-Sen University, China.

REFERENCES

1. Vitousek PM, Mooney HA, Lubchenco J, Melillo JM. 1997. Human domination of Earth's ecosystems. Science 277:494–499. https://doi.org/10.1126/science.277.5325.494.

2. Halpern BS, Walbridge S, Selkoe KA, Kappel CV, Micheli F, D'Agrosa C, Bruno JF, Casey KS, Ebert C, Fox HE, Fujita R, Heinemann D, Lenihan HS, Madin EMP, Perry MT, Selig ER, Spalding M, Steneck R, Watson R. 2008. A global map of human impact on marine ecosystems. Science 319:948–952. https://doi.org/10.1126/science.1149345.

3. Sahney S, Benton MJ, Ferry PA. 2010. Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land. Biol Lett 6:544–547. https://doi.org/10.1098/rsbl.2009.1024.

4. May RM. 1988. How many species are there on Earth? Science 241:1441–1449. https://doi.org/10.1126/science.241.4872.1441.

5. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JBC, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. 2006. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. https://doi.org/10.1126/science.1132294.

6. Vitousek PM, Aber JD, Howarth RW, Likens GE, Matson PA, Schindler DW, Schlesinger WH, Tilman DG. 1997. Human alteration of the global nitrogen cycle: sources and consequences. Ecol Appl 7:737–750. https://doi.org/10.1890/1051-0761(1997)007[0737:HAOTGN]2.0.CO;2.

7. Canadell JG, Ciais P, Dhakal S, Dolman H, Friedlingstein P, Gurney KR, Held A, Jackson RB, Le Quéré C, Malone EL, Ojima DS, Patwardhan A, Peters GP, Raupach MR. 2010. Interactions of the carbon cycle, human activity, and the climate system: a research portfolio. Curr Opin Environ Sustain 2:301–311. https://doi.org/10.1016/j.cosust.2010.08.003.

8. Flynn TM, Sanford RA, Ryu H, Bethke CM, Levine AD, Ashbolt NJ, Santo Domingo JW. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol 13:146. https://doi.org/10.1186/1471-2180-13-146.

9. Flynn DFB, Mirotchnick N, Jain M, Palmer MI, Naeem S. 2011. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 92:1573–1581. https://doi.org/10.1890/10-1245.1.

10. Petchey OL, Gaston KJ. 2006. Functional diversity: back to basics and looking forward. Ecol Lett 9:741–758. https://doi.org/10.1111/j.1461-0248.2006.00924.x.

11. Krause S, Le Roux X, Niklaus PA, Van Bodegom PM, Lennon JT, Bertilsson S, Grossart H-P, Philippot L, Bodelier PLE. 2014. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front Microbiol 5:251. https://doi.org/10.3389/fmicb.2014.00251.

12. Cardinale BJ, Matulich KL, Hooper DU, Byrnes JE, Duffy E, Gamfeldt L, Balvanera P, O'Connor MI, Gonzalez A. 2011. The functional role of producer diversity in ecosystems. Am J Bot 98:572–592. https://doi.org/10.3732/ajb.1000364.

13. Handa IT, Aerts R, Berendse F, Berg MP, Bruder A, Butenschoen O, Chauvet E, Gessner MO, Jabiol J, Makkonen M, McKie BG, Malmqvist B, Peeters ETHM, Scheu S, Schmid B, van Ruijven J, Vos VCA, Hättenschwiler S. 2014. Consequences of biodiversity loss for litter decomposition across biomes. Nature 509:218–221. https://doi.org/10.1038/nature13247.

14. He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC, Huang Z, Wu W, Gu B, Jardine P, Criddle C, Zhou J. 2007. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1:67–77. https://doi.org/10.1038/ismej.2007.2.

15. He Z, Deng Y, Van Nostrand JD, Tu Q, Xu M, Hemme CL, Li X, Wu L, Gentry TJ, Yin Y, Liebich J, Hazen TC, Zhou J. 2010. GeoChip 3.0 as a high-throughput tool for analyzing microbial community composition, structure and functional activity. ISME J 4:1167–1179. https://doi.org/10.1038/ismej.2010.46.

16. He Z, Xu MY, Deng Y, Kang SH, Kellogg L, Wu LY, Van Nostrand JD, Hobbie SE, Reich PB, Zhou JZ. 2010. Metagenomic analysis reveals a marked divergence in the structure of belowground microbial communities at elevated CO2. Ecol Lett 13:564–575. https://doi.org/10.1111/j.1461-0248.2010.01453.x.

17. Tu Q, Yu H, He Z, Deng Y, Wu L, Van Nostrand JD, Zhou A, Voordeckers J, Lee YJ, Qin Y, Hemme CL, Shi Z, Xue K, Yuan T, Wang A, Zhou J. 2014. GeoChip 4: a functional gene-array-based high-throughput environmental technology for microbial community analysis. Mol Ecol Resour 14:914–928. https://doi.org/10.1111/1755-0998.12239.

18. He Z, Deng Y, Zhou J. 2012. Development of functional gene microarrays for microbial community analysis. Curr Opin Biotechnol 23:49–55. https://doi.org/10.1016/j.copbio.2011.11.001.

19. He Z, Van Nostrand JD, Zhou J. 2012. Applications of functional gene microarrays for profiling microbial communities. Curr Opin Biotechnol 23:460–466. https://doi.org/10.1016/j.copbio.2011.12.021.

20. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, Zhou J. 2010. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672. https://doi.org/10.1038/ismej.2009.154.

21. Zhou J, Deng Y, Zhang P, Xue K, Liang Y, Van Nostrand JD, Yang Y, He Z, Wu L, Stahl DA, Hazen TC, Tiedje JM, Arkin AP. 2014. Stochasticity, succession, and environmental perturbations in a fluidic ecosystem. Proc Natl Acad Sci U S A 111:E836–E845. https://doi.org/10.1073/pnas.1324044111.

22. Smith MB, Rocha AM, Smillie CS, Olesen SW, Paradis C, Wu L, Campbell JH, Fortney JL, Mehlhorn TL, Lowe KA, Earles JE, Phillips J, Techtmann SM, Joyner DC, Elias DA, Bailey KL, Hurt RA, Preheim SP, Sanders MC, Yang J, Mueller MA, Brooks S, Watson DB, Zhang P, He Z, Dubinsky EA, Adams PD, Arkin AP, Fields MW, Zhou J, Alm EJ, Hazen TC. 2015. Natural bacterial communities serve as quantitative geochemical biosensors. mBio 6:e00326-15. https://doi.org/10.1128/mBio.00326-15.

23. Akob DM, Mills HJ, Gihring TM, Kerkhof L, Stucki JW, Anastácio AS, Chin KJ, Küsel K, Palumbo AV, Watson DB, Kostka JE. 2008. Functional diversity and electron donor dependence of microbial populations capable of U(VI) reduction in radionuclide-contaminated subsurface sediments. Appl Environ Microbiol 74:3159–3170. https://doi.org/10.1128/AEM.02881-07.

24. Hemme CL, Tu Q, Shi Z, Qin Y, Gao W, Deng Y, Van Nostrand JD, Wu L, He Z, Chain PSG, Tringe SG, Fields MW, Rubin EM, Tiedje JM, Hazen TC, Arkin AP, Zhou J. 2015. Comparative metagenomics reveals impact of contaminants on groundwater microbiomes. Front Microbiol 6:1205. https://doi.org/10.3389/fmicb.2015.01205.

25. Green SJ, Prakash O, Jasrotia P, Overholt WA, Cardenas E, Hubbard D, Tiedje JM, Watson DB, Schadt CW, Brooks SC, Kostka JE. 2012. Denitrifying bacteria from the genus Rhodanobacter dominate bacterial communities in the highly contaminated subsurface of a nuclear legacy waste site. Appl Environ Microbiol 78:1039–1047. https://doi.org/10.1128/AEM.06435-11.

26. Kostka JE, Green SJ, Rishishwar L, Prakash O, Katz LS, Mariño-Ramírez L, Jordan IK, Munk C, Ivanova N, Mikhailova N, Watson DB, Brown SD, Palumbo AV, Brooks SC. 2012. Genome sequences for six Rhodanobacter strains, isolated from soils and the terrestrial subsurface, with variable denitrification capabilities. J Bacteriol 194:4461–4462. https://doi.org/10.1128/JB.00871-12.

27. Bollmann A, Palumbo AV, Lewis K, Epstein SS. 2010. Isolation and physiology of bacteria from contaminated subsurface sediments. Appl Environ Microbiol 76:7413–7419. https://doi.org/10.1128/AEM.00376-10.

28. Fields MW, Yan T, Rhee SK, Carroll SL, Jardine PM, Watson DB, Criddle CS, Zhou J. 2005. Impacts on microbial communities and cultivable isolates from groundwater contaminated with high levels of nitric acid-uranium waste. FEMS Microbiol Ecol 53:417–428. https://doi.org/10.1016/j.femsec.2005.01.010.

29. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, Gentry T, Luo J, Watson D, Gu B, Ginder-Vogel M, Kitanidis PK, Jardine PM, Zhou J, Criddle CS, Marsh TL, Tiedje JM. 2010. Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76:6778–6786. https://doi.org/10.1128/AEM.01097-10.

30. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2:18–22.

31. Calle ML, Urrea V, Boulesteix AL, Malats N. 2011. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 72:121–132. https://doi.org/10.1159/000330778.

32. Yan T, Fields MW, Wu L, Zu Y, Tiedje JM, Zhou J. 2003. Molecular diversity and characterization of nitrite reductase gene fragments (nirK and nirS) from nitrate- and uranium-contaminated groundwater. Environ Microbiol 5:13–24. https://doi.org/10.1046/j.1462-2920.2003.00393.x.

33. Lu Z, He Z, Parisi VA, Kang S, Deng Y, Van Nostrand JD, Masoner JR, Cozzarelli IM, Suflita JM, Zhou J. 2012. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer. Environ Sci Technol 46:5824–5833. https://doi.org/10.1021/es300478j.

34. Tiago I, Veríssimo A. 2013. Microbial and functional diversity of a subterrestrial high pH groundwater associated to serpentinization. Environ Microbiol 15:1687–1706. https://doi.org/10.1111/1462-2920.12034.

35. Roadcap GS, Sanford RA, Jin Q, Pardinas JR, Bethke CM. 2006. Extremely alkaline (pH > 12) ground water hosts diverse microbial community. Ground Water 44:511–517. https://doi.org/10.1111/j.1745-6584.2006.00199.x.

36. Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. 2015. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol 6:475. https://doi.org/10.3389/fmicb.2015.00475.

37. Zhalnina K, Dias R, de Quadros PD, Davis-Richardson A, Camargo FA, Clark IM, McGrath SP, Hirsch PR, Triplett EW. 2015. Soil pH determines microbial diversity and composition in the Park Grass experiment. Microb Ecol 69:395–406. https://doi.org/10.1007/s00248-014-0530-2.

38. Fierer N, Jackson RB. 2006. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A 103:626–631. https://doi.org/10.1073/pnas.0507535103.

39. Lauber CL, Hamady M, Knight R, Fierer N. 2009. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol 75:5111–5120. https://doi.org/10.1128/AEM.00335-09.

40. Liang Y, Zhao H, Zhang X, Zhou J, Li G. 2014. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites. Sci Total Environ 487:272–278. https://doi.org/10.1016/j.scitotenv.2014.04.032.

41. Antunes SC, Pereira R, Marques SM, Castro BB, Gonçalves F. 2011. Impaired microbial activity caused by metal pollution: a field study in a deactivated uranium mining area. Sci Total Environ 410–411:87–95. https://doi.org/10.1016/j.scitotenv.2011.09.003.

42. Bearson S, Bearson B, Foster JW. 1997. Acid stress responses in enterobacteria. FEMS Microbiol Lett 147:173–180. https://doi.org/10.1111/j.1574-6968.1997.tb10238.x.

43. Xu M, Zhang Q, Xia C, Zhong Y, Sun G, Guo J, Yuan T, Zhou J, He Z. 2014. Elevated nitrate enriches microbial functional genes for potential bioremediation of complexly contaminated sediments. ISME J 8:1932–1944. https://doi.org/10.1038/ismej.2014.42.

44. Lovley DR, Phillips EJP. 1992. Reduction of uranium by Desulfovibrio desulfuricans. Appl Environ Microbiol 58:850–856.

45. Lovley DR, Phillips EJP. 1994. Reduction of chromate by Desulfovibrio vulgaris and its c3 cytochrome. Appl Environ Microbiol 60:726–728.

46. Tebo BM, Obraztsova AY. 1998. Sulfate-reducing bacterium grows with Cr(VI), U(VI), Mn(IV), and Fe(III) as electron acceptors. FEMS Microbiol Lett 162:193–198. https://doi.org/10.1111/j.1574-6968.1998.tb12998.x.

47. Suzuki Y, Kelly SD, Kemner KM, Banfield JF. 2003. Microbial populations stimulated for hexavalent uranium reduction in uranium mine sediment. Appl Environ Microbiol 69:1337–1346. https://doi.org/10.1128/AEM.69.3.1337-1346.2003.

48. Payne RB, Gentry DM, Rapp-Giles BJ, Casalot L, Wall JD. 2002. Uranium reduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3 mutant. Appl Environ Microbiol 68:3129–3132. https://doi.org/10.1128/AEM.68.6.3129-3132.2002.

49. Lovley DR, Widman PK, Woodward JC, Phillips EJP. 1993. Reduction of uranium by cytochrome c3 of Desulfovibrio vulgaris. Appl Environ Microbiol 59:3572–3576.

50. Xu M, Wu WM, Wu L, He Z, Van Nostrand JD, Deng Y, Luo J, Carley J, Ginder-Vogel M, Gentry TJ, Gu B, Watson D, Jardine PM, Marsh TL, Tiedje JM, Hazen T, Criddle CS, Zhou J. 2010. Responses of microbial community functional structures to pilot-scale uranium in situ bioremediation. ISME J 4:1060–1070. https://doi.org/10.1038/ismej.2010.31.

51. Zhang P, Wu W-M, Van Nostrand JD, Deng Y, He Z, Gihring T, Zhang G, Schadt CW, Watson D, Jardine P, Criddle CS, Brooks S, Marsh TL, Tiedje JM, Arkin AP, Zhou J. 2015. Dynamic succession of groundwater functional microbial communities in response to emulsified vegetable oil amendment during sustained in situ U(VI) reduction. Appl Environ Microbiol 81:4164–4172. https://doi.org/10.1128/AEM.00043-15.

52. Van Nostrand JD, Wu L, Wu W-M, Huang Z, Gentry TJ, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2011. Dynamics of microbial community composition and function during in situ bioremediation of a uranium-contaminated aquifer. Appl Environ Microbiol 77:3860–3869. https://doi.org/10.1128/AEM.01981-10.

53. Van Nostrand JD, Wu WM, Wu L, Deng Y, Carley J, Carroll S, He Z, Gu B, Luo J, Criddle CS, Watson DB, Jardine PM, Marsh TL, Tiedje JM, Hazen TC, Zhou J. 2009. GeoChip-based analysis of functional microbial communities during the reoxidation of a bioreduced uranium-contaminated aquifer. Environ Microbiol 11:2611–2626. https://doi.org/10.1111/j.1462-2920.2009.01986.x.

54. Liang Y, Van Nostrand JD, N'Guessan LA, Peacock AD, Deng Y, Long PE, Resch CT, Wu LY, He ZL, Li GH, Hazen TC, Lovley DR, Zhou JZ. 2012. Microbial functional gene diversity with a shift of subsurface redox conditions during in situ uranium reduction. Appl Environ Microbiol 78:2966–2972. https://doi.org/10.1128/AEM.06528-11.

55. Zhang P, Van Nostrand JD, He Z, Chakraborty R, Deng Y, Curtis D, Fields MW, Hazen TC, Arkin AP, Zhou J. 2015. A slow-release substrate stimulates groundwater microbial communities for long-term in situ Cr(VI) reduction. Environ Sci Technol 49:12922–12931. https://doi.org/10.1021/acs.est.5b00024.

56. Beck D, Foster JA. 2014. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. https://doi.org/10.1371/journal.pone.0087830.

57. Newton RJ, McLellan SL, Dila DK, Vineis JH, Morrison HG, Eren AM, Sogin ML. 2015. Sewage reflects the microbiomes of human populations. mBio 6:e02574-14. https://doi.org/10.1128/mBio.02574-14.

58. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, Schauer DB, Ward DV, Korzenik JR, Xavier RJ, Bousvaros A, Alm EJ. 2012. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS One 7:e39242. https://doi.org/10.1371/journal.pone.0039242.

59. Xu Z, Malmer D, Langille MGI, Way SF, Knight R. 2014. Which is more important for classifying microbial communities: who is there or what they can do? ISME J 8:2357–2359. https://doi.org/10.1038/ismej.2014.157.

60. Green SJ, Prakash O, Gihring TM, Akob DM, Jasrotia P, Jardine PM, Watson DB, Brown SD, Palumbo AV, Kostka JE. 2010. Denitrifying bacteria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination. Appl Environ Microbiol 76:3244–3254. https://doi.org/10.1128/AEM.03069-09.

61. Thorgersen MP, Lancaster WA, Vaccaro BJ, Poole FL, Rocha AM, Mehlhorn T, Pettenato A, Ray J, Waters RJ, Melnyk RA, Chakraborty R, Hazen TC, Deutschbauer AM, Arkin AP, Adams MWW. 2015. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl Environ Microbiol 81:4976–4983. https://doi.org/10.1128/AEM.00917-15.

62. Hazen TC, Dubinsky EA, DeSantis TZ, Andersen GL, Piceno YM, Singh N, Jansson JK, Probst A, Borglin SE, Fortney JL, Stringfellow WT, Bill M, Conrad ME, Tom LM, Chavarria KL, Alusi TR, Lamendella R, Joyner DC, Spier C, Baelum J, Auer M, Zemla ML, Chakraborty R, Sonnenthal EL, D'haeseleer P, Holman HY, Osman S, Lu Z, Van Nostrand JD, Deng Y, Zhou J, Mason OU. 2010. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330:204–208. https://doi.org/10.1126/science.1195979.

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


SM Joyner DC Elias DA Bailey KL Hurt RA Preheim SP Sanders MCYang J Mueller MA Brooks S Watson DB Zhang P He Z Dubinsky EAAdams PD Arkin AP Fields MW Zhou J Alm EJ Hazen TC 2015 Naturalbacterial communities serve as quantitative geochemical biosensorsmBio 6e00326-15 httpsdoiorg101128mBio00326-15

23 Akob DM Mills HJ Gihring TM Kerkhof L Stucki JW Anastaacutecio AS ChinKJ Kuumlsel K Palumbo AV Watson DB Kostka JE 2008 Functional diversityand electron donor dependence of microbial populations capable ofU(VI) reduction in radionuclide-contaminated subsurface sedimentsAppl Environ Microbiol 743159 ndash3170 httpsdoiorg101128AEM02881-07

24 Hemme CL Tu Q Shi Z Qin Y Gao W Deng Y Van Nostrand JD Wu LHe Z Chain PSG Tringe SG Fields MW Rubin EM Tiedje JM Hazen TCArkin AP Zhou J 2015 Comparative metagenomics reveals impact ofcontaminants on groundwater microbiomes Front Microbiol 61205httpsdoiorg103389fmicb201501205

25 Green SJ Prakash O Jasrotia P Overholt WA Cardenas E Hubbard DTiedje JM Watson DB Schadt CW Brooks SC Kostka JE 2012 Denitri-fying bacteria from the genus Rhodanobacter dominate bacterial com-munities in the highly contaminated subsurface of a nuclear legacywaste site Appl Environ Microbiol 781039 ndash1047 httpsdoiorg101128AEM06435-11

26 Kostka JE Green SJ Rishishwar L Prakash O Katz LS Marintildeo-Ramiacuterez LJordan IK Munk C Ivanova N Mikhailova N Watson DB Brown SDPalumbo AV Brooks SC 2012 Genome sequences for six Rhodanobacterstrains isolated from soils and the terrestrial subsurface with variabledenitrification capabilities J Bacteriol 1944461ndash 4462 httpsdoiorg101128JB00871-12

27 Bollmann A Palumbo AV Lewis K Epstein SS 2010 Isolation andphysiology of bacteria from contaminated subsurface sediments ApplEnviron Microbiol 767413ndash7419 httpsdoiorg101128AEM00376-10

28 Fields MW Yan T Rhee SK Carroll SL Jardine PM Watson DB Criddle CSZhou J 2005 Impacts on microbial communities and cultivable isolatesfrom groundwater contaminated with high levels of nitric acid-uraniumwaste FEMS Microbiol Ecol 53417ndash428 httpsdoiorg101016jfemsec200501010

29 Cardenas E Wu WM Leigh MB Carley J Carroll S Gentry T Luo J WatsonD Gu B Ginder-Vogel M Kitanidis PK Jardine PM Zhou J Criddle CSMarsh TL Tiedje JM 2010 Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as re-vealed by a combined massively parallel sequencing-indicator speciesapproach Appl Environ Microbiol 766778 ndash 6786 httpsdoiorg101128AEM01097-10

30 Liaw A Wiener M 2002 Classification and regression by randomForestR News 218 ndash22

31 Calle ML Urrea V Boulesteix AL Malats N 2011 AUC-RF a new strategyfor genomic profiling with random forest Hum Hered 72121ndash132 httpsdoiorg101159000330778

32 Yan T Fields MW Wu L Zu Y Tiedje JM Zhou J 2003 Molecular diversityand characterization of nitrite reductase gene fragments (nirK and nirS)from nitrate- and uranium-contaminated groundwater Environ Micro-biol 513ndash24 httpsdoiorg101046j1462-2920200300393x

33 Lu Z He Z Parisi VA Kang S Deng Y Van Nostrand JD Masoner JRCozzarelli IM Suflita JM Zhou J 2012 GeoChip-based analysis of micro-bial functional gene diversity in a landfill leachate-contaminated aquiferEnviron Sci Technol 465824 ndash5833 httpsdoiorg101021es300478j

34 Tiago I Veriacutessimo A 2013 Microbial and functional diversity of a sub-terrestrial high pH groundwater associated to serpentinization EnvironMicrobiol 151687ndash1706 httpsdoiorg1011111462-292012034

35 Roadcap GS Sanford RA Jin Q Pardinas JR Bethke CM 2006 Extremelyalkaline (pH 12) ground water hosts diverse microbial communityGround Water 44511ndash517 httpsdoiorg101111j1745-6584200600199x

36 Meacutendez-Garciacutea C Pelaacuteez AI Mesa V Saacutenchez J Golyshina OV Ferrer M2015 Microbial diversity and metabolic networks in acid mine drain-age habitats Front Microbiol 6475 httpsdoiorg103389fmicb201500475

37 Zhalnina K Dias R de Quadros PD Davis-Richardson A Camargo FAClark IM McGrath SP Hirsch PR Triplett EW 2015 Soil pH determinesmicrobial diversity and composition in the Park Grass experiment Mi-crob Ecol 69395ndash 406 httpsdoiorg101007s00248-014-0530-2

38 Fierer N Jackson RB 2006 The diversity and biogeography of soilbacterial communities Proc Natl Acad Sci U S A 103626 ndash 631 httpsdoiorg101073pnas0507535103

39 Lauber CL Hamady M Knight R Fierer N 2009 Pyrosequencing-basedassessment of soil pH as a predictor of soil bacterial community struc-ture at the continental scale Appl Environ Microbiol 755111ndash5120httpsdoiorg101128AEM00335-09

40 Liang Y Zhao H Zhang X Zhou J Li G 2014 Contrasting microbialfunctional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites Sci Total Environ 487272ndash278 httpsdoiorg101016jscitotenv201404032

41 Antunes SC Pereira R Marques SM Castro BB Gonccedilalves F 2011 Impairedmicrobial activity caused by metal pollution a field study in a deactivateduranium mining area Sci Total Environ 410 ndash 41187ndash95 httpsdoiorg101016jscitotenv201109003

42 Bearson S Bearson B Foster JW 1997 Acid stress responses in entero-bacteria FEMS Microbiol Lett 147173ndash180 httpsdoiorg101111j1574-69681997tb10238x

43 Xu M Zhang Q Xia C Zhong Y Sun G Guo J Yuan T Zhou J He Z 2014Elevated nitrate enriches microbial functional genes for potential biore-mediation of complexly contaminated sediments ISME J 81932ndash1944httpsdoiorg101038ismej201442

44 Lovley DR Phillips EJP 1992 Reduction of uranium by Desulfovibriodesulfuricans Appl Environ Microbiol 58850 ndash 856

45 Lovley DR Phillips EJP 1994 Reduction of chromate by Desulfovibriovulgaris and its c3 cytochrome Appl Environ Microbiol 60726 ndash728

46 Tebo BM Obraztsova AY 1998 Sulfate-reducing bacterium grows withCr(VI) U(VI) Mn(IV) and Fe(III) as electron acceptors FEMS Microbiol Lett162193ndash198 httpsdoiorg101111j1574-69681998tb12998x

47 Suzuki Y Kelly SD Kemner KM Banfield JF 2003 Microbial populationsstimulated for hexavalent uranium reduction in uranium mine sedimentAppl Environ Microbiol 691337ndash1346 httpsdoiorg101128AEM6931337-13462003

48 Payne RB Gentry DM Rapp-Giles BJ Casalot L Wall JD 2002 Uraniumreduction by Desulfovibrio desulfuricans strain G20 and a cytochrome c3

mutant Appl Environ Microbiol 683129 ndash3132 httpsdoiorg101128AEM6863129-31322002

49 Lovley DR Widman PK Woodward JC Phillips EJP 1993 Reduction ofuranium by cytochrome c3 of Desulfovibrio vulgaris Appl Environ Micro-biol 593572ndash3576

50 Xu M Wu WM Wu L He Z Van Nostrand JD Deng Y Luo J Carley JGinder-Vogel M Gentry TJ Gu B Watson D Jardine PM Marsh TL TiedjeJM Hazen T Criddle CS Zhou J 2010 Responses of microbial commu-nity functional structures to pilot-scale uranium in situ bioremediationISME J 41060 ndash1070 httpsdoiorg101038ismej201031

51 Zhang P Wu W-M Van Nostrand JD Deng Y He Z Gihring T Zhang GSchadt CW Watson D Jardine P Criddle CS Brooks S Marsh TL TiedjeJM Arkin AP Zhou J 2015 Dynamic succession of groundwater func-tional microbial communities in response to emulsified vegetable oilamendment during sustained in situ U(VI) reduction Appl Environ Mi-crobiol 814164 ndash 4172 httpsdoiorg101128AEM00043-15

52 Van Nostrand JD Wu L Wu W-M Huang Z Gentry TJ Deng Y Carley JCarroll S He Z Gu B Luo J Criddle CS Watson DB Jardine PM Marsh TLTiedje JM Hazen TC Zhou J 2011 Dynamics of microbial communitycomposition and function during in situ bioremediation of a uranium-contaminated aquifer Appl Environ Microbiol 773860 ndash3869 httpsdoiorg101128AEM01981-10

53 Van Nostrand JD Wu WM Wu L Deng Y Carley J Carroll S He Z Gu BLuo J Criddle CS Watson DB Jardine PM Marsh TL Tiedje JM Hazen TCZhou J 2009 GeoChip-based analysis of functional microbial commu-nities during the reoxidation of a bioreduced uranium-contaminatedaquifer Environ Microbiol 112611ndash2626 httpsdoiorg101111j1462-2920200901986x

54 Liang Y Van Nostrand JD NrsquoGuessan LA Peacock AD Deng Y Long PEResch CT Wu LY He ZL Li GH Hazen TC Lovley DR Zhou JZ 2012Microbial functional gene diversity with a shift of subsurface redoxconditions during in situ uranium reduction Appl Environ Microbiol782966 ndash2972 httpsdoiorg101128AEM06528-11

55 Zhang P Van Nostrand JD He Z Chakraborty R Deng Y Curtis D FieldsMW Hazen TC Arkin AP Zhou J 2015 A slow-release substrate stimu-lates groundwater microbial communities for long-term in situ Cr(VI)reduction Environ Sci Technol 4912922ndash12931 httpsdoiorg101021acsest5b00024

56 Beck D Foster JA 2014 Machine learning techniques accurately classifymicrobial communities by bacterial vaginosis characteristics PLoS One9e87830 httpsdoiorg101371journalpone0087830

57 Newton RJ McLellan SL Dila DK Vineis JH Morrison HG Eren AM Sogin

He et al reg

JanuaryFebruary 2018 Volume 9 Issue 1 e02435-17 mbioasmorg 14

on August 26 2020 by guest

httpmbioasm

orgD

ownloaded from

ML 2015 Sewage reflects the microbiomes of human populations mBio6e02574-14 httpsdoiorg101128mBio02574-14

58 Papa E Docktor M Smillie C Weber S Preheim SP Gevers D GiannoukosG Ciulla D Tabbaa D Ingram J Schauer DB Ward DV Korzenik JR XavierRJ Bousvaros A Alm EJ 2012 Non-invasive mapping of the gastroin-testinal microbiota identifies children with inflammatory bowel diseasePLoS One 7e39242 httpsdoiorg101371journalpone0039242

59 Xu Z Malmer D Langille MGI Way SF Knight R 2014 Which is moreimportant for classifying microbial communities who is there or whatthey can do ISME J 82357ndash2359 httpsdoiorg101038ismej2014157

60 Green SJ Prakash O Gihring TM Akob DM Jasrotia P Jardine PMWatson DB Brown SD Palumbo AV Kostka JE 2010 Denitrifying bac-teria isolated from terrestrial subsurface sediments exposed to mixed-waste contamination Appl Environ Microbiol 763244 ndash3254 httpsdoiorg101128AEM03069-09

61 Thorgersen MP Lancaster WA Vaccaro BJ Poole FL Rocha AM Mehl-horn T Pettenato A Ray J Waters RJ Melnyk RA Chakraborty R HazenTC Deutschbauer AM Arkin AP Adams MWW 2015 Molybdenum avail-ability is key to nitrate removal in contaminated groundwater environ-ments Appl Environ Microbiol 814976 ndash 4983 httpsdoiorg101128AEM00917-15

62 Hazen TC Dubinsky EA DeSantis TZ Andersen GL Piceno YM Singh NJansson JK Probst A Borglin SE Fortney JL Stringfellow WT Bill MConrad ME Tom LM Chavarria KL Alusi TR Lamendella R Joyner DCSpier C Baelum J Auer M Zemla ML Chakraborty R Sonnenthal ELDrsquohaeseleer P Holman HY Osman S Lu Z Van Nostrand JD Deng YZhou J Mason OU 2010 Deep-sea oil plume enriches indigenousoil-degrading bacteria Science 330204 ndash208 httpsdoiorg101126science1195979

63. Wu L, Liu X, Schadt CW, Zhou J. 2006. Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72:4931–4941. https://doi.org/10.1128/AEM.02738-05.

64. R Core Team. 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

65. Metcalf JL, Wegener Parfrey L, Gonzalez A, Lauber CL, Knights D, Ackermann G, Humphrey GC, Gebert MJ, Van Treuren W, Berg-Lyons D, Keepers K, Guo Y, Bullard J, Fierer N, Carter DO, Knight R. 2013. A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system. eLife 2:e01104. https://doi.org/10.7554/eLife.01104.

66. Breiman L. 2001. Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.


