
Supporting Information - Nature Research · 2016-05-17 · Supporting Information ... Prediction model of microbial assemblages and functional metabolic potentials Table S1 Functional


Supporting Information

Predicting taxonomic and functional structure of microbial community in acid mine drainage

Jialiang Kuang, Linan Huang, Zhili He, Linxing Chen, Zhengshuang Hua, Pu Jia, Shengjin Li, Jun Liu, Jintian Li, Jizhong Zhou and Wensheng Shu

Supplementary Methods

Sampling procedure, physicochemical analyses and DNA extraction

Amplification and bar-coded pyrosequencing of bacterial and archaeal 16S rRNA genes

Processing of pyrosequencing data

GeoChip analysis

Prediction model of microbial assemblages and functional metabolic potentials

Table S1 Functional genes selected for the statistical analyses in this study.

Table S2 Site locations and environmental conditions of acid mine drainage (AMD) samples.

Table S3 Relative abundance (%) of dominant lineages across AMD microbial communities.

Table S4 Summary of statistics (R²) from dissimilarity test (Adonis) between two mining areas on the functional community structure.

Table S5 Environmental and taxonomic variable loadings on the PCs across the AMD samples.

Table S6 Multiple linear regression (MLR) of environmental variables and relative abundance of dominant microbial lineages on metabolic potential of functional genes.

Table S7 Validation of predictive models for relative abundances of dominant microbial taxa (Phylum level, mean relative abundance > 1%) based on the artificial neural network (ANN).

Table S8 Validation of predictive models for relative abundances of dominant microbial taxa (Order level, mean relative abundance > 0.1%) based on the artificial neural network (ANN).

Table S9 Validation of predictive models for relative abundances of key microbial taxa (OTU level, observed in at least half of the total samples) based on the artificial neural network (ANN).

Table S10a Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN).

Table S10b Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN).

Table S11 Predictive equations and functional parameters that provide the best prediction for relative abundances of dominant microbial taxa based on the artificial neural network (ANN).

Table S12 Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on ANN.

Table S13 Predictive equations and functional parameters that provide the best prediction for environmental properties based on ANN.

Table S14 Functional genes that revealed consistent or fluctuating relative metabolic potentials along the gradient of pH levels.

Figure S1 The consensus networks of environmental (a) and taxonomic (b) variables generated by Bayesian network inference.

Figure S2 The scatter plots show the cross-validation of predicted and observed values for relative microbial abundances at different taxonomic levels.

Figure S3 The scatter plots show the cross-validation of predicted and observed values for functional metabolic potentials of different functional gene categories.

Figure S4 Bray-Curtis similarity between predicted and observed values of relative microbial abundances (phylum level, a) and gene metabolic potentials of different functional categories (with relative abundance information of microbial phyla, b).

Figure S5 The changes of relative metabolic potential of functional genes in sulfur cycling (a), stress response (b), energy process and membrane transport (c) and antibiotic resistance (d) along the gradient of pH levels.

Figure S6 The comparison of predicted and observed metabolic potentials of different functional gene categories including carbon cycling (a), phosphorus (b) and sulfur cycling (c) along the gradient of pH levels.

Figure S7 The comparison of predicted and observed metabolic potentials of nitrogen cycling along the gradient of pH levels.

Figure S8 The comparison of predicted and observed metabolic potentials of different functional gene categories including energy process (a) and membrane transport (b) along the gradient of pH levels.

Figure S9 The comparison of predicted and observed metabolic potentials of metal resistance along the gradient of pH levels.

Figure S10 The comparison of predicted and observed metabolic potentials of stress response along the gradient of pH levels.

Figure S11 The comparison of predicted and observed metabolic potentials of antibiotic resistance along the gradient of pH levels.


Supplementary Methods

Sampling procedure, physicochemical analyses and DNA extraction

Acid mine drainage (AMD) samples were previously collected from 14 mining areas with different mineralogy across Southeast China; distances between sampling sites ranged from about 10 m to over 1600 km (Kuang et al., 2013). Briefly, water samples were taken using sterile serum bottles and immediately kept on ice for transport to the laboratory. For DNA extraction, 500 ml of each water sample was coarse-filtered through a 3 μm fiber filter and then filtered through a 0.22 μm polyethersulfone (PES) membrane filter. The cell pellets on the PES membranes were used for DNA extraction following the protocol described by Frias-Lopez et al. (2008), with an additional homogenizing step for cell lysis using the Fast Prep-24 Homogenisation System, and the filtrates were used for the chemical analyses. Temperature, solution pH, dissolved oxygen (DO) and electrical conductivity (EC) were measured on site with specific electrodes. Ferric and ferrous iron were measured by ultraviolet-colorimetric assay with 1,10-phenanthroline at 530 nm. Total organic carbon (TOC) was measured by high-temperature catalytic oxidation and infrared detection with a TOC analyzer, and sulfate was determined by a BaSO4-based turbidimetric method. Elemental analysis was performed by inductively coupled plasma optical emission spectrometry (ICP-OES) after the filtrates were digested at 180 °C with concentrated HNO3 and HCl (1:3, v/v).

Amplification and bar-coded pyrosequencing of bacterial and archaeal 16S rRNA genes

PCR amplification, purification, pooling and pyrosequencing of a region of the 16S rRNA gene were performed following the procedure described by Fierer et al. (2008). The primer set F515 (5’-GTGCCAGCMGCCGCGGTAA-3’, with an 8-bp error-correcting tag (Hamady et al., 2008)) and R806 (5’-GGACTACVSGGGTATCTAAT-3’) was used to amplify the V4 hypervariable region. Triplicate PCR reactions for each sample were amplified, pooled and purified. Finally, all PCR products were combined in approximately equimolar amounts and sequenced on a 454 GS FLX Titanium pyrosequencer.

Processing of pyrosequencing data

Raw data generated from the 454 pyrosequencing run were processed and analyzed following the pipelines of Mothur (Schloss et al., 2009) and QIIME (Caporaso et al., 2010). Pyrosequences were denoised using the commands ‘shhh.flows’ (translation of the PyroNoise algorithm; Quince et al., 2009) and ‘pre.cluster’ (Huse et al., 2010) in the Mothur platform. Chimeric sequences were identified and removed using UCHIME in de novo mode (Edgar et al., 2011). Quality sequences were subsequently assigned to samples according to their unique 8-bp barcodes and binned into phylotypes using the average clustering algorithm (Huse et al., 2010) at the 97% similarity level. Taxonomic classification of phylotypes was determined based on the Ribosomal Database Project at the 80% threshold (Wang et al., 2007). The relative abundance (%) of individual taxa within each community was estimated as the number of sequences assigned to a specific taxon divided by the total number of sequences obtained for that sample.
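The relative abundance calculation described above can be illustrated with a minimal Python sketch. The function name and the sequence counts are hypothetical and are not taken from this study; the sketch only shows the arithmetic (taxon count divided by the sample total, expressed as a percentage).

```python
def relative_abundance(counts):
    """Convert per-taxon sequence counts for one sample into percentages."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no sequences")
    # Each taxon's share of all sequences obtained for the sample, in percent.
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical counts for a single sample (illustrative values only).
sample_counts = {
    "Betaproteobacteria": 600,
    "Gammaproteobacteria": 300,
    "Nitrospira": 100,
}
abundance = relative_abundance(sample_counts)
for taxon, pct in abundance.items():
    print(f"{taxon}: {pct:.2f}%")
```

Because every sample is normalized by its own total, the percentages within a sample sum to 100 regardless of sequencing depth.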


GeoChip analysis

The general pipeline of DNA labeling, GeoChip processing and data normalization was described previously (He et al., 2007). Specifically, to obtain sufficient genomic DNA for hybridization, whole-community genome amplification (WCGA) (TempliPhi Amplification kit, Amersham Biosciences, Piscataway, NJ) was conducted using approximately 1.0 ng of community DNA from each sample following the procedure of Wu et al. (2006). Notably, appropriate manipulation of community DNA is necessary when applying microarray-based genomic technology, especially to samples with very low microbial biomass such as AMD. Although a previous report showed that WCGA could produce significant biases in community composition (Bodelier et al., 2009), our previous experimental study indicated that the amplification procedure used here amplifies community DNA in a representative and quantitative fashion (Wu et al., 2006). Thus, the biases of WCGA may not significantly affect the functional structure observed in this study. Equal amounts of amplified DNA (1.0 μg) were then used for GeoChip 4.0 hybridization as previously described (Lu et al., 2012; Chan et al., 2013). Signal intensity was normalized by the average control dye across samples, and spots with a signal-to-noise ratio [SNR = (signal intensity - background)/standard deviation of background] greater than 2 were considered positive signals for further analysis (He et al., 2007).
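The SNR criterion above can be sketched in a few lines of Python. The spot records, field names and intensity values below are hypothetical; only the formula and the cutoff of 2 come from the text.

```python
def snr(signal, background, background_sd):
    """SNR = (signal intensity - background) / standard deviation of background."""
    if background_sd <= 0:
        raise ValueError("background standard deviation must be positive")
    return (signal - background) / background_sd


def positive_spots(spots, threshold=2.0):
    """Return the ids of spots whose SNR is strictly greater than the threshold."""
    return [
        s["id"]
        for s in spots
        if snr(s["signal"], s["background"], s["background_sd"]) > threshold
    ]


# Hypothetical spots (illustrative values, not GeoChip data from this study).
spots = [
    {"id": "spot_a", "signal": 900.0, "background": 100.0, "background_sd": 50.0},
    {"id": "spot_b", "signal": 180.0, "background": 100.0, "background_sd": 50.0},
]
kept = positive_spots(spots)
print(kept)
```

In this example spot_a has an SNR of 16 and is retained, while spot_b has an SNR of 1.6 and is dropped.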

Prediction model of microbial assemblages and functional metabolic potentials

The modeling approach developed by Larsen et al. (2012) was applied to predict microbial assemblages and functional metabolic potentials. In this study, the dynamics of microbial community structure and the signal intensity of functional genes were modeled separately. Since our results indicated that the patterns of microbial community composition and functional gene structure were largely determined by environmental conditions, microbial assemblages and functional metabolic potentials were predicted from the environmental properties. The biotic interactions between different microbial taxa, or between relevant genes involved in the same functional subcategory, were also incorporated into the modeling. Additionally, because other analyses in this study suggested a potential influence of relative microbial abundances on metabolic potentials, we constructed models of functional metabolic potentials with and without these microbial interactions. Environmental variables, relative abundances of dominant microbial lineages and/or the signal intensities of functional genes were merged as input matrices, and the relationships between the variables were estimated using Bayesian network inference with Java Objects (BANJO v2.2.0) (Smith et al., 2006; Larsen et al., 2012). The networks generated by the Bayesian network inference were directed acyclic graphs (DAGs), in which nodes were environmental parameters, microbial taxa or functional genes. The directed edges of these DAGs revealed the relationships between nodes: the value of a child node has a significant conditional dependence on the value of its parent node (Larsen et al., 2012). In this study, the maximum number of parents in BANJO was set to three, and simulated annealing with the AllLocalMoves proposer was used with randomly configured networks. The top-10 highest-scoring networks were subsequently used to generate the consensus network.
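The text does not state how the top-scoring networks were combined into the consensus network. As a rough illustration only, the following Python sketch uses one possible rule, which is an assumption and not the authors' stated procedure: an edge is kept when it appears in at least a given fraction of the networks. The function name, the `min_fraction` parameter and the example edges are all hypothetical.

```python
from collections import Counter


def consensus_edges(networks, min_fraction=0.5):
    """Keep directed edges (parent, child) found in at least min_fraction of networks.

    The frequency rule is an illustrative assumption, not the documented method.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    cutoff = min_fraction * len(networks)
    return {edge for edge, n in counts.items() if n >= cutoff}


# Hypothetical high-scoring networks, each a set of (parent, child) edges.
top_networks = [
    {("pH", "Nitrospira"), ("EC", "Euryarchaeota")},
    {("pH", "Nitrospira")},
    {("pH", "Nitrospira"), ("DO", "Firmicutes")},
]
consensus = consensus_edges(top_networks)
print(sorted(consensus))
```

Here only the edge shared by all three example networks survives; a stricter or looser `min_fraction` would change which edges are retained.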

The relationships revealed by the consensus network can be expressed as a set of formulas in which the value of every node is a function of the values of its parent nodes. These functions were derived using Eureqa v0.99.9 beta (Schmidt and Lipson, 2009). The operations permitted in the equations were constant, addition, subtraction, multiplication and division. In the formula search, data from 30 randomly selected samples were used for training, while the remaining 10 samples were used for validation (see below). The best-fitting equations were searched for 2 CPU hours; when a node had more than one parent, not all of the parents were necessarily incorporated into the equations that best fit the observed data. All candidate solutions were ranked according to Pearson’s correlation coefficient. The final equation selected for prediction was defined by the following optimality criteria: best fit to an obvious peak or drop in the observed data; highest correlation with the observed data; more function parameters; and the fewest terms (Larsen et al., 2012). After the final formula had been generated and selected using the 30 training samples, data from the remaining 10 samples were used to validate the equation. Additionally, since only a few taxa are consistently of high relative abundance and many taxa are consistently of low relative abundance, it is possible to obtain deceptively high correlations between predicted and observed values so long as the model correctly identifies the small number of high-abundance taxa (Larsen et al., 2015). Therefore, two null models were used to test whether the predictive model correlated better with the biological observations than the null models did: i) setting the predicted relative abundance/metabolic potential of every taxon equal to its average abundance/metabolic potential across all samples, and ii) setting the abundance/metabolic potential of every taxon equal to its minimum observed value across all samples (Larsen et al., 2015).
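The validation logic described above (a random 30/10 split, Pearson correlation, and the two null models) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names, the random seed and the abundance values are hypothetical, and the equation search itself (Eureqa) is not reproduced.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def split_samples(sample_ids, n_train=30, seed=0):
    """Randomly split sample ids into training and validation sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


def null_predictions(observed_by_taxon):
    """Null model i (per-taxon mean) and null model ii (per-taxon minimum)."""
    mean_null = {t: sum(v) / len(v) for t, v in observed_by_taxon.items()}
    min_null = {t: min(v) for t, v in observed_by_taxon.items()}
    return mean_null, min_null


train_ids, validation_ids = split_samples(range(40))

# Hypothetical relative abundances (%) of three taxa in three samples.
observed = {"A": [60.0, 70.0, 50.0], "B": [30.0, 20.0, 40.0], "C": [10.0, 10.0, 10.0]}
mean_null, min_null = null_predictions(observed)

taxa = sorted(observed)
sample_index = 1
obs = [observed[t][sample_index] for t in taxa]
r_mean_null = pearson(obs, [mean_null[t] for t in taxa])
print(f"null model i correlation for sample {sample_index}: {r_mean_null:.3f}")
```

Even the mean-based null model correlates strongly with the observed sample here, which illustrates why a predictive model needs to outperform the null models rather than merely show a high correlation.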

References

Bodelier PLE, Kamst M, Meima-Franke M, Stralis-Pavese N, Bodrossy L. (2009). Whole-community genome amplification (WCGA) leads to compositional bias in methane-oxidizing communities as assessed by pmoA-based microarray analyses and QPCR. Environ Microbiol Rep 1: 434-441.

Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335-336.

Chan Y, Van Nostrand JD, Zhou J, Pointing SB, Farrell RL. (2013). Functional ecology of an Antarctic Dry Valley. Proc Natl Acad Sci USA 110: 8990-8995.

Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194-2200.

Fierer N, Hamady M, Lauber CL, Knight R. (2008). The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci USA 105: 17994-17999.

Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW et al. (2008). Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci USA 105: 3805-3810.

Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R. (2008). Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods 5: 235-237.

He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC et al. (2007). GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1: 67-77.

Huse SM, Welch DM, Morrison HG, Sogin ML. (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889-1898.

Kuang JL, Huang LN, Chen LX, Hua ZS, Li SJ, Hu M et al. (2013). Contemporary environmental variation determines microbial diversity patterns in acid mine drainage. ISME J 7: 1038-1050.

Larsen PE, Dai Y, Collart FR. (2015). Predicting bacterial community assemblages using an artificial neural network approach. Meth Mol Bio 1260: 33-43.

Larsen PE, Field D, Gilbert JA. (2012). Predicting bacterial community assemblages using an artificial neural network approach. Nat Methods 9: 621-625.

Lu Z, Deng Y, Van Nostrand JD, He Z, Voordeckers J, Zhou A et al. (2012). Microbial gene functions enriched in the Deepwater Horizon deep-sea oil plume. ISME J 6: 451-460.

Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM et al. (2009). Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6: 639-641.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.

Schmidt M, Lipson H. (2009). Distilling free-form natural laws from experimental data. Science 324: 81-85.

Smith VA, Yu J, Smulders TV, Hartemink AJ, Jarvis ED. (2006). Computational inference of neural information flow networks. PLoS Comput Biol 2: e161.

Wang Q, Garrity GM, Tiedje JM, Cole JR. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-5267.

Wu L, Liu X, Schadt CW, Zhou J. (2006). Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72: 4931-4941.

Table S1. Functional genes selected for the statistical analyses in this study.

| Category | Subcategory | Gene | Abbreviation |
|---|---|---|---|
| Carbon cycling | Carbon fixation | aclB | aclB |
| | | CODH | CODH |
| | | Pcc | Pcc |
| | | RubisCo | RubisCo |
| Nitrogen cycling | Nitrogen fixation | nifH | nifH |
| | Ammonification (mineralization) | gdh | gdh |
| | | ureC | ureC |
| | Nitrification | amoA | amoA |
| | Denitrification | narG | narG |
| | | nirK | nirK |
| | | nirS | nirS |
| | | norB | norB |
| | | nosZ | nosZ |
| | Assimilatory N reduction | nasA | nasA |
| | | NiR | NiR |
| | | nirA | nirA |
| | | nirB | nirB |
| | Dissimilatory N reduction | napA | napA |
| | | nrfA | nrfA |
| Phosphorus | Phosphorus utilization | phytase | phytase |
| | | ppk | ppk |
| | | ppx | ppx |
| Sulfur cycling | Adenylylsulfate reductase | aprA | aprA |
| | | aprB | aprB |
| | Sulfite reductase | dsrA | dsrA |
| | | dsrB | dsrB |
| | Sulfur oxidation | sox | sox |
| Metal resistance | Ag | silA | silA |
| | | silC | silC |
| | | silP | silP |
| | Al | al | al |
| | As | aoxB | aoxB |
| | | arsA | arsA |
| | | arsB | arsB |
| | | arsC | arsC |
| | | arsM | arsM |
| | Cd | cadA | cadA |
| | | cadBD | cadBD |
| | Cd_Co_Zn | czcA | czcA |
| | | czcC | czcC |
| | | czcD | czcD |
| | Co | corC | corC |
| | Cr | chrA | chrA |
| | Cu | copA | copA |
| | | cueO | cueO |
| | | cusA | cusA |
| | Hg | mer | mer |
| | | merB | merB |
| | | merP | merP |
| | Ni | nreB | nreB |
| | Pb | pbrA | pbrA |
| | | pbrT | pbrT |
| | Te | tehB | tehB |
| | | terC | terC |
| | | terD | terD |
| | | terZ | terZ |
| | Zn | zitB | zitB |
| | | zntA | zntA |
| Energy process | Electron transport | Fe-S cluster binding protein | fes |
| | | ferredoxin | fer |
| | | ferredoxin oxidoreductase | fero |
| | | NADH ubiquinone oxidoreductase | NADH |
| | | terminal quinol oxidase | quio |
| | Cytochrome | cytochrome | cyt |
| | Hydrogenase | hydrogenase | hyd |
| | | Ni-Fe hydrogenase | Nfhyd |

Table S1. Functional genes selected for the statistical analyses in this study (continued).

| Category | Subcategory | Gene | Abbreviation |
|---|---|---|---|
| Membrane transport | EPS | glycosyl transferase | glyt |
| | other category | ABC transporter | ABCt |
| Stress response | Cold | cspA | cspA |
| | | cspB | cspB |
| | Heat | dnaK | dnaK |
| | | groEL | groEL |
| | | groES | groES |
| | | grpE | grpE |
| | | hrcA | hrcA |
| | Glucose limitation | bglH | bglH |
| | | bglP | bglP |
| | Nitrogen limitation | glnA | glnA |
| | | glnR | glnR |
| | Oxygen limitation | arcA | arcA |
| | | arcB | arcB |
| | | cydA | cydA |
| | | cydB | cydB |
| | | narH | narH |
| | | narI | narI |
| | | narJ | narJ |
| | Oxygen stress | ahpC | ahpC |
| | | ahpF | ahpF |
| | | fnr | fnr |
| | | katA | katA |
| | | katE | katE |
| | | oxyR | oxyR |
| | | perR | perR |
| | Osmotic stress | proV | proV |
| | | proX | proX |
| | Phosphate limitation | phoA | phoA |
| | | phoB | phoB |
| | | pstA | pstA |
| | | pstB | pstB |
| | | pstC | pstC |
| | | pstS | pstS |
| | Protein stress | clpC | clpC |
| | | ctsR | ctsR |
| | Radiation stress | obgE | obgE |
| Antibiotic resistance | Transporter | ABC antibiotic transporter | ABCat |
| | | MatE antibiotics | MatE |
| | | MFS antibiotics | MFS |
| | | SMR antibiotics | SMR |
| | | Mex | Mex |
| | Beta-lactamases | beta-lactamase | lac |
| | | class A beta-lactamase | lacA |
| | | class C beta-lactamase | lacC |
| | other category | Tet | Tet |
| | | Van | Van |

Table S2. Site locations and environmental conditions of acid mine drainage (AMD) samples.

| Sample ID | Location | Mining area | Latitude (N) | Longitude (E) | pH | EC | DO | TOC | SO4²⁻ | Fe³⁺ | Fe²⁺ | Al | As | Cd | Cu | Pb | Zn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NS | Maanshan, Anhui | AHMAS | 31.64 | 118.62 | 4.1 | 3224 | 2.2 | 6.0 | 1319 | 1 | 0 | 0 | 0.00 | 0.00 | 0.5 | 0.00 | 0 |
| JGS1 | Tongling, Anhui | AHTL | 30.90 | 117.83 | 2.0 | 20000 | 0.9 | 67.0 | 7530 | 29283 | 589 | 2531 | 136.51 | 6.03 | 1028.0 | 1.60 | 1834 |
| JGS2 | Tongling, Anhui | AHTL | 30.90 | 117.83 | 2.2 | 16259 | 1.1 | 19.0 | 7443 | 3570 | 6 | 1891 | 64.86 | 7.22 | 699.0 | 0.92 | 1469 |
| XSC1 | Tongling, Anhui | AHTL | 30.91 | 117.89 | 2.9 | 2908 | 1.4 | 2.2 | 712 | 42 | 2 | 9 | 0.00 | 0.04 | 2.9 | 0.83 | 47 |
| XSC3 | Tongling, Anhui | AHTL | 30.90 | 117.90 | 2.9 | 4342 | 2.1 | 6.8 | 2852 | 219 | 10 | 90 | 0.00 | 0.00 | 12.0 | 0.14 | 3 |
| YSC1 | Tongling, Anhui | AHTL | 30.90 | 117.90 | 2.3 | 5113 | 2.5 | 7.5 | 4579 | 721 | 35 | 174 | 14.90 | 0.00 | 19.0 | 0.12 | 41 |
| YSC2 | Tongling, Anhui | AHTL | 30.90 | 117.83 | 2.2 | 6794 | 0.4 | 13.0 | 5931 | 1664 | 25 | 157 | 62.11 | 0.43 | 52.0 | 0.22 | 97 |
| ZJ1 | Zijin, Fujian | FJZJ | 25.19 | 116.38 | 2.0 | 16770 | 4.6 | 12.0 | 6823 | 3183 | 32 | 1297 | 10.41 | 0.00 | 268.0 | 0.50 | 82 |
| ZJ2 | Zijin, Fujian | FJZJ | 25.20 | 116.38 | 2.9 | 970 | 3.1 | 2.9 | 842 | 7 | 0 | 54 | 0.00 | 0.00 | 36.0 | 0.21 | 7 |
| ZJ3 | Zijin, Fujian | FJZJ | 25.18 | 116.37 | 3.5 | 134 | 4.4 | 13.0 | 22 | 0 | 0 | 5 | 0.00 | 0.00 | 0.2 | 0.01 | 0 |
| ZJ8 | Zijin, Fujian | FJZJ | 25.18 | 116.38 | 3.4 | 1093 | 6.4 | 3.1 | 813 | 1 | 0 | 32 | 0.00 | 0.00 | 18.0 | 0.07 | 3 |
| DBS1 | Dabaoshan, Guangdong | GDDBS | 24.52 | 113.72 | 2.6 | 2850 | 5.2 | 2.5 | 3469 | 427 | 9 | 168 | 0.00 | 0.02 | 6.3 | 0.67 | 144 |
| DBS3 | Dabaoshan, Guangdong | GDDBS | 24.57 | 113.72 | 2.5 | 3610 | 5.0 | 2.7 | 4632 | 559 | 7 | 132 | 0.00 | 0.00 | 16.0 | 0.10 | 27 |
| FK1 | Fankou, Guangdong | GDFK | 25.05 | 113.66 | 1.9 | 5890 | 4.8 | 10.0 | 6173 | 2541 | 252 | 53 | 0.74 | 0.32 | 0.0 | 0.26 | 427 |
| YF1 | Yunfu, Guangdong | GDYF | 22.97 | 112.01 | 2.4 | 2290 | 4.0 | 6.3 | 2785 | 281 | 147 | 114 | 0.00 | 0.00 | 0.0 | 0.33 | 0 |
| YF2 | Yunfu, Guangdong | GDYF | 22.97 | 112.01 | 2.3 | 3450 | 15.0 | 5.4 | 4268 | 1019 | 346 | 117 | 0.00 | 0.00 | 0.0 | 0.50 | 0 |
| YF3 | Yunfu, Guangdong | GDYF | 22.97 | 112.01 | 2.1 | 9220 | 9.0 | 13.0 | 7085 | 7686 | 2561 | 1675 | 0.45 | 0.00 | 0.0 | 0.88 | 408 |
| YF4 | Yunfu, Guangdong | GDYF | 22.97 | 112.01 | 2.4 | 3930 | 2.0 | 3.6 | 5747 | 1490 | 331 | 266 | 0.00 | 0.00 | 0.0 | 0.78 | 62 |
| YF5 | Yunfu, Guangdong | GDYF | 22.97 | 112.01 | 2.5 | 3830 | 3.0 | 3.3 | 5611 | 1439 | 453 | 265 | 0.00 | 0.00 | 0.0 | 0.69 | 59 |
| YF7 | Yunfu, Guangdong | GDYF | 22.98 | 112.01 | 2.7 | 4910 | 11.0 | 5.0 | 6823 | 328 | 241 | 1878 | 0.40 | 0.00 | 0.1 | 0.55 | 256 |
| YF8 | Yunfu, Guangdong | GDYF | 22.99 | 112.01 | 2.6 | 3910 | 3.5 | 2.4 | 5863 | 251 | 8 | 0 | 0.09 | 0.00 | 0.1 | 0.46 | 0 |

Table S2. Site locations and environmental conditions of acid mine drainage (AMD) samples (continued).

| Sample ID | Location | Mining area | Latitude (N) | Longitude (E) | pH | EC | DO | TOC | SO4²⁻ | Fe³⁺ | Fe²⁺ | Al | As | Cd | Cu | Pb | Zn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DC1 | Dachang, Guangxi | GXDC | 24.86 | 107.58 | 2.7 | 3820 | 1.3 | 1.9 | 4031 | 890 | 145 | 0 | 0.00 | 0.00 | 0.0 | 0.12 | 38 |
| DC2 | Dachang, Guangxi | GXDC | 24.86 | 107.58 | 3.0 | 2350 | 0.9 | 1.6 | 1845 | 63 | 2 | 0 | 0.00 | 0.00 | 0.0 | 0.16 | 25 |
| DC3 | Dachang, Guangxi | GXDC | 24.86 | 107.58 | 2.8 | 2980 | 0.9 | 3.4 | 3144 | 973 | 202 | 16 | 0.13 | 0.00 | 0.1 | 0.10 | 46 |
| DC5 | Dachang, Guangxi | GXDC | 24.82 | 107.58 | 3.1 | 890 | 0.8 | 1.5 | 531 | 22 | 0 | 0 | 0.00 | 0.00 | 0.0 | 0.02 | 0 |
| DC7 | Dachang, Guangxi | GXDC | 24.85 | 107.57 | 2.5 | 5080 | 0.3 | 16.0 | 6028 | 2547 | 509 | 48 | 29.46 | 0.81 | 3.1 | 0.78 | 263 |
| DC8 | Dachang, Guangxi | GXDC | 24.85 | 107.57 | 2.5 | 4930 | 0.4 | 13.0 | 6018 | 2775 | 417 | 50 | 38.34 | 2.67 | 4.6 | 0.37 | 589 |
| PD1 | Puding, Guizhou | GZPD | 26.58 | 105.72 | 3.0 | 3300 | 2.2 | ND | 3062 | 265 | 5 | 81 | 0.00 | 0.00 | 0.8 | 0.06 | 127 |
| PD3 | Puding, Guizhou | GZPD | 26.48 | 105.89 | 2.5 | 3510 | 1.9 | 14.0 | 3600 | 499 | 4 | 111 | 0.00 | 0.00 | 1.2 | 0.06 | 16 |
| PD4 | Puding, Guizhou | GZPD | 26.48 | 105.89 | 3.0 | 4300 | 1.0 | 2.8 | 5155 | 708 | 327 | 232 | 0.00 | 0.00 | 0.1 | 0.12 | 0 |
| PD7 | Puding, Guizhou | GZPD | 26.47 | 105.87 | 2.9 | 2880 | 1.8 | 2.2 | 2873 | 184 | 6 | 62 | 0.00 | 0.00 | 0.9 | 0.04 | 0 |
| SL | Shilu, Hainan | HNSL | 19.24 | 109.04 | 2.8 | 3155 | 1.2 | 3.3 | 699 | 150 | 9 | 8 | 0.00 | 0.00 | 6.7 | 0.11 | 0 |
| DX1 | Dexing, Jiangxi | JXDX | 29.01 | 117.73 | 2.0 | 3690 | 1.3 | 14.0 | 2766 | 506 | 5 | 124 | 0.00 | 0.00 | 19.0 | 0.05 | 0 |
| DX2 | Dexing, Jiangxi | JXDX | 29.01 | 117.73 | 1.9 | 10330 | 0.8 | 48.0 | 6997 | 2451 | 271 | 1601 | 0.48 | 0.00 | 33.0 | 0.41 | 0 |
| DX3 | Dexing, Jiangxi | JXDX | 29.01 | 117.73 | 1.9 | 10200 | 1.2 | 55.0 | 6687 | 2573 | 240 | 1572 | 0.34 | 0.00 | 33.0 | 0.42 | 0 |
| YP1 | Yongping, Jiangxi | JXYP | 28.21 | 117.77 | 2.7 | 4430 | 1.3 | 15.0 | 4685 | 91 | 19 | 321 | 0.00 | 0.00 | 25.0 | 0.17 | 16 |
| YP2 | Yongping, Jiangxi | JXYP | 28.20 | 117.76 | 2.4 | 4390 | 1.4 | 14.0 | 4331 | 262 | 4 | 80 | 0.00 | 0.45 | 54.0 | 0.10 | 50 |
| YP3 | Yongping, Jiangxi | JXYP | 28.20 | 117.76 | 2.1 | 5740 | 1.1 | 6.2 | 5412 | 915 | 12 | 154 | 0.00 | 1.85 | 90.0 | 0.16 | 89 |
| YP4 | Yongping, Jiangxi | JXYP | 28.20 | 117.76 | 2.7 | 2740 | 1.3 | 3.1 | 2611 | 47 | 1 | 70 | 0.00 | 2.00 | 85.0 | 2.40 | 199 |
| YP5 | Yongping, Jiangxi | JXYP | 28.20 | 117.76 | 2.6 | 4510 | 1.4 | 16.0 | 4021 | 205 | 6 | 58 | 0.00 | 2.27 | 44.0 | 0.13 | 123 |

All values are in mg L⁻¹, except pH, Latitude, Longitude (in standard units) and EC (in μS cm⁻¹).

EC: electrical conductivity. DO: dissolved oxygen. TOC: total organic carbon.

ND, not determined.

Table S3. Relative abundance (%) of dominant lineages across acid mine drainage (AMD) microbial communities.

| Sample ID | Euryarchaeota | Alphaproteobacteria | Betaproteobacteria | Gammaproteobacteria | Nitrospira | Firmicutes | Actinobacteria | Acidobacteria | Others | Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|
| NS | 0.48 | 12.93 | 77.76 | 2.52 | 2.38 | 0.48 | 0.34 | 0.00 | 3.06 | 0.07 |
| JGS1 | 0.60 | 0.65 | 0.05 | 41.69 | 47.56 | 6.77 | 0.15 | 0.05 | 2.49 | 0.00 |
| JGS2 | 0.82 | 9.45 | 1.45 | 82.17 | 3.72 | 0.25 | 0.19 | 0.06 | 1.89 | 0.00 |
| XSC1 | 0.01 | 1.50 | 95.96 | 2.26 | 0.21 | 0.00 | 0.02 | 0.00 | 0.04 | 0.00 |
| XSC3 | 0.13 | 0.67 | 87.16 | 10.95 | 0.45 | 0.24 | 0.07 | 0.02 | 0.31 | 0.00 |
| YSC1 | 10.15 | 43.18 | 3.88 | 7.80 | 27.35 | 0.69 | 0.03 | 0.75 | 4.54 | 1.63 |
| YSC2 | 17.55 | 13.80 | 0.07 | 19.18 | 39.63 | 5.24 | 0.07 | 0.28 | 2.48 | 1.70 |
| ZJ1 | 3.82 | 2.32 | 0.00 | 20.11 | 10.62 | 4.64 | 0.31 | 0.00 | 57.97 | 0.21 |
| ZJ2 | 0.16 | 10.40 | 21.33 | 18.39 | 5.28 | 6.13 | 6.59 | 8.92 | 22.73 | 0.08 |
| ZJ3 | 0.00 | 14.78 | 33.29 | 11.55 | 0.37 | 8.32 | 2.36 | 3.98 | 25.34 | 0.00 |
| ZJ8 | 0.00 | 6.09 | 24.54 | 23.62 | 2.03 | 10.70 | 1.85 | 1.11 | 30.07 | 0.00 |
| DBS1 | 0.21 | 3.22 | 17.37 | 70.73 | 1.32 | 0.13 | 0.08 | 0.81 | 6.10 | 0.02 |
| DBS3 | 0.16 | 18.66 | 74.29 | 4.38 | 0.90 | 0.09 | 0.03 | 1.15 | 0.31 | 0.03 |
| FK1 | 32.33 | 0.42 | 0.07 | 18.12 | 27.99 | 0.42 | 0.42 | 0.42 | 10.15 | 9.66 |
| YF1 | 2.36 | 0.26 | 83.28 | 5.18 | 2.56 | 0.52 | 0.26 | 0.07 | 5.44 | 0.07 |
| YF2 | 4.57 | 0.00 | 60.62 | 8.15 | 12.64 | 1.52 | 0.08 | 0.46 | 11.88 | 0.08 |
| YF3 | 11.10 | 0.00 | 0.40 | 41.27 | 18.37 | 1.61 | 0.10 | 0.00 | 27.04 | 0.10 |
| YF4 | 1.23 | 2.03 | 54.34 | 39.51 | 0.86 | 0.31 | 0.00 | 0.06 | 1.60 | 0.06 |
| YF5 | 0.97 | 2.59 | 57.20 | 37.30 | 0.57 | 0.24 | 0.00 | 0.00 | 1.05 | 0.08 |
| YF7 | 8.21 | 1.79 | 67.68 | 8.41 | 4.17 | 3.11 | 0.46 | 0.66 | 5.17 | 0.33 |
| YF8 | 0.06 | 26.67 | 67.36 | 1.35 | 3.43 | 0.06 | 0.00 | 0.96 | 0.11 | 0.00 |

Table S3. Relative abundance (%) of dominant lineages across acid mine drainage (AMD) microbial communities (continued).

| Sample ID | Euryarchaeota | Alphaproteobacteria | Betaproteobacteria | Gammaproteobacteria | Nitrospira | Firmicutes | Actinobacteria | Acidobacteria | Others | Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|
| DC1 | 3.76 | 10.94 | 35.21 | 12.31 | 2.22 | 2.05 | 1.71 | 1.20 | 30.26 | 0.34 |
| DC2 | 5.40 | 2.61 | 42.51 | 2.96 | 0.17 | 0.35 | 0.35 | 0.70 | 43.73 | 1.22 |
| DC3 | 0.09 | 15.82 | 76.14 | 1.07 | 0.09 | 0.00 | 0.00 | 0.09 | 6.70 | 0.00 |
| DC5 | 0.41 | 0.34 | 94.87 | 1.98 | 0.07 | 0.68 | 0.00 | 0.07 | 1.44 | 0.14 |
| DC7 | 2.63 | 2.21 | 60.28 | 4.80 | 4.32 | 19.34 | 0.28 | 0.66 | 5.42 | 0.07 |
| DC8 | 1.49 | 5.60 | 76.87 | 7.20 | 0.96 | 0.36 | 0.16 | 0.13 | 7.01 | 0.23 |
| PD1 | 17.72 | 1.15 | 30.13 | 28.84 | 6.17 | 1.58 | 0.50 | 0.93 | 12.12 | 0.86 |
| PD3 | 0.59 | 1.51 | 86.69 | 2.43 | 1.44 | 3.21 | 0.33 | 1.77 | 1.97 | 0.07 |
| PD4 | 16.65 | 0.35 | 62.60 | 2.01 | 2.27 | 1.31 | 0.35 | 1.05 | 13.34 | 0.09 |
| PD7 | 24.59 | 1.66 | 20.99 | 12.15 | 20.03 | 6.35 | 0.28 | 1.66 | 11.46 | 0.83 |
| SL | 3.06 | 0.72 | 4.69 | 46.84 | 28.40 | 0.98 | 1.63 | 0.07 | 13.55 | 0.07 |
| DX1 | 4.00 | 7.21 | 35.19 | 10.28 | 13.28 | 2.43 | 2.93 | 7.49 | 16.35 | 0.86 |
| DX2 | 21.78 | 0.34 | 1.24 | 3.84 | 11.51 | 17.72 | 19.19 | 0.34 | 20.43 | 3.61 |
| DX3 | 11.68 | 3.09 | 0.20 | 23.65 | 22.65 | 16.27 | 7.09 | 0.30 | 14.37 | 0.70 |
| YP1 | 0.47 | 6.72 | 54.10 | 25.69 | 1.28 | 1.34 | 2.82 | 4.34 | 3.19 | 0.03 |
| YP2 | 6.61 | 2.15 | 17.12 | 39.19 | 27.97 | 1.78 | 0.71 | 0.45 | 3.57 | 0.45 |
| YP3 | 1.76 | 4.11 | 19.31 | 3.44 | 68.50 | 2.35 | 0.08 | 0.04 | 0.38 | 0.04 |
| YP4 | 4.09 | 4.04 | 11.96 | 18.86 | 46.92 | 6.70 | 0.79 | 2.51 | 3.79 | 0.34 |
| YP5 | 8.08 | 1.66 | 21.62 | 21.78 | 29.91 | 2.78 | 1.44 | 1.07 | 10.43 | 1.23 |

All phylotypes were classified at the phylum level (subphylum for the Proteobacteria).

Others include 12 phyla: Bacteroidetes, Chlamydiae, Chloroflexi, Crenarchaeota, Cyanobacteria, Deinococcus-Thermus, Gemmatimonadetes, OD1, OP11, Planctomycetes, TM7, Verrucomicrobia; and two subphyla for Proteobacteria: Deltaproteobacteria and Epsilonproteobacteria.


Table S4. Summary of statistics (R²) from dissimilarity test (Adonis) between two mining areas on the functional community structure.

Mining area AHTL JXDX JXYP FJZJ GDYF GXDC GZPD

Longitude (E) 118 117 117 116 112 108 106

Latitude (N) 31 29 28 25 23 25 26

Distanceᵃ 0.01-6.85 0.04-0.60 0.01-1.83 1.09-2.30 0.02-2.38 0.01-4.46 0.45-21.08

No. of samples 6 3 5 4 7 6 4

AHTL - 0.155 0.079 0.119 0.186* 0.089 0.080

JXDX - - 0.195 0.281 0.222 0.105 0.281*

JXYP - - - 0.117 0.159 0.095 0.114

FJZJ - - - - 0.206 0.187 0.119

GDYF - - - - - 0.233* 0.242*

GXDC - - - - - - 0.108

Samples in a mining area were clustered into a group and compared with the other groups based on Bray-Curtis dissimilarity of the log-transformed signal intensity of the GeoChip data using Adonis (*, P < 0.05). Mining areas with fewer than 3 samples were excluded from this analysis (i.e., a total of 5 samples in 4 mining areas were excluded).

a The range of distance (km) between two samples within the mining area.
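The R² reported by this kind of dissimilarity test can be illustrated with a minimal Python sketch. It assumes NumPy, uses `log1p` as a stand-in for the log transformation (the exact transformation is not stated here), and partitions squared Bray-Curtis distances in the way a permutational MANOVA does. The function names and the toy signal matrix are hypothetical; this is not the Adonis implementation used for the table.

```python
import numpy as np

def bray_curtis_matrix(X):
    """Pairwise Bray-Curtis dissimilarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = X[i].sum() + X[j].sum()
            d = np.abs(X[i] - X[j]).sum() / denom if denom > 0 else 0.0
            D[i, j] = D[j, i] = d
    return D

def permanova_r2(D, groups):
    """R2 = between-group sum of squares / total sum of squares."""
    D = np.asarray(D, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    upper = np.triu_indices(n, k=1)
    ss_total = (D[upper] ** 2).sum() / n
    ss_within = 0.0
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        sub = D[np.ix_(idx, idx)]
        ss_within += (sub[np.triu_indices(len(idx), k=1)] ** 2).sum() / len(idx)
    return (ss_total - ss_within) / ss_total

def permanova_test(D, groups, n_perm=999, seed=0):
    """Observed R2 and a permutation P value obtained by shuffling labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = permanova_r2(D, groups)
    hits = sum(
        permanova_r2(D, rng.permutation(groups)) >= observed - 1e-12
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical signal intensities: 6 samples x 3 probes, two mining areas.
signals = np.array([
    [10, 1, 1], [12, 1, 1], [11, 2, 1],
    [1, 10, 1], [1, 12, 2], [2, 11, 1],
])
areas = np.array(["A", "A", "A", "B", "B", "B"])
D = bray_curtis_matrix(np.log1p(signals))
r2, p = permanova_test(D, areas)
print(f"R2 = {r2:.3f}, P = {p:.3f}")
```

For a pairwise comparison such as those in the table, only the samples from the two mining areas being compared would be passed in. With a fixed number of samples and groups, ranking permutations by R² is equivalent to ranking them by the pseudo-F statistic.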


Table S5. Environmental and taxonomic variable loadings on the PCs across the AMD samples.

Environmental properties (Abbr.) PCEnv1 (E1) PCEnv2 (E2)

exp.* 0.522 0.203

pH 0.636 -0.001

Dissolved Oxygen (DO) 0.103 0.148

Total Organic Carbon (TOC) -0.397 -0.175

Electrical Conductivity (EC) -0.503 -0.040

Sulfate (SO₄²⁻) -0.592 0.118

Ferric ion (Fe³⁺) -1.519 0.387

Ferrous ion (Fe²⁺) -1.174 0.983

Aluminum (Al) -1.047 -0.152

Copper (Cu) -0.669 -1.246

Zinc (Zn) -1.032 -0.342

Arsenic (As) -0.615 -0.348

Cadmium (Cd) -0.194 -0.205

Lead (Pb) -0.376 -0.278

Phosphorus (P) -0.100 -0.024

Microbial taxa (Abbr.) PCTaxa1 (T1) PCTaxa2 (T2) PCTaxa3 (T3) PCTaxa4 (T4)

exp.* 0.280 0.169 0.141 0.118

Euryarchaeota (Eury) -1.050 0.251 -0.691 0.062

Acidobacteria (Acido) -0.088 -0.769 0.368 -0.647

Actinobacteria (Actino) -0.842 -0.812 0.019 0.081

Firmicutes (Firm) -0.808 -0.690 0.180 0.184

Nitrospira (Nitro) -0.712 0.711 0.133 -0.420

Alphaproteobacteria (Alpha) 0.262 -0.074 -0.083 -1.214

Betaproteobacteria (Beta) 1.151 -0.370 -0.620 0.298

Gammaproteobacteria (Gamma) -0.222 0.708 0.980 0.230

Variables in bold show the dominant influence (top-50%) on each PC.

* Proportion explained.


Table S6. Multiple linear regression (MLR) of environmental variables and relative abundance of dominant microbial lineages on metabolic potential of functional genes.

Category Subcategory Gene PCsᵃ AIC Best modelᵇ Estimates for environmental properties (pH, Fe³⁺, Fe²⁺, Al, Cu, Zn) and microbial taxa (Eury, Acido, Nitro, Alpha, Beta, Gamma)

Nitrogen cycling Denitrification narG E1 112.51 pH + Al + Cu -0.470 -0.452

Nitrogen cycling Denitrification nirK E1 112.06 Cu 0.386

Nitrogen cycling Assimilatory N reduction nirB E1 112.46 Fe²⁺ + Fe³⁺ 0.666 -0.571

Sulfur cycling Sulfite reductase dsrB E1 107.26 pH + Cu -0.589

Energy process Electron transport Fe-S cluster binding protein E1 114.04 pH -0.325

Energy process Electron transport ferredoxin E1 114.04 Fe²⁺ + pH -0.365 0.447

Energy process Electron transport NADH ubiquinone oxidoreductase E1 113.82 pH -0.331

Energy process Electron transport terminal quinol oxidase E1 114.34 pH -0.314

Energy process Hydrogenase hydrogenase E1 113.57 pH -0.340

Metal resistance As arsB E1 113.17 pH -0.353

Metal resistance As arsM E1 113.80 pH -0.333

Metal resistance Cd cadA E1 112.78 pH -0.365

Metal resistance Cr chrA E1 112.66 pH -0.369

Metal resistance Cu copA E1 114.03 pH + Al -0.529

Metal resistance Te terC E1 114.33 Cu 0.315

Metal resistance Te terD E1 110.58 Zn + Fe²⁺ + pH + Al -0.553 -0.368 0.326

Stress response Heat dnaK E1 113.66 Cu 0.338

Stress response Nitrogen limitation glnA E1 113.37 pH + Fe²⁺ + Zn -0.398

Stress response Oxygen stress fnr E1 107.60 pH -0.488

Stress response Oxygen stress oxyR E1 114.20 pH -0.320

Stress response Protein stress clpC E1 112.74 Al + pH 0.415 0.577

Antibiotic resistance Transporter MatE antibiotics E1 115.38 pH + Zn -0.352

Antibiotic resistance Transporter SMR antibiotics E1 113.91 pH -0.329

Nitrogen cycling Denitrification norB T1 112.60 Eury + Firm 0.316

Nitrogen cycling Dissimilatory N reduction nrfA T1 113.90 Eury 0.330

Stress response Heat hrcA T1 114.24 Eury -0.318

Stress response Nitrogen limitation glnR T1 111.51 Eury 0.401

Stress response Phosphate limitation pstC T1 102.82 Eury 0.570

Antibiotic resistance Transporter ABC antibiotic transporter T1 113.35 Beta -0.348

Nitrogen cycling Assimilatory N reduction nirA T3 113.81 Gamma -0.333

Phosphorus Phosphorus utilization ppk T3 112.60 Gamma + Eury -0.303

Sulfur cycling Sulfur oxidation sox T3 113.18 Gamma 0.353

Metal resistance Ag silP T3 109.03 Eury + Beta 0.552 0.326

Metal resistance Cd cadBD T3 111.11 Gamma + Beta 0.366 0.530

Metal resistance Ni nreB T3 111.07 Gamma -0.412

Stress response Heat groES T3 106.08 Gamma + Acido + Eury -0.309 -0.405

Stress response Glucose limitation bglH T3 113.75 Gamma + Beta 0.464

Antibiotic resistance Transporter Mex T3 112.56 Gamma + Eury 0.318

Nitrogen cycling Ammonification gdh T4 112.59 Nitro + Acido 0.297 0.333

Energy process Electron transport ferredoxin oxidoreductase T4 114.35 Nitro 0.314

Metal resistance Hg mer T4 113.13 Alpha 0.355

Stress response Oxygen limitation cydB T4 114.32 Alpha 0.315

Antibiotic resistance other category Van T4 111.90 Beta + Alpha -0.304 0.307

a The PCs most important to the metabolic potential of each functional gene, as determined by the ABT model; the variables with dominant influence based on PC loadings were selected as inputs to the multiple linear regression (MLR) models.

b The best model is based on the AIC value.

Only significant estimates (P < 0.05) for the best model selected with the stepwise method are reported, to show the environmental properties and dominant taxa most important to the metabolic potential of each functional gene.
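AIC-based selection of a best regression model, as referred to in the notes above, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses ordinary least squares via NumPy, an AIC defined up to an additive constant, and plain forward selection, which may differ from the stepwise procedure actually used. The predictor names `x0`, `x1`, `x2` and the simulated data are hypothetical.

```python
import numpy as np

def ols_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = len(y)
    if X.shape[1] > 0:
        design = np.column_stack([np.ones(n), X])
    else:
        design = np.ones((n, 1))  # intercept-only model
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(((y - design @ beta) ** 2).sum())
    k = design.shape[1]  # number of fitted regression coefficients
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y, names):
    """Add one predictor at a time for as long as the AIC keeps decreasing."""
    selected = []
    best = ols_aic(X[:, []], y)
    improved = True
    while improved:
        improved = False
        candidate = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            aic = ols_aic(X[:, selected + [j]], y)
            if aic < best - 1e-9:
                best, candidate, improved = aic, j, True
        if improved:
            selected.append(candidate)
    return [names[j] for j in selected], best

# Simulated example: the response depends on x0 and x2 but not on x1.
rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=n)
selected, best_aic = forward_stepwise(X, y, ["x0", "x1", "x2"])
print(selected, round(best_aic, 2))
```

Standardizing the predictors and the response before fitting would make the fitted coefficients comparable across variables, which is one common way to obtain estimates on a common scale.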


Table S7. Validation of predictive models for relative abundances of dominant microbial taxa (Phylum level, mean relative abundance > 1%) based on the artificial neural network (ANN).

Taxa Occurrenceᵇ Bray-Curtis similarityᵃ: Predicted vs Observed, Null model (Mean)ᶜ, Null model (Minimum)ᵈ

Euryarchaeota 95.0% 0.685 0.489 0.001

Acidobacteria 87.5% 0.709 0.485 0.002

Actinobacteria 87.5% 0.694 0.369 0.001

Firmicutes 95.0% 0.725 0.497 0.001

Nitrospira 100.0% 0.757 0.489 0.011

Alphaproteobacteria 95.0% 0.657 0.518 0.001

Betaproteobacteria 97.5% 0.783 0.640 0.001

Gammaproteobacteria 100.0% 0.801 0.620 0.109

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with the predicted relative microbial abundances.

b The occurrence shows the percentage of the total samples in which the given taxon was detected.

c This null model sets every taxon's predicted relative abundance equal to the average taxon abundance across all samples.

d This null model sets all taxon abundances equal to the minimum observed taxon abundance.
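The comparison between a prediction and the two null models can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and assuming that similarity is computed for one taxon across samples as 1 minus the Bray-Curtis dissimilarity; the randomized permutation-based estimation mentioned in note a is not reproduced, and the function names and abundance values are hypothetical.

```python
import numpy as np

def bray_curtis_similarity(observed, predicted):
    """1 - Bray-Curtis dissimilarity between two non-negative vectors."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    total = observed.sum() + predicted.sum()
    if total == 0:
        return 1.0  # both vectors are all zeros, so treat them as identical
    return 1.0 - np.abs(observed - predicted).sum() / total

def null_model_similarities(observed):
    """Similarity of the observations to the mean and minimum null models."""
    observed = np.asarray(observed, dtype=float)
    mean_null = np.full(observed.shape, observed.mean())
    min_null = np.full(observed.shape, observed.min())
    return (bray_curtis_similarity(observed, mean_null),
            bray_curtis_similarity(observed, min_null))

# Hypothetical relative abundances (%) of one taxon in three samples.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
sim = bray_curtis_similarity(observed, predicted)
mean_sim, min_sim = null_model_similarities(observed)
print(round(sim, 3), round(mean_sim, 3), round(min_sim, 3))
```

A model is informative in this sense when its predicted-versus-observed similarity exceeds both null-model similarities, which is how the three similarity columns in the table can be read.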


Table S8. Validation of predictive models for relative abundances of dominant microbial taxa (Order level, mean relative abundance > 0.1%) based on the artificial neural network (ANN).

Taxa Occurrenceᵇ Bray-Curtis similarityᵃ: Predicted vs Observed, Null model (Mean)ᶜ, Null model (Minimum)ᵈ

Nitrospirales 100.0% 0.735 0.475 0.001

Acidithiobacillales 97.5% 0.606 0.464 0.002

Thermoplasmatales 95.0% 0.493 0.123 0.037

Rhodospirillales 95.0% 0.456 0.186 0.022

Ferrovales 90.0% 0.717 0.413 0.007

Acidobacteria_Gp1 85.0% 0.709 0.313 0.010

Bacillales 85.0% 0.536 0.101 0.010

Burkholderiales 85.0% 0.813 0.230 0.011

Clostridiales 82.5% 0.782 0.350 0.011

Xanthomonadales 82.5% 0.630 0.226 0.012

Legionellales 65.0% 0.546 0.398 0.002

Rhizobiales 60.0% 0.797 0.395 0.002

Acidimicrobiales 52.5% 0.737 0.218 0.029

Pseudomonadales 52.5% 0.376 0.489 0.011

Chlamydiales 47.5% 0.739 0.199 0.036

Actinomycetales 45.0% 0.663 0.136 0.032

Sphingomonadales 42.5% 0.828 0.275 0.005

Rhodocyclales 42.5% 0.746 0.081 0.014

Chloroplast 32.5% 0.513 0.436 0.001

Hydrogenophilales 30.0% 0.906 0.340 0.015

Desulfuromonadales 30.0% 0.735 0.343 0.001

Sphingobacteriales 27.5% 0.868 0.148 0.005

Gemmatimonadales 22.5% 0.485 0.113 0.019

Planctomycetales 22.5% 0.701 0.614 0.001

Caulobacterales 22.5% 0.159 0.262 0.007

Desulfobacterales 22.5% 0.488 0.169 0.008

Holophagales 20.0% 0.806 0.282 0.023

Campylobacterales 17.5% 0.974 0.095 0.007

Opitutales 17.5% 0.671 0.492 0.001

Bacteroidales 15.0% 0.800 0.072 0.050

Enterobacteriales 15.0% 0.870 0.118 0.018

Acidobacteria_Gp16 12.5% 0.857 0.408 0.005

Neisseriales 12.5% 0.452 0.341 0.009

Rhodobacterales 7.5% 0.905 0.351 0.002

Aeromonadales 5.0% 0.975 0.174 0.039

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted relative microbial abundances.

b The occurrence shows the percentage of the total samples where the given taxa were detected.

c This null model sets every taxon's predicted relative abundance equal to the average taxon abundance across all samples.

d This null model sets all taxon abundances equal to the minimum observed taxon abundance.


Table S9. Validation of predictive models for relative abundances of key microbial taxa (OTU level, observed in at least half of the total samples) based on the artificial neural network (ANN).

Taxa Occurrenceᵇ Bray-Curtis similarityᵃ: Predicted vs Observed, Null model (Mean)ᶜ, Null model (Minimum)ᵈ

OTU2197 97.5% 0.688 0.493 0.001

OTU1 90.0% 0.569 0.436 0.001

OTU3 90.0% 0.526 0.436 0.001

OTU2196 80.0% 0.694 0.522 0.001

OTU5 77.5% 0.529 0.439 0.001

OTU0 75.0% 0.585 0.338 0.001

OTU10 57.5% 0.208 0.166 0.001

OTU12 57.5% 0.794 0.347 0.002

OTU4 55.0% 0.805 0.294 0.001

OTU2 52.5% 0.463 0.281 0.001

OTU11 52.5% 0.702 0.343 0.003

OTU17 52.5% 0.551 0.368 0.001

OTU21 52.5% 0.564 0.471 0.009

OTU26 52.5% 0.595 0.410 0.004

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with the predicted relative microbial abundances.

b The occurrence shows the percentage of the total samples in which the given taxon was detected.

c This null model sets every taxon's predicted relative abundance equal to the average taxon abundance across all samples.

d This null model sets all taxon abundances equal to the minimum observed taxon abundance.


Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN).

Genes Abbr. Bray-Curtis similarityᵃ (Original signals), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

aclB aclB 0.940 0.299 0.141 0.930 0.345 0.076 0.950 0.299 0.141 0.930 0.345 0.076

CODH CODH 0.969 0.317 0.263 0.954 0.429 0.237 0.975 0.317 0.263 0.954 0.429 0.237

Pcc Pcc 0.974 0.294 0.193 0.953 0.357 0.119 0.970 0.294 0.193 0.953 0.357 0.119

RubisCo RubisCo 0.961 0.290 0.215 0.959 0.353 0.195 0.965 0.290 0.215 0.959 0.353 0.195

nifH nifH 0.931 0.297 0.239 0.949 0.347 0.206 0.965 0.297 0.239 0.958 0.347 0.206

gdh gdh 0.916 0.262 0.172 0.905 0.348 0.178 0.941 0.262 0.172 0.905 0.348 0.178

ureC ureC 0.924 0.275 0.222 0.949 0.394 0.157 0.956 0.275 0.222 0.949 0.394 0.157

amoA amoA 0.962 0.326 0.196 0.954 0.372 0.187 0.976 0.326 0.196 0.958 0.372 0.187

narG narG 0.971 0.312 0.274 0.961 0.362 0.210 0.982 0.312 0.274 0.961 0.362 0.210

nirK nirK 0.965 0.289 0.235 0.950 0.339 0.202 0.971 0.289 0.235 0.950 0.339 0.202

nirS nirS 0.946 0.285 0.181 0.941 0.335 0.107 0.940 0.285 0.181 0.941 0.335 0.107

norB norB 0.929 0.249 0.139 0.920 0.299 0.065 0.943 0.249 0.139 0.917 0.299 0.065

nosZ nosZ 0.930 0.282 0.151 0.948 0.332 0.077 0.934 0.282 0.151 0.948 0.332 0.077

nasA nasA 0.930 0.243 0.143 0.915 0.293 0.110 0.913 0.243 0.143 0.925 0.293 0.110

NiR NiR 0.918 0.283 0.198 0.954 0.333 0.165 0.946 0.283 0.198 0.956 0.333 0.165

nirA nirA 0.900 0.235 0.033 0.914 0.285 0.000 0.914 0.235 0.033 0.914 0.285 0.000

nirB nirB 0.866 0.200 0.016 0.857 0.250 0.017 0.876 0.200 0.016 0.857 0.250 0.017

napA napA 0.950 0.293 0.204 0.950 0.342 0.140 0.945 0.293 0.204 0.950 0.342 0.140

nrfA nrfA 0.929 0.257 0.159 0.919 0.320 0.085 0.931 0.257 0.159 0.919 0.320 0.085


Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Original signals), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

phytase phytase 0.797 0.255 0.133 0.918 0.318 0.131 0.868 0.255 0.133 0.950 0.318 0.131

ppk ppk 0.912 0.267 0.151 0.930 0.329 0.150 0.974 0.267 0.151 0.930 0.329 0.150

ppx ppx 0.961 0.287 0.277 0.954 0.350 0.275 0.965 0.287 0.277 0.954 0.350 0.275

aprA aprA 0.950 0.318 0.175 0.944 0.365 0.166 0.964 0.318 0.175 0.958 0.365 0.166

aprB aprB 0.914 0.283 0.142 0.912 0.329 0.133 0.916 0.283 0.142 0.912 0.329 0.133

dsrA dsrA 0.966 0.321 0.270 0.964 0.408 0.191 0.970 0.321 0.270 0.964 0.408 0.191

dsrB dsrB 0.953 0.308 0.205 0.955 0.395 0.125 0.954 0.308 0.205 0.955 0.395 0.125

sox sox 0.944 0.270 0.177 0.936 0.390 0.157 0.949 0.270 0.177 0.936 0.390 0.157

Fe-S cluster binding protein fes 0.975 0.209 0.127 0.870 0.296 0.207 0.980 0.209 0.127 0.870 0.296 0.207

ferredoxin fer 0.847 0.173 0.014 0.902 0.260 0.094 0.907 0.173 0.014 0.902 0.260 0.094

ferredoxin oxidoreductase fero 0.908 0.188 0.029 0.872 0.275 0.109 0.925 0.188 0.029 0.872 0.275 0.109

NADH ubiquinone oxidoreductase NADH 0.973 0.173 0.055 0.889 0.223 0.119 0.989 0.173 0.055 0.889 0.223 0.119

terminal quinol oxidase quio 0.980 0.257 0.020 0.943 0.377 0.085 0.987 0.257 0.020 0.943 0.377 0.085

cytochrome cyt 0.902 0.311 0.193 0.946 0.423 0.173 0.912 0.311 0.193 0.946 0.423 0.173

hydrogenase hyd 0.918 0.268 0.025 0.919 0.355 0.032 0.931 0.268 0.025 0.917 0.355 0.032

Ni-Fe hydrogenase NFhyd 0.884 0.213 0.124 0.863 0.263 0.127 0.904 0.213 0.124 0.879 0.263 0.127

glycosyl transferase glyt 0.912 0.201 0.077 0.899 0.288 0.083 0.913 0.201 0.077 0.892 0.288 0.083

ABC transporter ABCt 0.974 0.272 0.003 0.918 0.319 0.006 0.974 0.272 0.003 0.938 0.319 0.006


Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Original signals), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

silA silA 0.762 0.161 0.119 0.855 0.281 0.121 0.829 0.161 0.119 0.866 0.281 0.121

silC silC 0.926 0.233 0.020 0.932 0.353 0.010 0.948 0.233 0.020 0.932 0.353 0.010

silP silP 0.928 0.229 0.140 0.940 0.349 0.120 0.943 0.229 0.140 0.940 0.349 0.120

al al 0.933 0.281 0.066 0.912 0.328 0.057 0.928 0.281 0.066 0.912 0.328 0.057

aoxB aoxB 0.949 0.317 0.184 0.951 0.364 0.175 0.972 0.317 0.184 0.951 0.364 0.175

arsA arsA 0.797 0.203 0.122 0.845 0.249 0.123 0.831 0.203 0.122 0.875 0.249 0.123

arsB arsB 0.921 0.241 0.075 0.899 0.287 0.066 0.956 0.241 0.075 0.928 0.287 0.066

arsC arsC 0.973 0.334 0.273 0.968 0.380 0.264 0.986 0.334 0.273 0.971 0.380 0.264

arsM arsM 0.976 0.190 0.041 0.901 0.236 0.032 0.977 0.190 0.041 0.901 0.236 0.032

cadA cadA 0.952 0.305 0.213 0.943 0.417 0.187 0.974 0.305 0.213 0.943 0.417 0.187

cadBD cadBD 0.849 0.235 0.072 0.885 0.347 0.047 0.937 0.235 0.072 0.885 0.347 0.047

czcA czcA 0.970 0.323 0.190 0.949 0.435 0.111 0.981 0.323 0.190 0.949 0.435 0.111

czcC czcC 0.858 0.193 0.080 0.849 0.305 0.160 0.866 0.193 0.080 0.849 0.305 0.160

czcD czcD 0.965 0.335 0.272 0.969 0.447 0.193 0.966 0.335 0.272 0.974 0.447 0.193

corC corC 0.885 0.263 0.029 0.903 0.375 0.109 0.884 0.263 0.029 0.903 0.375 0.109

chrA chrA 0.967 0.334 0.261 0.967 0.446 0.235 0.969 0.334 0.261 0.966 0.446 0.235

copA copA 0.964 0.312 0.265 0.946 0.424 0.185 0.985 0.312 0.265 0.949 0.424 0.185

cueO cueO 0.745 0.316 0.255 0.755 0.128 0.044 0.799 0.316 0.255 0.755 0.128 0.044

cusA cusA 0.894 0.232 0.120 0.879 0.344 0.284 0.914 0.232 0.120 0.879 0.344 0.284


Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Original signals), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

mer mer 0.925 0.283 0.241 0.940 0.370 0.177 0.932 0.283 0.241 0.940 0.370 0.177

merB merB 0.821 0.163 0.099 0.847 0.213 0.164 0.879 0.163 0.099 0.847 0.213 0.164

merP merP 0.886 0.192 0.093 0.904 0.242 0.029 0.907 0.192 0.093 0.904 0.242 0.029

nreB nreB 0.869 0.155 0.025 0.838 0.218 0.132 0.879 0.155 0.025 0.835 0.218 0.132

pbrA pbrA 0.918 0.233 0.180 0.918 0.296 0.254 0.918 0.233 0.180 0.925 0.296 0.254

pbrT pbrT 0.812 0.212 0.104 0.876 0.274 0.178 0.850 0.212 0.104 0.876 0.274 0.178

tehB tehB 0.796 0.265 0.249 0.972 0.385 0.229 0.870 0.265 0.249 0.972 0.385 0.229

terC terC 0.960 0.280 0.215 0.967 0.400 0.195 0.971 0.280 0.215 0.967 0.400 0.195

terD terD 0.963 0.275 0.216 0.953 0.395 0.152 0.985 0.275 0.216 0.951 0.395 0.152

terZ terZ 0.917 0.195 0.020 0.873 0.315 0.085 0.922 0.195 0.020 0.873 0.315 0.085

zitB zitB 0.914 0.218 0.048 0.921 0.337 0.017 0.961 0.218 0.048 0.921 0.337 0.017

zntA zntA 0.945 0.265 0.116 0.938 0.385 0.052 0.953 0.265 0.116 0.938 0.385 0.052

cspA cspA 0.891 0.260 0.151 0.904 0.372 0.231 0.917 0.260 0.151 0.904 0.372 0.231

cspB cspB 0.833 0.214 0.089 0.860 0.326 0.169 0.878 0.214 0.089 0.860 0.326 0.169

dnaK dnaK 0.940 0.293 0.194 0.946 0.380 0.114 0.947 0.293 0.194 0.946 0.380 0.114

groEL groEL 0.925 0.274 0.141 0.946 0.361 0.147 0.952 0.274 0.141 0.946 0.361 0.147

groES groES 0.927 0.256 0.050 0.911 0.343 0.043 0.962 0.256 0.050 0.929 0.343 0.043

grpE grpE 0.962 0.318 0.223 0.960 0.405 0.229 0.967 0.318 0.223 0.960 0.405 0.229

hrcA hrcA 0.956 0.320 0.249 0.963 0.407 0.255 0.964 0.320 0.249 0.964 0.407 0.255


Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Original signals), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

bglH bglH 0.893 0.262 0.013 0.895 0.308 0.038 0.882 0.262 0.013 0.895 0.308 0.038

bglP bglP 0.905 0.256 0.081 0.905 0.302 0.055 0.935 0.256 0.081 0.905 0.302 0.055

glnA glnA 0.976 0.328 0.309 0.974 0.415 0.316 0.982 0.328 0.309 0.974 0.415 0.316

glnR glnR 0.911 0.236 0.057 0.886 0.323 0.064 0.950 0.236 0.057 0.919 0.323 0.064

arcA arcA 0.935 0.174 0.022 0.933 0.220 0.013 0.950 0.174 0.022 0.933 0.220 0.013

arcB arcB 0.905 0.259 0.122 0.917 0.306 0.226 0.918 0.259 0.122 0.917 0.306 0.226

cydA cydA 0.905 0.277 0.112 0.919 0.389 0.032 0.933 0.277 0.112 0.919 0.389 0.032

cydB cydB 0.947 0.273 0.020 0.906 0.385 0.060 0.955 0.273 0.020 0.906 0.385 0.060

narH narH 0.886 0.200 0.015 0.909 0.250 0.018 0.890 0.200 0.015 0.927 0.250 0.018

narI narI 0.816 0.297 0.265 0.967 0.347 0.232 0.975 0.297 0.265 0.967 0.347 0.232

narJ narJ 0.904 0.244 0.065 0.895 0.294 0.032 0.911 0.244 0.065 0.895 0.294 0.032

ahpC ahpC 0.973 0.333 0.235 0.972 0.379 0.226 0.976 0.333 0.235 0.972 0.379 0.226

ahpF ahpF 0.948 0.292 0.219 0.948 0.339 0.210 0.963 0.292 0.219 0.949 0.339 0.210

fnr fnr 0.978 0.315 0.247 0.977 0.402 0.167 0.985 0.315 0.247 0.977 0.402 0.167

katA katA 0.918 0.272 0.158 0.922 0.359 0.164 0.927 0.272 0.158 0.924 0.359 0.164

katE katE 0.967 0.313 0.249 0.962 0.400 0.185 0.966 0.313 0.249 0.962 0.400 0.185

oxyR oxyR 0.969 0.280 0.221 0.966 0.343 0.147 0.975 0.280 0.221 0.968 0.343 0.147

perR perR 0.824 0.140 0.030 0.819 0.202 0.130 0.825 0.140 0.030 0.819 0.202 0.130

proV proV 0.937 0.293 0.226 0.950 0.356 0.224 0.942 0.293 0.226 0.950 0.356 0.224

proX proX 0.909 0.227 0.111 0.908 0.289 0.109 0.911 0.227 0.111 0.908 0.289 0.109

phoA phoA 0.956 0.274 0.165 0.955 0.337 0.163 0.980 0.274 0.165 0.955 0.337 0.163


Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Original signals), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

phoB phoB 0.953 0.286 0.227 0.962 0.349 0.225 0.959 0.286 0.227 0.962 0.349 0.225

pstA pstA 0.974 0.288 0.252 0.981 0.350 0.250 0.982 0.288 0.252 0.981 0.350 0.250

pstB pstB 0.976 0.306 0.244 0.964 0.369 0.242 0.981 0.306 0.244 0.964 0.369 0.242

pstC pstC 0.964 0.282 0.186 0.955 0.344 0.166 0.970 0.282 0.186 0.955 0.344 0.166

pstS pstS 0.894 0.243 0.139 0.932 0.306 0.119 0.896 0.243 0.139 0.932 0.306 0.119

clpC clpC 0.964 0.333 0.190 0.964 0.445 0.165 0.974 0.333 0.190 0.964 0.445 0.165

ctsR ctsR 0.901 0.277 0.087 0.923 0.389 0.007 0.912 0.277 0.087 0.923 0.389 0.007

obgE obgE 0.955 0.279 0.186 0.945 0.342 0.112 0.956 0.279 0.186 0.949 0.342 0.112

ABC antibiotic transporter ABCat 0.919 0.288 0.060 0.924 0.334 0.051 0.921 0.288 0.060 0.924 0.334 0.051

MatE antibiotics MatE 0.948 0.296 0.131 0.946 0.383 0.067 0.986 0.296 0.131 0.948 0.383 0.067

MFS antibiotics MFS 0.966 0.317 0.234 0.966 0.367 0.170 0.970 0.317 0.234 0.966 0.367 0.170

SMR antibiotics SMR 0.974 0.295 0.261 0.974 0.415 0.241 0.986 0.295 0.261 0.974 0.415 0.241

Mex Mex 0.815 0.148 0.037 0.811 0.198 0.043 0.867 0.148 0.037 0.881 0.198 0.043

beta-lactamase lac 0.895 0.276 0.007 0.907 0.323 0.019 0.907 0.276 0.007 0.915 0.323 0.019

class A beta-lactamase lacA 0.942 0.312 0.125 0.949 0.424 0.099 0.953 0.312 0.125 0.949 0.424 0.099

class C beta-lactamase lacC 0.920 0.313 0.224 0.944 0.425 0.199 0.921 0.313 0.224 0.950 0.425 0.199

Tet Tet 0.942 0.265 0.091 0.937 0.385 0.027 0.951 0.265 0.091 0.942 0.385 0.027

Van Van 0.806 0.106 0.017 0.771 0.225 0.134 0.835 0.106 0.017 0.814 0.225 0.134

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted metabolic potential of functional genes.

b Models were constructed without the information of microbial abundances of dominant phyla.

c Models were constructed with the information of microbial abundances of dominant phyla.

d The occurrence shows the percentage of the total samples in which the probes of a given gene were detected.

e This null model sets all predicted metabolic potentials equal to the average value across all samples.

f This null model sets all metabolic potentials equal to the minimum observed value.


Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN).

Genes Abbr. Bray-Curtis similarityᵃ (Normalized data - values between 1 and 100 according to the formula listed in Materials and Methods), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

aclB aclB 0.755 0.104 0.054 0.818 0.175 0.055 0.810 0.104 0.054 0.818 0.175 0.055

CODH CODH 0.869 0.213 0.056 0.818 0.112 0.055 0.869 0.213 0.056 0.818 0.112 0.055

Pcc Pcc 0.791 0.170 0.042 0.863 0.214 0.044 0.791 0.170 0.042 0.863 0.214 0.044

RubisCo RubisCo 0.865 0.206 0.040 0.858 0.180 0.040 0.876 0.206 0.040 0.858 0.180 0.040

nifH nifH 0.786 0.175 0.050 0.800 0.139 0.051 0.786 0.175 0.050 0.836 0.139 0.051

gdh gdh 0.744 0.078 0.062 0.753 0.191 0.062 0.773 0.078 0.062 0.776 0.191 0.062

ureC ureC 0.859 0.246 0.059 0.814 0.155 0.058 0.859 0.246 0.059 0.814 0.155 0.058

amoA amoA 0.792 0.108 0.036 0.831 0.208 0.037 0.792 0.108 0.036 0.852 0.208 0.037

narG narG 0.839 0.187 0.041 0.819 0.145 0.040 0.839 0.187 0.041 0.819 0.145 0.040

nirK nirK 0.891 0.207 0.052 0.844 0.113 0.051 0.891 0.207 0.052 0.844 0.113 0.051

nirS nirS 0.863 0.253 0.043 0.824 0.164 0.042 0.873 0.253 0.043 0.824 0.164 0.042

norB norB 0.850 0.189 0.043 0.794 0.105 0.042 0.871 0.189 0.043 0.843 0.105 0.042

nosZ nosZ 0.849 0.134 0.041 0.867 0.179 0.041 0.849 0.134 0.041 0.876 0.179 0.041

nasA nasA 0.839 0.170 0.068 0.776 0.089 0.067 0.834 0.170 0.068 0.817 0.089 0.067

NiR NiR 0.703 0.153 0.064 0.857 0.144 0.067 0.726 0.153 0.064 0.868 0.144 0.067

nirA nirA 0.800 0.174 0.048 0.836 0.147 0.049 0.800 0.174 0.048 0.836 0.147 0.049

nirB nirB 0.777 0.112 0.047 0.766 0.091 0.047 0.777 0.112 0.047 0.766 0.091 0.047

napA napA 0.794 0.165 0.042 0.876 0.165 0.043 0.794 0.165 0.042 0.876 0.165 0.043

nrfA nrfA 0.814 0.208 0.047 0.779 0.137 0.046 0.814 0.208 0.047 0.779 0.137 0.046


Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Normalized data - values between 1 and 100 according to the formula listed in Materials and Methods), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

phytase phytase 0.775 0.111 0.045 0.797 0.153 0.046 0.775 0.111 0.045 0.797 0.153 0.046

ppk ppk 0.842 0.201 0.046 0.826 0.167 0.046 0.842 0.201 0.046 0.826 0.167 0.046

ppx ppx 0.867 0.261 0.066 0.727 0.265 0.063 0.883 0.261 0.066 0.727 0.265 0.063

aprA aprA 0.858 0.203 0.036 0.839 0.201 0.036 0.858 0.203 0.036 0.875 0.201 0.036

aprB aprB 0.725 0.180 0.058 0.763 0.134 0.059 0.746 0.180 0.058 0.763 0.134 0.059

dsrA dsrA 0.857 0.235 0.063 0.823 0.167 0.062 0.857 0.235 0.063 0.823 0.167 0.062

dsrB dsrB 0.784 0.231 0.036 0.865 0.193 0.037 0.784 0.231 0.036 0.865 0.193 0.037

sox sox 0.873 0.284 0.043 0.820 0.183 0.042 0.884 0.284 0.043 0.836 0.183 0.042

Fe-S cluster binding protein fes 0.821 0.251 0.043 0.786 0.154 0.042 0.847 0.251 0.043 0.786 0.154 0.042

ferredoxin fer 0.790 0.289 0.110 0.865 0.261 0.112 0.790 0.289 0.110 0.865 0.261 0.112

ferredoxin oxidoreductase fero 0.805 0.182 0.064 0.757 0.186 0.063 0.805 0.182 0.064 0.757 0.186 0.063

NADH ubiquinone oxidoreductase NADH 0.992 0.293 0.058 0.737 0.284 0.053 0.992 0.293 0.058 0.737 0.284 0.053

terminal quinol oxidase quio 0.987 0.337 0.051 0.861 0.285 0.048 0.988 0.337 0.051 0.861 0.285 0.048

cytochrome cyt 0.815 0.131 0.042 0.846 0.194 0.043 0.815 0.131 0.042 0.846 0.194 0.043

hydrogenase hyd 0.849 0.330 0.073 0.725 0.242 0.071 0.849 0.330 0.073 0.686 0.242 0.071

Ni-Fe hydrogenase NFhyd 0.840 0.144 0.049 0.853 0.188 0.049 0.840 0.144 0.049 0.871 0.188 0.049

glycosyl transferase glyt 0.829 0.170 0.042 0.859 0.183 0.042 0.861 0.170 0.042 0.844 0.183 0.042

ABC transporter ABCt 0.827 0.143 0.039 0.855 0.200 0.040 0.827 0.143 0.039 0.855 0.200 0.040


Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarityᵃ (Normalized data - values between 1 and 100 according to the formula listed in Materials and Methods), given in four groups of three columns (Predicted vs Observed, Null model (Mean)ᵉ, Null model (Minimum)ᶠ) in this order:

ENVᵇ, Occurrenceᵈ (> 50%); ENVᵇ, Occurrence (100%); TAXAᶜ, Occurrence (> 50%); TAXAᶜ, Occurrence (100%)

silA silA 0.753 0.180 0.047 0.798 0.142 0.048 0.807 0.180 0.047 0.823 0.142 0.048

silC silC 0.814 0.170 0.040 0.881 0.186 0.041 0.832 0.170 0.040 0.881 0.186 0.041

silP silP 0.732 0.112 0.044 0.879 0.106 0.046 0.808 0.112 0.044 0.879 0.106 0.046

al al 0.829 0.146 0.043 0.844 0.178 0.043 0.829 0.146 0.043 0.844 0.178 0.043

aoxB aoxB 0.877 0.221 0.042 0.862 0.191 0.042 0.877 0.221 0.042 0.862 0.191 0.042

arsA arsA 0.798 0.128 0.053 0.802 0.150 0.054 0.840 0.128 0.053 0.858 0.150 0.054

arsB arsB 0.868 0.211 0.063 0.794 0.092 0.062 0.868 0.211 0.063 0.823 0.092 0.062

arsC arsC 0.873 0.164 0.042 0.859 0.145 0.041 0.873 0.164 0.042 0.869 0.145 0.041

arsM arsM 0.839 0.212 0.080 0.843 0.220 0.080 0.839 0.212 0.080 0.843 0.220 0.080

cadA cadA 0.792 0.138 0.043 0.854 0.143 0.044 0.810 0.138 0.043 0.854 0.143 0.044

cadBD cadBD 0.779 0.178 0.075 0.758 0.195 0.074 0.822 0.178 0.075 0.758 0.195 0.074

czcA czcA 0.874 0.271 0.035 0.840 0.191 0.035 0.885 0.271 0.035 0.840 0.191 0.035

czcC czcC 0.828 0.220 0.059 0.769 0.101 0.057 0.828 0.220 0.059 0.769 0.101 0.057

czcD czcD 0.784 0.158 0.049 0.851 0.181 0.050 0.784 0.158 0.049 0.841 0.181 0.050

corC corC 0.864 0.200 0.044 0.868 0.192 0.044 0.882 0.200 0.044 0.868 0.192 0.044

chrA chrA 0.842 0.181 0.045 0.848 0.192 0.045 0.842 0.181 0.045 0.848 0.192 0.045

copA copA 0.760 0.262 0.053 0.807 0.285 0.054 0.809 0.262 0.053 0.785 0.285 0.054

cueO cueO 0.723 0.108 0.124 0.775 0.212 0.125 0.723 0.108 0.124 0.775 0.212 0.125

cusA cusA 0.848 0.156 0.039 0.840 0.190 0.039 0.843 0.156 0.039 0.885 0.190 0.039


Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarity^a (normalized data: values between 1 and 100 according to the formula listed in Materials and Methods)

Column groups, left to right: ENV^b, Occurrence^d (> 50%); ENV, Occurrence (100%); TAXA^c, Occurrence (> 50%); TAXA, Occurrence (100%).

Within each group: Predicted vs Observed; Null model (Mean)^e; Null model (Minimum)^f.

mer mer 0.803 0.147 0.067 0.758 0.156 0.066 0.803 0.147 0.067 0.758 0.156 0.066

merB merB 0.764 0.143 0.070 0.745 0.189 0.069 0.781 0.143 0.070 0.745 0.189 0.069

merP merP 0.803 0.203 0.080 0.813 0.223 0.080 0.803 0.203 0.080 0.813 0.223 0.080

nreB nreB 0.832 0.172 0.058 0.806 0.138 0.058 0.832 0.172 0.058 0.824 0.138 0.058

pbrA pbrA 0.888 0.196 0.039 0.901 0.224 0.039 0.888 0.196 0.039 0.902 0.224 0.039

pbrT pbrT 0.812 0.166 0.047 0.817 0.177 0.047 0.812 0.166 0.047 0.817 0.177 0.047

tehB tehB 0.890 0.280 0.054 0.888 0.275 0.054 0.890 0.280 0.054 0.888 0.275 0.054

terC terC 0.864 0.124 0.038 0.891 0.179 0.038 0.864 0.124 0.038 0.891 0.179 0.038

terD terD 0.791 0.188 0.044 0.855 0.161 0.045 0.820 0.188 0.044 0.829 0.161 0.045

terZ terZ 0.778 0.115 0.045 0.802 0.136 0.045 0.805 0.115 0.045 0.802 0.136 0.045

zitB zitB 0.872 0.171 0.051 0.859 0.145 0.051 0.872 0.171 0.051 0.859 0.145 0.051

zntA zntA 0.850 0.203 0.037 0.853 0.209 0.037 0.850 0.203 0.037 0.853 0.209 0.037

cspA cspA 0.867 0.220 0.033 0.869 0.218 0.033 0.874 0.220 0.033 0.869 0.218 0.033

cspB cspB 0.801 0.162 0.045 0.787 0.134 0.044 0.801 0.162 0.045 0.787 0.134 0.044

dnaK dnaK 0.839 0.148 0.042 0.844 0.157 0.042 0.839 0.148 0.042 0.844 0.157 0.042

groEL groEL 0.832 0.149 0.054 0.834 0.152 0.054 0.832 0.149 0.054 0.834 0.152 0.054

groES groES 0.886 0.275 0.035 0.859 0.203 0.034 0.907 0.275 0.035 0.862 0.203 0.034

grpE grpE 0.881 0.260 0.043 0.857 0.213 0.042 0.881 0.260 0.043 0.857 0.213 0.042

hrcA hrcA 0.824 0.140 0.040 0.848 0.193 0.040 0.838 0.140 0.040 0.867 0.193 0.040


Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarity^a (normalized data: values between 1 and 100 according to the formula listed in Materials and Methods)

Column groups, left to right: ENV^b, Occurrence^d (> 50%); ENV, Occurrence (100%); TAXA^c, Occurrence (> 50%); TAXA, Occurrence (100%).

Within each group: Predicted vs Observed; Null model (Mean)^e; Null model (Minimum)^f.

bglH bglH 0.782 0.158 0.040 0.853 0.174 0.041 0.808 0.158 0.040 0.853 0.174 0.041

bglP bglP 0.777 0.203 0.048 0.813 0.117 0.048 0.835 0.203 0.048 0.813 0.117 0.048

glnA glnA 0.846 0.173 0.042 0.813 0.108 0.041 0.846 0.173 0.042 0.813 0.108 0.041

glnR glnR 0.813 0.107 0.050 0.804 0.123 0.050 0.856 0.107 0.050 0.882 0.123 0.050

arcA arcA 0.890 0.326 0.075 0.879 0.304 0.075 0.890 0.326 0.075 0.879 0.304 0.075

arcB arcB 0.852 0.160 0.049 0.890 0.223 0.049 0.865 0.160 0.049 0.890 0.223 0.049

cydA cydA 0.755 0.124 0.064 0.836 0.158 0.065 0.784 0.124 0.064 0.836 0.158 0.065

cydB cydB 0.887 0.308 0.045 0.830 0.193 0.044 0.887 0.308 0.045 0.830 0.193 0.044

narH narH 0.799 0.305 0.058 0.852 0.293 0.059 0.821 0.305 0.058 0.866 0.293 0.059

narI narI 0.884 0.179 0.063 0.813 0.193 0.062 0.828 0.179 0.063 0.813 0.193 0.062

narJ narJ 0.819 0.193 0.043 0.796 0.147 0.042 0.819 0.193 0.043 0.796 0.147 0.042

ahpC ahpC 0.861 0.195 0.045 0.911 0.195 0.046 0.861 0.195 0.045 0.911 0.195 0.046

ahpF ahpF 0.883 0.200 0.076 0.819 0.275 0.075 0.882 0.200 0.076 0.822 0.275 0.075

fnr fnr 0.853 0.161 0.037 0.908 0.172 0.038 0.853 0.161 0.037 0.908 0.172 0.038

katA katA 0.814 0.152 0.064 0.796 0.135 0.064 0.814 0.152 0.064 0.814 0.135 0.064

katE katE 0.811 0.158 0.039 0.820 0.162 0.039 0.826 0.158 0.039 0.820 0.162 0.039

oxyR oxyR 0.842 0.169 0.041 0.886 0.139 0.041 0.858 0.169 0.041 0.885 0.139 0.041

perR perR 0.825 0.190 0.058 0.795 0.131 0.057 0.825 0.190 0.058 0.795 0.131 0.057

proV proV 0.795 0.110 0.048 0.831 0.180 0.049 0.795 0.110 0.048 0.831 0.180 0.049

proX proX 0.833 0.192 0.047 0.788 0.102 0.046 0.833 0.192 0.047 0.788 0.102 0.046

phoA phoA 0.881 0.181 0.045 0.879 0.177 0.045 0.881 0.181 0.045 0.879 0.177 0.045


Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN) (continued).

Genes Abbr. Bray-Curtis similarity^a (normalized data: values between 1 and 100 according to the formula listed in Materials and Methods)

Column groups, left to right: ENV^b, Occurrence^d (> 50%); ENV, Occurrence (100%); TAXA^c, Occurrence (> 50%); TAXA, Occurrence (100%).

Within each group: Predicted vs Observed; Null model (Mean)^e; Null model (Minimum)^f.

phoB phoB 0.838 0.196 0.043 0.867 0.153 0.043 0.838 0.196 0.043 0.867 0.153 0.043

pstA pstA 0.892 0.203 0.049 0.909 0.122 0.049 0.906 0.203 0.049 0.909 0.122 0.049

pstB pstB 0.839 0.177 0.036 0.854 0.208 0.037 0.839 0.177 0.036 0.854 0.208 0.037

pstC pstC 0.872 0.176 0.041 0.874 0.181 0.041 0.872 0.176 0.041 0.874 0.181 0.041

pstS pstS 0.817 0.113 0.056 0.820 0.119 0.056 0.817 0.113 0.056 0.820 0.119 0.056

clpC clpC 0.905 0.269 0.032 0.897 0.252 0.031 0.905 0.269 0.032 0.897 0.252 0.031

ctsR ctsR 0.788 0.199 0.055 0.824 0.171 0.056 0.788 0.199 0.055 0.824 0.171 0.056

obgE obgE 0.840 0.168 0.042 0.843 0.174 0.042 0.842 0.168 0.042 0.844 0.174 0.042

ABC antibiotic transporter ABCat 0.828 0.168 0.046 0.847 0.194 0.046 0.840 0.168 0.046 0.847 0.194 0.046

MatE antibiotics MatE 0.866 0.239 0.048 0.866 0.212 0.048 0.916 0.239 0.048 0.887 0.212 0.048

MFS antibiotics MFS 0.873 0.216 0.044 0.876 0.221 0.044 0.873 0.216 0.044 0.876 0.221 0.044

SMR antibiotics SMR 0.808 0.246 0.036 0.878 0.181 0.038 0.814 0.246 0.036 0.878 0.181 0.038

Mex Mex 0.872 0.205 0.071 0.809 0.144 0.070 0.865 0.205 0.071 0.867 0.144 0.070

beta-lactamase lac 0.835 0.196 0.035 0.833 0.192 0.035 0.835 0.196 0.035 0.833 0.192 0.035

class A beta-lactamase lacA 0.816 0.161 0.033 0.882 0.228 0.034 0.832 0.161 0.033 0.833 0.228 0.034

class C beta-lactamase lacC 0.828 0.230 0.055 0.796 0.156 0.054 0.838 0.230 0.055 0.796 0.156 0.054

Tet Tet 0.795 0.176 0.043 0.865 0.221 0.045 0.795 0.176 0.043 0.871 0.221 0.045

Van Van 0.772 0.161 0.120 0.737 0.173 0.119 0.809 0.161 0.120 0.756 0.173 0.119

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted metabolic potential of functional genes.

b Models were constructed without the information of microbial abundances of dominant phyla.

c Models were constructed with the information of microbial abundances of dominant phyla.

d Occurrence is the percentage of all samples in which the probes of a given gene were detected.

e This null model sets all predicted metabolic potentials equal to the average value across all samples.

f This null model sets all predicted metabolic potentials equal to the minimum observed value.
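
The two null models in footnotes e and f can be sketched in a few lines of Python. This is a minimal illustration, assuming the common definition of Bray-Curtis similarity, 1 - sum|x_i - y_i| / sum(x_i + y_i), computed over the samples for one gene; the function names and the example values are hypothetical, and the sketch is not guaranteed to reproduce the tabulated null-model values, which also depend on the authors' permutation-based estimation.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity: 1 - sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 1.0  # two all-zero vectors are treated as identical
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / total


def null_model_similarities(observed):
    """Similarity of the observed values to the two constant null predictions."""
    n = len(observed)
    mean_null = [sum(observed) / n] * n  # footnote e: every prediction is the mean
    min_null = [min(observed)] * n       # footnote f: every prediction is the minimum
    return (bray_curtis_similarity(observed, mean_null),
            bray_curtis_similarity(observed, min_null))


# Hypothetical normalized signals for one gene in five samples (not source data).
observed = [12.0, 35.0, 58.0, 71.0, 94.0]
predicted = [15.0, 30.0, 60.0, 70.0, 90.0]

print("predicted vs observed:", bray_curtis_similarity(observed, predicted))
print("null (mean), null (minimum):", null_model_similarities(observed))
```

A model is informative in this sense only if its predicted-vs-observed similarity clearly exceeds both null values.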


Table S11. Predictive equations and functional parameters that provide the best prediction for relative abundances of dominant microbial taxa based on the artificial neural network (ANN).

Taxa Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

Euryarchaeota Eury pH, EC, Fe2+ 95.9*EC*Fe2+ + 26.7*pH*pH*Fe2+ - 134*pH*Fe2+ - 13.6*EC*EC*Fe2+ 0.712 < 0.001 0.598 0.027 0.685 < 0.001

Acidobacteria Acido Fe3+, Gamma 2.44/(Fe3+*Fe3+ - 3.27) - 1.39/(55.2 - 2*Fe3+*Gamma) 0.728 < 0.001 0.650 0.021 0.709 < 0.001

Actinobacteria Actino Eury, Fe3+ - 68/(Eury*Eury - 478) - 0.816/(1010*Eury*Eury - 476*Eury) 0.784 < 0.001 0.660 0.047 0.694 < 0.001

Firmicutes Firm Actino, Nitro 1.08*Actino + 0.046*Nitro - 9.87*Nitro/(51.7*Actino + Nitro*Nitro - 50.1*Nitro) 0.746 < 0.001 0.649 0.014 0.725 < 0.001

Nitrospira Nitro pH, Cu, Zn Cu -5.67/(Cu - 2.06) + 61*Zn/pH - 23*Zn 0.778 < 0.001 0.692 0.008 0.757 < 0.001

Alphaproteobacteria Alpha pH, TOC, Beta pH + 2.03/(3.93 - Beta) - 0.011/(1.49*TOC - 0.556) 0.657 < 0.001 0.650 0.023 0.657 < 0.001

Betaproteobacteria Beta pH, Fe2+, Eury 29.5*Eury + 13.4*pH + 32.7*pH*Fe2+ + 4.88*Eury*pH*pH - 71.3*Fe2+ - 24.9*pH*Eury 0.807 < 0.001 0.701 0.007 0.783 < 0.001

Gammaproteobacteria Gamma Fe2+, Beta 21.6 + 3.02/(Beta - 1.4) - 10.2/(Beta - 17.6) - 0.219*Beta 0.865 < 0.001 0.766 < 0.001 0.801 < 0.001

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted relative microbial abundances.

b The observed values were randomly permuted 10,000 times and compared with the predicted values. The significance of the Bray-Curtis similarity is defined as the number of resamples in which the randomly permuted values gave a greater similarity than the observed values did, divided by the total number of resamples.
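
The permutation test described in footnote b can be sketched as follows. This is a minimal illustration under stated assumptions: Bray-Curtis similarity is taken as 1 - sum|x_i - y_i| / sum(x_i + y_i), the function names, the seed argument and the example values are hypothetical, and the authors' actual implementation is not given in this section.

```python
import random


def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity: 1 - sum|x_i - y_i| / sum(x_i + y_i)."""
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 1.0
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / total


def permutation_p_value(observed, predicted, n_resamples=10000, seed=0):
    """Return (similarity, P value) following the wording of footnote b.

    The observed values are shuffled n_resamples times; the P value is the
    fraction of shuffles whose similarity to the predictions is strictly
    greater than the similarity obtained with the unshuffled observations.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    actual = bray_curtis_similarity(observed, predicted)
    shuffled = list(observed)
    greater = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if bray_curtis_similarity(shuffled, predicted) > actual:
            greater += 1
    return actual, greater / n_resamples


# Hypothetical relative abundances (%) in six samples (not source data).
observed = [0.5, 3.2, 10.1, 22.4, 41.0, 7.8]
predicted = [0.9, 2.5, 12.0, 19.8, 37.5, 9.1]
similarity, p_value = permutation_p_value(observed, predicted)
print(similarity, p_value)
```

With 10,000 resamples the smallest nonzero P value is 0.0001, so a count of zero corresponds to the "< 0.001" entries used in the tables.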


Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN).

Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

aclB aclB Al, Acido 232961 + 14107*Al + 1130*Al*Acido*Acido*Acido - 186*Acido*Acido*Acido*Acido - 8207*Acido*Al*Al 0.950 < 0.001 0.920 0.045 0.931 < 0.001

CODH CODH pH, DO, TOC 508125 + 191961*pH*pH/(87.8 + 1.92*pH*pH - 9.03*TOC - 36.2*DO*DO) 0.975 < 0.001 0.965 0.042 0.969 < 0.001

Pcc Pcc pH, TOC 1176937 + (241871 - 100000*pH)/(1.89*TOC + pH*TOC - pH) 0.970 < 0.001 0.950 0.031 0.964 < 0.001

RubisCo RubisCo DO, Alpha 643499 + 18027*Alpha + 12.9*DO*Alpha*Alpha*Alpha - 1764*Alpha*Alpha 0.965 < 0.001 0.945 0.024 0.956 < 0.001

nifH nifH DO 3066210 + 65545*DO + 1684063/DO - 410346/DO*DO 0.965 < 0.001 0.919 0.045 0.931 < 0.001

gdh gdh Fe3+, Nitro 200000*Fe3+ + 1182*Nitro + 1242/(2.23 - Nitro) - 35179*Fe3+*Fe3+ 0.941 < 0.001 0.934 0.026 0.936 < 0.001

ureC ureC Firm, Beta 1284444 + 2890/(0.013 - Firm) + (28972159 - 346594*Beta)/(Beta - 84.6) 0.956 < 0.001 0.932 0.066 0.945 < 0.001

amoA amoA pH 1031769 + 1000000*pH - 45446/(3.80 - pH) - 199433*pH*pH 0.976 0.010 0.951 0.012 0.962 < 0.001

narG narG nirK 428136 + 2.04*nirK + (547048 - 0.912*nirK)/(0.007*nirK - 5.59) - 0.006*nirK*nirK 0.982 < 0.001 0.960 0.038 0.971 < 0.001

nirK nirK Cu, Acido, Gamma 671138 + 203679*Acido + 164056*Cu + 0.964*Gamma*Gamma*Gamma - 3641*Cu*Gamma - 76175*Acido*Acido 0.971 < 0.001 0.926 0.002 0.938 < 0.001

nirS nirS Eury, Gamma, narG 705537 + 0.038*Eury*narG + 0.0002*narG*Gamma*Gamma - 2362*Gamma - 38255*Eury - 181*Gamma*Gamma - 760*Eury*Eury 0.940 < 0.001 0.935 0.020 0.939 < 0.001

norB norB Fe2+, Cd, Eury 159990 + 172363*Cd + 16726*Fe2+ + 1462*Eury + 100439*Fe2+*Fe2+*Cd - 357357*Fe2+*Cd 0.943 < 0.001 0.929 0.015 0.932 < 0.001

nosZ nosZ pH 4299268*pH + 49942917/pH - 39083676/(pH*pH) - 22315156 - 288282*pH*pH 0.934 < 0.001 0.917 0.042 0.930 < 0.001

nasA nasA Fe2+, Acido, NiR 518458 + 61193*Acido + 0.000006*NiR*NiR - 2.46*NiR - 61193*Fe2+*Acido*Acido 0.913 < 0.001 0.912 0.008 0.912 < 0.001

NiR NiR Acido 224601 + 137056*Acido + 26674*Acido*Acido*Acido - 1765*Acido*Acido*Acido*Acido - 113093*Acido*Acido 0.946 0.003 0.919 0.005 0.925 < 0.001

nirA nirA pH, nirB 60902 + 0.304*nirB - 2733*pH/(0.00001*nirB - pH) 0.914 0.002 0.895 0.033 0.900 < 0.001

nirB nirB pH 250137*pH + 86173*pH*pH + 10915*pH*pH*pH*pH - 276426 - 75289*pH*pH*pH 0.876 0.002 0.863 0.033 0.866 < 0.001

napA napA Pb, Acido 453943 - 152*Acido/(0.013 - 0.341*Pb) 0.945 0.003 0.944 0.034 0.944 < 0.001

nrfA nrfA pH 585744*pH - 114373*pH*pH 0.931 0.009 0.929 0.026 0.929 < 0.001


Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN) (continued).

Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

phytase phytase EC, As 9590644 + 70474754*EC*As + 709869*EC*EC + 53127*As*As - 5191343*EC - 132295841*As - 9389329*As*EC*EC 0.868 0.015 0.775 0.020 0.797 0.011

ppk ppk DO 844411 + 137339/(DO*DO) + (51307*DO - 43871)/(0.968*DO*DO*DO - DO*DO*DO*DO) 0.974 0.005 0.936 0.011 0.912 < 0.001

ppx ppx pH 1071301 + (35959*pH*pH - 11965*pH*pH*pH)/(pH - 3.02) 0.965 < 0.001 0.948 0.045 0.961 < 0.001

aprA aprA EC, Acido 96020459*EC + 425725*EC*Acido + 2284143*EC*EC*EC + 15880*Acido*Acido*Acido - 118817246 - 1574124*Acido - 25690773*EC*EC 0.964 < 0.001 0.958 0.003 0.959 < 0.001

aprB aprB Fe2+ 170011 - 3967/(8.93 - 3.44*Fe2+) - 13422*Fe2+ 0.916 0.003 0.907 0.014 0.914 0.002

dsrA dsrA DO, Al, P 1854633 + 39512/(0.684 + 0.035*DO*DO*DO - DO) - 12937*DO 0.970 < 0.001 0.954 0.009 0.966 < 0.001

dsrB dsrB EC, As 742540*EC + (4891561*EC + 543507*EC*EC*EC - 3668671 - 45292*EC*EC*EC*EC - 2445781*EC*EC)/As - 1227621 - 86116*As 0.954 0.010 0.951 0.013 0.953 0.004

sox sox Pb, Gamma 586406 + 100000*Pb + 1473*Gamma + 4391/(Gamma - 2.98) 0.949 < 0.001 0.946 0.011 0.948 < 0.001

Fe-S cluster binding protein fes Actino 5970955 + 31584*Actino + 20213/Actino - 19.8/(Actino*Actino) + 2803/(1.94*Actino - 0.134) 0.980 < 0.001 0.974 0.026 0.979 < 0.001

ferredoxin fer Fe2+ 51937 + 98.3/(0.005 + 0.701*Fe2+*Fe2+ - 0.167*Fe2+ - 0.539*Fe2+*Fe2+*Fe2+) 0.907 < 0.001 0.828 0.014 0.847 < 0.001

ferredoxin oxidoreductase fero Actino, fes 4293408 + 1368359*Actino + 1.01e-6*fes*fes + 6.74e-21*fes*fes*fes*fes + 0.169*fes*Actino*Actino + 5.47e-8*Actino*fes*fes - 3.37*fes - 0.547*Actino*fes - 1.35e-13*fes*fes*fes - 421588*Actino*Actino - 1.69e-8*Actino*Actino*fes*fes 0.925 < 0.001 0.901 0.012 0.918 < 0.001

NADH ubiquinone oxidoreductase NADH Actino, quio, fes 12816 + 2.04*fes 0.989 < 0.001 0.979 < 0.001 0.980 < 0.001

terminal quinol oxidase quio Actino, fes 23382*Actino + 3.75*fes - 488737 0.987 < 0.001 0.978 < 0.001 0.980 < 0.001

cytochrome cyt Fe3+, P 1834354*Fe3+ + 1000000*Fe3+*P - 62.1/(P*P) - 1434077 - 2956422*P - 336122*Fe3+*Fe3+ 0.912 0.008 0.908 0.009 0.902 0.016

hydrogenase hyd pH 1137811 - 1117019/pH - 169985*pH 0.931 0.006 0.913 0.042 0.918 0.024

Ni-Fe hydrogenase NFhyd pH 26001 + 35284*pH + 1250/(3.463 - pH) - 3057/(3.1 - pH) - 1167*pH*pH*pH 0.904 0.002 0.878 0.003 0.884 < 0.001

glycosyl transferase glyt Actino 167223 + 11945/(151*Actino - 4.61) + 79.4/(0.614*Actino - 0.043) 0.913 0.010 0.885 0.018 0.892 < 0.001

ABC transporter ABCt pH, DO 50289301 + 5156265/(5.43*pH - pH*DO) 0.974 0.011 0.973 0.012 0.974 0.004


Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN) (continued).

Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

silA silA Gamma 42797 + 6873/(Gamma - 4.32) + 3619/(0.062*Gamma - 4.35) 0.829 0.006 0.811 0.041 0.816 < 0.001

silC silC Firm 403314 + 970*Firm - 49358/(Firm - 11) - 2067/(10.2*Firm*Firm - 9.96*Firm) 0.948 0.001 0.933 0.053 0.937 < 0.001

silP silP Acido, Gamma 84496 + 123593*Acido + 23245*Acido*Acido*Acido - 48225*Acido*Acido*Acido*Acido - 328429*Acido*Acido 0.943 < 0.001 0.923 0.009 0.928 < 0.001

al al Actino 336538 + 50386*Actino/(0.21 - 5.92*Actino) 0.928 0.010 0.923 0.021 0.924 0.007

aoxB aoxB DO 275300 + 8228/DO - 1572*DO 0.972 0.028 0.941 0.037 0.949 0.005

arsA arsA SO42-, Acido 26815 + 10000*Acido + 32.7/(Acido - 0.451*Acido*Acido*Acido - 1.21*Acido*Acido) - 7268*Acido*Acido 0.831 < 0.001 0.807 0.001 0.823 < 0.001

arsB arsB P, arsM 84471 + (0.014*arsM - 579840 - 9.06e-11*arsM*arsM)/(P - 1.12) 0.956 < 0.001 0.909 0.002 0.921 < 0.001

arsC arsC pH, Firm 1190543 + 155*Firm*Firm - 35325/(Firm - 5.37) + (1195 - 393*pH)/Firm - 42731*pH 0.986 0.001 0.973 0.002 0.977 < 0.001

arsM arsM pH, DO 70000000 + (17476356 + 20185331*DO - 7970875*pH)/(DO - 0.162) 0.977 0.003 0.970 0.072 0.976 < 0.001

cadA cadA pH, Nitro 1353931 + 71.8*Nitro*Nitro + 15345/(5.54*Nitro - 4.74 - pH) - 100000*pH 0.974 < 0.001 0.949 0.003 0.956 < 0.001

cadBD cadBD Gamma 113340 + 189*Gamma*Gamma + 677/(0.282 - 0.002*Gamma*Gamma) - 4031*Gamma - 1.69*Gamma*Gamma*Gamma 0.937 < 0.001 0.889 0.003 0.901 < 0.001

czcA czcA Cd, Eury 1350240 + 1491*Eury + (3.11 - 415972*Cd)/(Eury - 12.6*Cd) - 572026*Cd 0.981 0.047 0.936 0.023 0.948 0.010

czcC czcC pH, Zn, Cd (2321245*Zn - 3460027)/(24.2*Zn - 35.3) 0.866 0.002 0.833 0.021 0.858 0.001

czcD czcD pH, Zn, Cd 2742379 - 298027/Zn + 1937665*Zn*Zn - 3086079*Zn - 100000*pH*Zn - 367520*Zn*Zn*Zn 0.966 < 0.001 0.965 0.028 0.965 < 0.001

corC corC DO, EC, Alpha 284882 + 14513*DO*DO + 7243*DO*EC*Alpha - 102100*DO - 21730*DO*Alpha - 36.9*Alpha*Alpha*Alpha 0.884 0.002 0.869 0.038 0.872 < 0.001

chrA chrA pH, Pb 1949385 + 949385*pH*Pb - 203945*pH - 2609204*Pb 0.969 < 0.001 0.963 0.026 0.967 < 0.001

copA copA Fe3+, Beta, cueO 307566 + 1429111*Fe3+ + 23.6*cueO + 3.2e-8*cueO*cueO*cueO - 0.002*cueO*cueO - 228820*Fe3+*Fe3+ 0.985 < 0.001 0.972 0.001 0.975 < 0.001

cueO cueO Al, cusA 20182 + 53375/(9.85 - 2e-8*cusA*cusA) 0.799 0.002 0.632 0.071 0.741 0.002

cusA cusA Cu, Acido 42529 + 962/(4.53 - 72.5*Acido) 0.914 0.006 0.872 0.019 0.882 0.002


Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN) (continued).

Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

mer mer As, P 816722 + 190355*P + 27.4/(0.916*P - 0.036) - 2881875*As*P 0.932 0.006 0.923 0.068 0.925 0.001

merB merB TOC, Eury (2695077*Eury - 32425627)/(71.1*Eury - 837) 0.879 0.007 0.834 0.018 0.847 0.003

merP merP As, mer 113795 + 0.121*As*mer + 2.03e-7*mer*mer + 4.85e-13*As*mer*mer*mer - 0.203*mer - 15146*As - 2.42e-19*As*mer*mer*mer*mer - 3.63e-7*As*mer*mer 0.907 < 0.001 0.878 0.030 0.886 < 0.001

nreB nreB DO, Al, Pb (153719 - 4213612*Pb)/(1.78 - 45.9*Pb) 0.879 < 0.001 0.865 0.021 0.869 0.000

pbrA pbrA Eury, Actino, Pb 36297 + 61190*Pb + 6318*Actino + 17.2*Eury*Eury - 2131*Eury*Actino - 359701*Pb*Pb*Pb 0.918 < 0.001 0.913 0.004 0.914 < 0.001

pbrT pbrT SO42- 4060888 + 2626557*SO42-*SO42- + 48640*SO42-*SO42-*SO42-*SO42- + 198/(1.82*SO42- - 6.66) - 5274654*SO42- - 583679*SO42-*SO42-*SO42- 0.850 0.004 0.801 0.012 0.812 0.004

tehB tehB As, P, terD 456011 + P*terD + (0.182*As*terD + 100000*As*As - 109482*As)/P - 600000*P 0.870 < 0.001 0.774 0.031 0.796 < 0.001

terC terC DO, S, Fe3+ 740223 + (307529*DO*S - 922586*DO)/(7.44*DO - 1.47 - DO*DO) 0.971 0.003 0.957 0.006 0.960 < 0.001

terD terD Al, Cu, Acido 961472 - 74.5/Acido + 226398*Al*Cu + 298408*Cu*Cu - 86290*Al - 594027*Cu - 100000*Al*Cu*Cu 0.985 < 0.001 0.960 < 0.001 0.966 < 0.001

terZ terZ pH, Eury 182167*pH + 23708*Eury - 144660 - 8700*pH*Eury - 6455*pH*pH*pH 0.922 0.002 0.917 0.015 0.918 < 0.001

zitB zitB Actino, Firm, zntA 33082 + 7649*Actino*Actino + 1.01e-13*zntA*zntA*zntA - 0.032*Actino*zntA 0.961 < 0.001 0.913 < 0.001 0.925 < 0.001

zntA zntA Zn, Firm 697816 + 21491*Zn - 3427*Zn/(Firm - 0.046) 0.953 0.003 0.938 0.008 0.949 0.001

cspA cspA Fe3+, Gamma 13182 + 10000*Fe3+ + (10000*Fe3+ - 36835)/(Gamma - 4.05) 0.917 < 0.001 0.875 0.037 0.906 < 0.001

cspB cspB pH 26739 - 462/(2.08 - pH) 0.878 < 0.001 0.818 0.050 0.833 < 0.001

dnaK dnaK DO, EC 521331 - 130258/(20*DO - 22.6) 0.947 < 0.001 0.918 0.043 0.940 < 0.001

groEL groEL groES 3603648 - 34228689570/groES + 0.001*groES*groES - 96.4*groES - 4.64e-9*groES*groES*groES 0.952 0.003 0.916 0.019 0.925 0.001

groES groES As, Cd, Acido 96478 - 715/As - 10937*Acido - 23920*As 0.962 0.002 0.907 0.003 0.922 < 0.001

grpE grpE As, groEL 53703153 - 10454992600000/groEL + 8.91e-5*groEL*groEL + 7.5e17/(groEL*groEL) - 114*groEL 0.967 < 0.001 0.949 0.023 0.962 < 0.001

hrcA hrcA pH, Acido 5386545*pH + 204351*pH*pH*pH - 3861904 - 68858*Acido - 1860173*pH*pH 0.964 0.018 0.933 0.014 0.941 0.029


Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN) (continued).

Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

bglH bglH P, Beta 35250 + 250*P*P*Beta*Beta - 403*P*P*P*Beta*Beta 0.882 0.001 0.880 0.040 0.882 0.003

bglP bglP Acido 74057 + 87020*Acido + 156039*Acido*Acido*Acido - 31786*Acido*Acido*Acido*Acido - 222839*Acido*Acido 0.935 < 0.001 0.906 0.016 0.928 < 0.001

glnA glnA pH 2421028 + 463/(pH - 3.52) - 28547*pH*pH 0.982 0.002 0.975 0.007 0.976 < 0.001

glnR glnR TOC, Eury, Acido 197839 + 35819/TOC + 238469*TOC*Acido + 5004*TOC*Eury - 100000*Acido - 100000*Acido*TOC*TOC 0.950 < 0.001 0.912 0.002 0.922 < 0.001

arcA arcA cydB, narH 1.21*narH + 0.146*cydB + 20028857980/narH - 265934 0.950 < 0.001 0.930 < 0.001 0.935 < 0.001

arcB arcB pH, DO (14146781 - 49361*pH - 19318332*DO)/(247 - 329*DO) 0.918 0.008 0.866 0.017 0.905 0.001

cydA cydA Fe3+, Eury, arcB 68721 + 15787*Fe3+ + 1542*Eury + 0.492*arcB/(0.504 + Eury) 0.933 < 0.001 0.926 0.010 0.928 < 0.001

cydB cydB Zn, cydA, narH 129718 + 1.46*cydA + 0.764*narH*narH/cydA - 1.59*narH 0.955 < 0.001 0.945 < 0.001 0.947 < 0.001

narH narH pH, DO 165785 + (100000*pH*DO - 256025*pH*pH)/(209 - 42.3*DO) 0.890 0.002 0.877 0.079 0.886 < 0.001

narI narI Cu, Nitro 585479 + 2740*Cu*Nitro + 217156*Cu*Cu - 231782*Cu - 90.3*Nitro*Nitro - 53473*Cu*Cu*Cu 0.975 < 0.001 0.955 0.007 0.960 < 0.001

narJ narJ pH, DO 104334 + 29712*DO + 110*pH*DO*DO*DO - 5813*DO*DO 0.911 0.009 0.881 0.046 0.904 0.010

ahpC ahpC TOC, ahpF 5491077 + 313927*TOC + 3.83e-5*ahpF*ahpF - 22.2*ahpF - 2.05e-11*ahpF*ahpF*ahpF - 170148*TOC*TOC 0.976 < 0.001 0.964 0.027 0.973 < 0.001

ahpF ahpF Eury, katE 26198241 + 8.08e-6*katE*katE - 867392672100/(katE - 1000000) - 28.1*katE 0.963 < 0.001 0.931 0.007 0.955 < 0.001

fnr fnr pH, ahpF 28.4*ahpF + 2742781560000/ahpF - 11621325 - 313758*pH - 1.52e-5*ahpF*ahpF 0.985 < 0.001 0.975 < 0.001 0.978 < 0.001

katA katA pH, DO 97242*pH + (165326 - 26217*pH*pH)/DO 0.927 0.021 0.915 0.044 0.918 0.002

katE katE pH, Beta 1744706 - 104681*pH/(Beta - 36.1) - 78046*pH 0.966 < 0.001 0.956 0.016 0.964 < 0.001

oxyR oxyR pH, Fe2+, Eury 704238 + 19746*Eury + 12592*Fe2+ - 5938*pH*Eury - 155*Eury*Eury 0.975 < 0.001 0.958 0.013 0.967 < 0.001

perR perR pH, DO 31987 + 42136/(DO*DO - 5.94*DO) 0.825 0.002 0.823 0.057 0.824 < 0.001

proV proV SO42- 116643*SO42- + 986/(9 + SO42-*SO42- - 6*SO42-) - 234/(8.71 + SO42-*SO42- - 6*SO42-) - 99930 0.942 0.027 0.922 0.038 0.937 0.002

proX proX DO, Cu 93424 + 1762*DO - 931/(0.469*DO - 0.39) 0.911 0.001 0.909 0.029 0.909 < 0.001

phoA phoA pstC 25907 + 0.362*pstC + 9997/(87.7 + 1e-10*pstC*pstC - 0.0002*pstC) 0.980 < 0.001 0.948 0.008 0.956 < 0.001


Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN) (continued).

Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

phoB phoB pH, P 14430174 - 12263863/pH + 531416*pH*pH - 4765475*pH 0.959 0.015 0.951 0.041 0.953 0.010

pstA pstA Eury, pstC, pstS 1541*Eury + 3.84*pstS + 1.04*pstC - 699470 - 2.29e-6*pstC*pstS 0.982 < 0.001 0.961 0.002 0.971 < 0.001

pstB pstB pH, P 4612569 + 10199/(pH - 1.96) - 155833*pH 0.981 0.004 0.974 0.011 0.976 0.002

pstC pstC pH, Eury 627618 + 378470*pH + 27194*Eury - 7403*pH*Eury - 65401*pH*pH 0.970 < 0.001 0.966 0.042 0.967 < 0.001

pstS pstS pH, DO, P 645118 + 360948*P + 33388*DO - 100000*pH - 1845*P*DO*DO*DO 0.896 0.008 0.889 0.041 0.894 0.001

clpC clpC ctsR 400000 + 1.61*ctsR - 5911987830/ctsR - 2.17e-6*ctsR*ctsR 0.974 < 0.001 0.961 < 0.001 0.964 < 0.001

ctsR ctsR pH, TOC 296028 - 30766/TOC - 10758*pH/(13.1 - 13.8*pH*TOC) - 71532*TOC 0.912 < 0.001 0.866 0.008 0.901 < 0.001

obgE obgE Fe2+, Al, Actino 2612455 + (490435*Fe2+ - 391327)/Al - 31709*Actino - 361380*Fe2+ 0.956 < 0.001 0.953 0.018 0.954 < 0.001

ABC antibiotic transporter ABCat pH 268357 + 7868/(3.82*pH - 7.71) - 26174*pH 0.921 0.023 0.912 0.012 0.919 0.011

MatE antibiotics MatE Cd, Actino, Nitro 389315 + 3159*Nitro + 23655/(Nitro - 11.9) + (23655*Cd - 63.7*Nitro)/Actino - 1088625*Cd*Actino 0.986 < 0.001 0.925 0.017 0.941 < 0.001

MFS antibiotics MFS SMR, Mex 47580 + 0.601*SMR + (37107 + 2.15e-8*SMR*SMR - 0.059*SMR)/(6.92 - 4.06e-6*SMR) 0.970 < 0.001 0.954 0.018 0.966 < 0.001

SMR antibiotics SMR Pb, Acido, Beta 1741816 + 741613*Acido + 203*Acido*Beta*Beta - 21910*Acido*Beta - 206028*Acido*Acido 0.986 < 0.001 0.981 0.026 0.985 < 0.001

Mex Mex Eury, ABCat 90394 + 0.031*Eury*ABCat + 20.8*Eury*Eury*Eury - 0.08*ABCat - 0.004*ABCat*Eury*Eury 0.867 0.010 0.831 0.012 0.841 < 0.001

beta-lactamase lac Al, Alpha 76626 + 2628*Al*Al*Al + 16.1*Al*Alpha*Alpha - 1222*Alpha - 8560*Al*Al 0.907 0.010 0.861 0.014 0.895 0.003

class A beta-lactamase lacA pH, Nitro 319145*pH + 9936*Nitro - 7123/Nitro - 234250 - 4061*pH*Nitro - 48191*pH*pH 0.953 < 0.001 0.936 0.009 0.949 < 0.001

class C beta-lactamase lacC pH, Acido 524718 + 2144746*Acido*Acido + 241702*pH*pH*Acido*Acido - 35611*pH - 1447706*pH*Acido*Acido 0.921 < 0.001 0.919 0.042 0.921 < 0.001

Tet Tet pH 709163 + 4117/(37.6*pH - 103) 0.951 0.018 0.939 0.083 0.942 0.012

Van Van EC, Beta 21198 + (1433 + 36.7*Beta)/(EC + 0.13*Beta*Beta - 3 - 2.28*Beta) 0.835 0.018 0.746 0.053 0.769 0.002

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted metabolic potential of functional genes.

b The observed values were randomly permuted 10,000 times and compared with the predicted values. The significance of the Bray-Curtis similarity is defined as the number of resamples in which the randomly permuted values gave a greater similarity than the observed values did, divided by the total number of resamples.
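
The predictive equations in Table S12 are closed-form expressions, so they can be evaluated directly. The sketch below is a minimal illustration using two single-variable rows (nrfA and cspB) copied from the table. The function names and the pole guard are hypothetical additions, and the input pH is assumed to be on the same scale the authors used when fitting; this section does not state any transformation of the inputs.

```python
def nrfa_signal(ph):
    """nrfA row of Table S12: 585744*pH - 114373*pH*pH."""
    return 585744 * ph - 114373 * ph * ph


def cspb_signal(ph, pole_tolerance=1e-6):
    """cspB row of Table S12: 26739 - 462/(2.08 - pH).

    The rational term is undefined at pH = 2.08, so inputs at the pole are
    rejected. This guard is an addition for the sketch, not part of the table.
    """
    denominator = 2.08 - ph
    if abs(denominator) < pole_tolerance:
        raise ValueError("equation is undefined at pH = 2.08")
    return 26739 - 462 / denominator


# Evaluate both equations at a few illustrative pH values.
for ph in (2.0, 2.5, 3.0):
    print(ph, nrfa_signal(ph), cspb_signal(ph))
```

Many rows of the table contain rational terms of this kind, so predictions close to a denominator root can change sharply and are best inspected before use.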


Table S13. Predictive equations and functional parameters that provide the best prediction for environmental properties based on the artificial neural network (ANN).

Environmental properties Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a (Train, P value^b, Validation, P value, Average, P value)

Dissolved Oxygen DO Fe2+ 1.69 + 0.024/(2.54 - Fe2+) + Fe2+/(4.17*Fe2+ - 13.7) 0.785 < 0.001 0.528 0.059 0.724 < 0.001

Total Organic Carbon TOC P (20.1*P*P - 3.97*P)/(16.8*P*P - 0.242 - 2.31*P) 0.927 < 0.001 0.829 0.002 0.853 < 0.001

Electrical Conductivity EC Fe3+ Fe3+ + 6.14/Fe3+ - 1.44 0.978 < 0.001 0.971 0.004 0.977 < 0.001

Sulfate SO42- TOC, EC 1.61 + 0.539*EC + 0.218/TOC - 0.093/(TOC*EC - 3*TOC) 0.978 < 0.001 0.965 < 0.001 0.975 < 0.001

Ferric ion Fe3+ pH, Fe2+ 2.45 + 1.61*Fe2+ - 0.562*pH*Fe2+ 0.968 < 0.001 0.963 0.003 0.967 < 0.001

Ferrous ion Fe2+ pH 7.97 + 0.474*pH*pH - 3.89*pH 0.753 < 0.001 0.734 0.011 0.744 < 0.001

Aluminum Al Fe3+ 0.761*Fe3+ + 0.092/(51.1*Fe3+ - 174) 0.897 < 0.001 0.866 0.002 0.888 < 0.001

Copper Cu Fe2+, Cd, P 8.34*Cd*P/(Cd + 0.198*Fe2+) 0.638 < 0.001 0.496 0.010 0.554 < 0.001

Zinc Zn Fe3+, Cd Fe3+ - 0.004/Cd - 0.702 0.813 < 0.001 0.802 0.016 0.805 < 0.001

Arsenic As pH, DO, Cd (0.104 + 2.34*Cd)/(pH + pH*DO - 2.75) 0.813 < 0.001 0.611 0.047 0.770 < 0.001

Cadmium Cd DO, Pb DO*Pb*Pb/(0.253 + DO*DO - DO) 0.722 < 0.001 0.567 0.053 0.652 < 0.001

Lead Pd pH, DO, Fe3+ 0.006*DO + 0.001*Fe3+*Fe3+*Fe3+*Fe3+ 0.641 0.004

0.628 0.041

0.638 0.001

Phosphorus P Al, Cd 0.048 + 0.041*Al + 5.09*Al*Cd - 8.45*Cd 0.762 < 0.001 0.748 < 0.001 0.757 < 0.001

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted environmental properties.

b The observed values were randomly permuted 10,000 times and compared with the predicted values. The significance of the Bray-Curtis similarity is defined as the number of resamples in which the randomly permuted values gave a greater similarity than the observed values did, divided by the total number of resamples.
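As a worked illustration of how the closed-form equations in Table S13 are evaluated, the sketch below codes three of them as plain functions. The coefficients are copied from the table; the function names and the input values in the usage note are hypothetical, and the table does not state the scale or units of the variables (for example, whether they were transformed before modelling), so the outputs should not be read as real measurements.

```python
def predict_fe2(ph):
    """Ferrous ion from pH: 7.97 + 0.474*pH*pH - 3.89*pH."""
    return 7.97 + 0.474 * ph * ph - 3.89 * ph


def predict_fe3(ph, fe2):
    """Ferric ion from pH and Fe2+: 2.45 + 1.61*Fe2+ - 0.562*pH*Fe2+."""
    return 2.45 + 1.61 * fe2 - 0.562 * ph * fe2


def predict_ec(fe3):
    """Electrical conductivity from Fe3+: Fe3+ + 6.14/Fe3+ - 1.44."""
    if fe3 == 0:
        # The rational term has a pole at Fe3+ = 0, so the equation is undefined there.
        raise ValueError("equation is undefined at Fe3+ = 0")
    return fe3 + 6.14 / fe3 - 1.44


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the functions.
    print(predict_fe2(2.5))
    print(predict_fe3(2.5, 1.2))
    print(predict_ec(2.0))
```

Several equations in the table are rational functions, so inputs near a zero of a denominator (as guarded against in `predict_ec`) can produce very large or undefined predictions.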


Table S14. Functional genes that showed consistent or fluctuating relative metabolic potentials along the pH gradient.

| Category | Subcategory | Gene | Abbreviation | pH < 2.0 | pH 2.0 - 2.5 | pH 2.5 - 3.0 | pH 3.0 - 3.5 | pH 3.5 - 4.0 | pH > 4.0 |
|---|---|---|---|---|---|---|---|---|---|
| Carbon cycling | Carbon fixation | aclB | aclB | 90.70 | 69.28 | 55.40 | 50.78 | 50.64 | 53.12 |
| | | CODH | CODH | 1.69 | 2.90 | 20.70 | 43.06 | 10.98 | 5.68 |
| | | Pcc | Pcc | 39.31 | 38.65 | 35.67 | 25.02 | 7.31 | 93.04 |
| | | RubisCo | RubisCo | 83.25 | 76.79 | 85.18 | 90.13 | 94.13 | 96.82 |
| Nitrogen cycling | Ammonification | ureC | ureC | 65.31 | 65.42 | 65.83 | 67.71 | 69.93 | 66.85 |
| | Denitrification | nirK | nirK | 62.99 | 43.57 | 44.40 | 55.63 | 65.92 | 89.04 |
| Energy process | Electron transport | Fe-S cluster binding protein | fes | 51.68 | 60.40 | 69.80 | 65.28 | 69.44 | 57.42 |
| | | ferredoxin | fer | 35.28 | 35.05 | 36.67 | 40.66 | 30.93 | 69.42 |
| | | NADH ubiquinone oxidoreductase | NADH | 61.68 | 60.40 | 69.80 | 65.28 | 69.44 | 57.42 |
| | | terminal quinol oxidase | quio | 61.62 | 59.43 | 69.03 | 64.40 | 69.07 | 56.49 |
| | Hydrogenase | Ni-Fe hydrogenase | Nfhyd | 25.32 | 30.54 | 31.02 | 59.18 | 35.24 | 30.64 |
| Membrane transport | EPS | glycosyl transferase | glyt | 53.69 | 54.32 | 62.79 | 58.25 | 61.25 | 50.99 |
| Metal resistance | Ag | silA | silA | 37.01 | 38.40 | 40.48 | 42.32 | 44.98 | 42.45 |
| | | silC | silC | 41.39 | 34.30 | 33.40 | 33.35 | 33.75 | 34.77 |
| | As | arsA | arsA | 68.34 | 73.17 | 67.11 | 72.26 | 72.56 | 72.83 |
| | Cd | cadBD | cadBD | 37.31 | 28.54 | 60.33 | 40.59 | 37.35 | 39.88 |
| | Cd_Co_Zn | czcA | czcA | 20.47 | 27.96 | 29.40 | 31.70 | 29.89 | 30.43 |
| | | czcC | czcC | 64.25 | 55.37 | 76.49 | 71.08 | 71.55 | 71.96 |
| | | czcD | czcD | 37.50 | 76.65 | 80.97 | 46.80 | 81.82 | 93.41 |
| | Cu | cueO | cueO | 81.74 | 89.71 | 92.89 | 96.18 | 96.14 | 95.79 |
| | | cusA | cusA | 81.43 | 89.55 | 92.76 | 96.10 | 96.06 | 95.71 |
| | Hg | mer | mer | 49.63 | 95.97 | 99.01 | 99.41 | 99.76 | 99.97 |
| | | merB | merB | 47.61 | 64.54 | 64.63 | 67.61 | 64.90 | 63.49 |
| | | merP | merP | 33.26 | 89.90 | 97.35 | 98.43 | 99.35 | 99.92 |
| | Ni | nreB | nreB | 1.79 | 7.99 | 39.71 | 92.23 | 64.10 | 51.87 |
| | Pb | pbrA | pbrA | 54.47 | 82.31 | 70.83 | 64.55 | 68.31 | 73.56 |
| | Te | terD | terD | 53.10 | 23.97 | 55.05 | 60.79 | 33.66 | 7.26 |
| | Zn | zitB | zitB | 71.92 | 92.73 | 70.69 | 61.81 | 64.09 | 60.32 |
| Stress response | Cold | cspB | cspB | 25.88 | 37.96 | 33.86 | 33.09 | 32.80 | 32.67 |
| | Heat | dnaK | dnaK | 39.81 | 86.46 | 96.86 | 99.07 | 99.84 | 99.98 |
| | | groEL | groEL | 31.54 | 34.88 | 17.05 | 39.18 | 69.70 | 91.97 |
| | | grpE | grpE | 89.23 | 93.45 | 94.09 | 73.13 | 16.68 | 5.53 |
| | | hrcA | hrcA | 60.98 | 91.69 | 73.49 | 37.05 | 20.94 | 60.00 |
| | Glucose limitation | bglH | bglH | 5.11 | 18.46 | 31.42 | 37.51 | 46.43 | 50.21 |
| | | cydA | cydA | 77.23 | 26.21 | 10.05 | 1.22 | 9.17 | 47.09 |
| | | cydB | cydB | 74.23 | 29.90 | 18.21 | 9.63 | 10.60 | 47.96 |
| | | narH | narH | 97.89 | 89.49 | 74.27 | 55.28 | 32.71 | 10.63 |
| | | narI | narI | 46.69 | 78.77 | 99.49 | 97.20 | 72.44 | 52.38 |
| | | narJ | narJ | 14.61 | 49.41 | 76.50 | 88.58 | 95.54 | 99.10 |
| | Oxygen stress | ahpC | ahpC | 82.83 | 78.48 | 82.87 | 84.35 | 86.78 | 88.59 |
| | | ahpF | ahpF | 22.95 | 10.67 | 18.11 | 31.81 | 46.51 | 55.67 |
| | | katA | katA | 63.86 | 69.28 | 90.33 | 98.81 | 82.59 | 45.69 |
| | | katE | katE | 55.82 | 51.33 | 41.53 | 36.84 | 31.53 | 26.21 |
| | | perR | perR | 19.20 | 61.52 | 87.64 | 96.08 | 99.31 | 99.93 |
| | Osmotic stress | proX | proX | 82.17 | 66.28 | 89.47 | 96.66 | 99.41 | 99.94 |
| | Protein stress | clpC | clpC | 64.00 | 67.99 | 76.60 | 76.35 | 76.18 | 76.09 |
| | | ctsR | ctsR | 56.15 | 61.39 | 70.02 | 69.74 | 69.54 | 69.43 |
| | Radiation stress | obgE | obgE | 60.19 | 76.87 | 90.13 | 95.71 | 98.62 | 96.95 |
| Antibiotic resistance | Transporter | Mex | Mex | 57.56 | 55.22 | 57.93 | 66.66 | 60.13 | 51.09 |
| | Beta-lactamases | class C beta-lactamase | lacC | 24.03 | 21.79 | 12.08 | 3.73 | 24.86 | 59.47 |
| | other category | Tet | Tet | 51.33 | 50.84 | 51.78 | 53.02 | 52.42 | 52.27 |
| | | Van | Van | 30.01 | 27.33 | 27.49 | 27.46 | 27.45 | 27.45 |
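One simple way to see the difference between consistent and fluctuating genes in Table S14 is to compute the spread (maximum minus minimum) of each row across the six pH ranges. The sketch below does this for four rows copied from the table. Table S14 does not state the criterion used to separate the two groups, so the 10-point threshold and the function names are purely hypothetical choices for illustration.

```python
# Relative metabolic potentials across the six pH ranges, copied from Table S14.
ROWS = {
    "ureC": [65.31, 65.42, 65.83, 67.71, 69.93, 66.85],
    "Van": [30.01, 27.33, 27.49, 27.46, 27.45, 27.45],
    "narH": [97.89, 89.49, 74.27, 55.28, 32.71, 10.63],
    "dnaK": [39.81, 86.46, 96.86, 99.07, 99.84, 99.98],
}


def spread(values):
    """Difference between the largest and smallest value in a row."""
    return max(values) - min(values)


def label(values, threshold=10.0):
    """Label a row by its spread; the threshold is an illustrative assumption."""
    return "consistent" if spread(values) <= threshold else "fluctuating"


if __name__ == "__main__":
    for gene, values in ROWS.items():
        print(gene, round(spread(values), 2), label(values))
```

Under this assumed threshold, ureC and Van vary by only a few points across the gradient, whereas narH and dnaK change by more than half of the scale.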


Figure S1. The consensus networks of environmental (a) and taxonomic (b) variables generated by Bayesian network inference.


Figure S2. The scatter plots show the cross-validation of predicted and observed values for relative microbial abundances at different taxonomic levels.

[Panel annotations: R2 (Phylum, n = 8) = 0.70; R2 (Order, n = 35) = 0.62; R2 (OTU, n = 14) = 0.52. Axes: predicted values and observed values of relative abundance (log10(x + 1)).]
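The R2 values reported for Figure S2 are computed on relative abundances plotted on a log10(x + 1) scale. The sketch below shows one way such a value could be obtained. It assumes R2 is the squared Pearson correlation between transformed predicted and observed values, which the figure does not specify, and the function name and sample data are hypothetical.

```python
import numpy as np


def r_squared_log(observed, predicted):
    """Squared Pearson correlation after a log10(x + 1) transform (assumed definition)."""
    obs = np.log10(np.asarray(observed, dtype=float) + 1.0)
    pred = np.log10(np.asarray(predicted, dtype=float) + 1.0)
    r = np.corrcoef(pred, obs)[0, 1]
    return r ** 2


if __name__ == "__main__":
    # Made-up relative abundances (%) for a handful of samples.
    observed = [0.0, 1.2, 5.0, 20.0, 60.0]
    predicted = [0.3, 0.9, 7.5, 15.0, 70.0]
    print(r_squared_log(observed, predicted))
```

The + 1 offset keeps taxa with zero abundance on the plot, since log10(0) is undefined.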


Figure S3. The scatter plots show the cross-validation of predicted and observed values for functional metabolic potentials of different functional gene categories.

[Nine scatter panels of observed values against predicted values, both as functional metabolic potentials (log10(x)). Legend categories: Carbon cycling, Nitrogen cycling, Phosphorus, Sulfur cycling, Energy process, Membrane transport, Antibiotic resistance, Metal resistance, Stress response. Panel R2 values as extracted: 0.977, 0.967, 0.952, 0.981, 0.997, 0.996, 0.956, 0.998, 0.966.]


Figure S4. Bray-Curtis similarity between predicted and observed values of relative microbial abundances (phylum level, a) and gene metabolic potentials of different functional categories (with relative abundance information of microbial phyla, b). The similarity of the overall microbial community composition was calculated based on these eight microbial phyla. Average includes the data sets for training and validation. Values are mean ± SE, and the significance of each similarity is listed in the supplementary tables.

[Recovered axis labels: Euryarchaeota, Acidobacteria, Actinobacteria, Firmicutes, Nitrospira, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, All community; Carbon cycling, Nitrogen cycling, Phosphorus cycling, Sulfur cycling, Energy process, Membrane transport, Metal resistance, Stress response, Antibiotic resistance. y-axis: Bray-Curtis similarity. Legend: Train, Validation, Average.]


Figure S5. Changes in the relative metabolic potential of functional genes in sulfur cycling (a), stress response (b), energy process and membrane transport (c) and antibiotic resistance (d) along the pH gradient. The metabolic potentials were normalized to relative values.

[Color key (pH range): < 2.0, 2.0 - 2.5, 2.5 - 3.0, 3.0 - 3.5, 3.5 - 4.0, > 4.0. Gene labels: sox, aprA, aprB, dsrA, dsrB (a); cspA, groES, bglP, glnA, glnR, arcA, arcB, fnr, oxyR, proV (b); fero, cyt, hyd, ABCt (c); ABCat, MatE, MFS, SMR, lac, lacA (d). Scale: 20 to 100.]


Figure S6. The comparison of predicted and observed metabolic potentials of different functional gene categories including carbon cycling (a), phosphorus (b) and sulfur cycling (c) along the gradient of pH levels. Values were mean ± SE.

[One panel per gene: aclB, CODH, Pcc, RubisCo (a); phytase, ppk, ppx (b); aprA, aprB, dsrA, dsrB, sox (c). y-axis: mean signal of metabolic potentials (log10(x)). x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Series: observed values, predicted values.]
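The per-range summaries plotted in Figures S6 to S11 (mean ± SE of log10 signal within each pH range) can be sketched as below. This is an assumption-laden illustration: the figure captions do not say how samples lying exactly on a range boundary were assigned or whether the mean was taken before or after the log transform, so the sketch places boundary values in the upper range and averages the log10 values. The function name and the sample data are hypothetical.

```python
import numpy as np

PH_EDGES = [2.0, 2.5, 3.0, 3.5, 4.0]
PH_LABELS = ["< 2.0", "2.0-2.5", "2.5-3.0", "3.0-3.5", "3.5-4.0", "> 4.0"]


def mean_se_by_ph(ph, signal):
    """Return {pH range: (mean, standard error, n)} of log10(signal) per range."""
    ph = np.asarray(ph, dtype=float)
    values = np.log10(np.asarray(signal, dtype=float))
    # digitize returns 0..5; a pH equal to an edge falls into the upper range.
    bins = np.digitize(ph, PH_EDGES)
    summary = {}
    for i, name in enumerate(PH_LABELS):
        group = values[bins == i]
        if group.size == 0:
            continue  # no samples in this pH range
        if group.size > 1:
            se = group.std(ddof=1) / np.sqrt(group.size)
        else:
            se = float("nan")  # SE is undefined for a single sample
        summary[name] = (group.mean(), se, int(group.size))
    return summary


if __name__ == "__main__":
    # Made-up pH values and raw signal intensities.
    print(mean_se_by_ph([1.8, 1.9, 2.2, 4.5], [1e5, 1e7, 1e6, 1e6]))
```

The same routine applied to observed and to predicted signals would give the two series that each panel compares.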


Figure S7. The comparison of predicted and observed metabolic potentials of nitrogen cycling along the gradient of pH levels. Values were mean ± SE.

[One panel per gene: nifH, gdh, ureC, amoA, narG, nirK, nirS, norB, nosZ, nasA, NiR, nirA, nirB, napA, nrfA. y-axis: mean signal of metabolic potentials (log10(x)). x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Series: observed values, predicted values.]


Figure S8. The comparison of predicted and observed metabolic potentials of different functional gene categories including energy process (a) and membrane transport (b) along the gradient of pH levels. Values were mean ± SE.

[One panel per gene: fes, fer, fero, NADH, quio, cyt, hyd, Nfhyd (a); glyt, ABCt (b). y-axis: mean signal of metabolic potentials (log10(x)). x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Series: observed values, predicted values.]


Figure S9. The comparison of predicted and observed metabolic potentials of metal resistance along the gradient of pH levels. Values were mean ± SE.

[One panel per gene: silA, silC, silP, al, aoxB, arsA, arsB, arsC, arsM, cadA, cadBD, czcA, czcC, czcD, corC, chrA, copA, cueO, cusA, mer, merB, merP, nreB, pbrA, pbrT, tehB, terC, terD, terZ, zitB, zntA. y-axis: mean signal of metabolic potentials (log10(x)). x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Series: observed values, predicted values.]


Figure S10. The comparison of predicted and observed metabolic potentials of stress response along the gradient of pH levels. Values were mean ± SE.

[One panel per gene: cspA, cspB, dnaK, groEL, groES, grpE, hrcA, bglH, bglP, glnA, glnR, arcA, arcB, cydA, cydB, narH, narI, narJ, ahpC, ahpF, fnr, katA, katE, oxyR, perR, proV, proX, phoA, phoB, pstA, pstB, pstC, pstS, clpC, ctsR, obgE. y-axis: mean signal of metabolic potentials (log10(x)). x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Series: observed values, predicted values.]


Figure S11. The comparison of predicted and observed metabolic potentials of antibiotic resistance along the gradient of pH levels. Values were mean ± SE.

[One panel per gene: ABCat, MatE, MFS, SMR, Mex, lac, lacA, lacC, Tet, Van. y-axis: mean signal of metabolic potentials (log10(x)). x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Series: observed values, predicted values.]