
Articles
https://doi.org/10.1038/s41559-020-1236-0

Metagenome-wide association analysis identifies microbial determinants of post-antibiotic ecological recovery in the gut

Kern Rei Chng1,11, Tarini Shankar Ghosh1,2,11, Yi Han Tan3,11, Tannistha Nandi1,11, Ivor Russel Lee3, Amanda Hui Qi Ng1, Chenhao Li1, Aarthi Ravikrishnan1, Kar Mun Lim1, David Lye3,4,5,6, Timothy Barkham6, Karthik Raman7,8,9, Swaine L. Chen1, Louis Chai3,10, Barnaby Young4,5,6 ✉, Yunn-Hwen Gan3 ✉ and Niranjan Nagarajan1,3 ✉

1Genome Institute of Singapore, Singapore, Singapore. 2APC Microbiome Ireland, University College Cork, Cork, Ireland. 3Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore. 4National Centre for Infectious Disease, Singapore, Singapore. 5Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore. 6Tan Tock Seng Hospital, Singapore, Singapore. 7Department of Biotechnology, Bhupat and Jyoti Mehta School of Biological Sciences, Indian Institute of Technology (IIT) Madras, Chennai, India. 8Initiative for Biological Systems Engineering, IIT Madras, Chennai, India. 9Robert Bosch Centre for Data Science and Artificial Intelligence (RBC-DSAI), IIT Madras, Chennai, India. 10Division of Infectious Diseases, University Medicine Cluster, National University Health System, Singapore, Singapore. 11These authors contributed equally: Kern Rei Chng, Tarini Shankar Ghosh, Yi Han Tan, Tannistha Nandi. ✉e-mail: [email protected]; [email protected]; [email protected]

SUPPLEMENTARY INFORMATION

In the format provided by the authors and unedited.

Nature Ecology & Evolution | www.nature.com/natecolevol


Supplementary fig. 1: Properties of microbiome recovery across cohorts. (a) Cumulative density function for Simpson diversity in the CA and SG cohorts, highlighting the large number of low diversity samples. (b) Microbiomes of recoverers are more similar to control microbiomes than for non-recoverers (two-sided Wilcoxon test; n=16 [EN], n=32 [SW] recoverers and n=24 [EN], n=23 [SW] for non-recoverers). Jensen-Shannon (JS) divergence and Jaccard distances for each sample were computed in comparison to the untreated (“control”) microbiomes in each cohort. The figures show the median values for each sample in the form of a boxplot. Boxplots are represented with center line: median; box limits: upper and lower quartiles; box whiskers represent 1.5× interquartile range or the maximum/minimum data point within the range.

[Figure: (a) fraction of samples (y-axis) against Simpson diversity (x-axis) for the CANADA and SINGAPORE cohorts; (b) boxplots of Jaccard distance and JS divergence for non-recoverers and recoverers in the ENGLAND and SWEDEN cohorts. P-values shown on the panels: P < 3.8e-12, P < 7.1e-7, P < 4.6e-11 and P < 0.003.]
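The per-sample distances described for panel (b) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the base-2 logarithm, the presence cutoff of zero abundance, and the toy profiles are all assumptions made for the example.

```python
import math
from statistics import median

def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs, an assumption) between two
    relative-abundance vectors defined over the same taxa."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jaccard_distance(p, q):
    """1 - shared taxa / taxa present in either profile (presence = abundance > 0)."""
    a = {i for i, x in enumerate(p) if x > 0}
    b = {i for i, x in enumerate(q) if x > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def median_distance_to_controls(sample, controls, dist):
    """Median distance from one sample to every control profile."""
    return median(dist(sample, c) for c in controls)

# Hypothetical toy profiles over four taxa (not data from the study)
controls = [[0.5, 0.3, 0.2, 0.0], [0.4, 0.4, 0.1, 0.1]]
sample = [0.9, 0.1, 0.0, 0.0]
print(median_distance_to_controls(sample, controls, js_divergence))
print(median_distance_to_controls(sample, controls, jaccard_distance))
```

Each sample then contributes one median value per distance measure, which is what the boxplots summarize per group.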

Supplementary fig. 2: Enrichment of RABs during different stages of antibiotic treatment. Fold change was computed for median abundance in recovered vs non-recovered subjects per cohort and averaged across all 4 cohorts. Groups were determined manually (due to limited dimensionality) based on approximate trends and taxonomic similarity. The symbols “*”, “**” and “***” indicate p-values <0.1, <0.05 and <0.01, respectively, based on two-sided Wilcoxon test comparison between recoverers (n=113) and non-recoverers (n=90).

[Figure: species shown: Alistipes putredinis, Bacteroides stercoris, Bacteroides thetaiotaomicron, Bacteroides uniformis, Parabacteroides distasonis, Subdoligranulum variabile, Alistipes shahii, Bacteroides caccae, Bacteroides eggerthii, Bifidobacterium adolescentis, Roseburia inulinivorans, Bacteroides coprocola, Ruminococcus bromii, Ruminococcus torques, Bifidobacterium longum, Coprococcus catus, Bacteroides intestinalis, Faecalibacterium prausnitzii, Parabacteroides johnsonii, Bifidobacterium bifidum, Desulfovibrio piger.]
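The fold-change summary described in the caption of Supplementary fig. 2 (median abundance in recoverers over non-recoverers per cohort, then averaged across cohorts) could look roughly like this. The pseudocount, the use of an arithmetic mean, the function names, and the toy abundances are assumptions for illustration only.

```python
from statistics import mean, median

def cohort_fold_change(recoverers, non_recoverers, pseudocount=1e-6):
    """Ratio of median abundances; the pseudocount (an assumption) avoids
    division by zero when a species is absent in non-recoverers."""
    return (median(recoverers) + pseudocount) / (median(non_recoverers) + pseudocount)

def mean_fold_change(cohorts, pseudocount=1e-6):
    """Arithmetic mean of the per-cohort fold changes (assumed averaging scheme)."""
    return mean(cohort_fold_change(r, n, pseudocount) for r, n in cohorts.values())

# Hypothetical abundances of one species: cohort -> (recoverers, non-recoverers)
cohorts = {
    "CA": ([2, 4, 6], [1, 2, 3]),
    "SG": ([3, 3, 9], [1, 3, 5]),
    "EN": ([8], [2]),
    "SW": ([1, 1], [1, 1]),
}
print(mean_fold_change(cohorts))
```

Averaging on a log scale instead would weight enrichment and depletion symmetrically; the caption does not say which was used.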

Supplementary fig. 3: Differentially abundant metagenomic functions in post-antibiotic recovery. Functional pathways enriched in the gut microbiomes of recoverers (n=17) or non-recoverers (n=12) (of the SG cohort) in the ‘Pre/Early’ and ‘During’ stages of antibiotic treatment. Note that a star (‘*’) indicates those pathways for which significant (p-values<0.05) differences were also obtained in the CA cohort. p-values were computed using the Kruskal-Wallis rank-sum test implemented within the LEfSe package. Pathways were grouped into those important for energy production (in orange) and those involved in biosynthesis (in blue), highlighting the role of these two processes in microbiome recovery.

[Figure: LDA score (log 10) per pathway, high in recoverers versus high in non-recoverers. Pathway categories: Amino acid Biosynthesis, Nucleotide Biosynthesis, Co-factor Biosynthesis, Energy Production, Carbohydrate Degradation, Cell-wall Biosynthesis. Stars mark individual pathways.]

Supplementary fig. 4: Enrichment of Carbohydrate Metabolism and Butanoate Metabolism pathways in the gut microbiomes of recoverers in the EN and SW cohorts. Abundances of the various pathways in the samples belonging to the Pre/Early and During stages of treatment were inferred using PICRUSt and then compared among the recoverers (n=8 [EN], n=16 [SW]) and non-recoverers (n=11 [EN], n=12 [SW]) in these cohorts. The total-sum-scaled abundances were log-normalized and compared using two-sided Wilcoxon test. Boxplots are represented with center line: median; box limits: upper and lower quartiles; box whiskers represent 1.5× interquartile range or the maximum/minimum data point within the range.

[Figure: boxplots of log(relative abundance) for recoverers and non-recoverers. Panels: Butanoate Metabolism (P < 0.03) and Carbohydrate Metabolism (P < 0.02).]
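The normalization step named in the caption of Supplementary fig. 4 (total-sum scaling followed by a log transform) can be sketched as below. The base-10 logarithm, the pseudocount, the function name, and the toy counts are assumptions; the caption does not specify them.

```python
import math

def tss_log(counts, pseudocount=1e-6):
    """Total-sum scale a sample's pathway counts to relative abundances,
    then take log10. The pseudocount (an assumption) keeps zeros finite."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return {name: math.log10(value / total + pseudocount)
            for name, value in counts.items()}

# Hypothetical pathway counts for a single sample
sample = {"Butanoate Metabolism": 10, "Carbohydrate Metabolism": 90}
print(tss_log(sample))
```

The transformed values for one pathway would then be collected per group and compared with a two-sided rank-sum test, as the caption states.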

Supplementary fig. 5: Enrichment of Bacterial Genera in the Resistome. Reads belonging to the resistome were assigned to bacterial genera using Kraken (right panel) and odds ratio between groups computed to identify enriched genera (left panel; * = χ² test p-value <0.05, pre- and during antibiotic timepoints). Genera with RAB species are highlighted in green. The comparisons were performed for the "Pre/Early" and "During" samples belonging to the SG and CA cohorts (n=17 [SG], 16 [CA] recoverers; n=12 [SG], 14 [CA] non-recoverers).

[Figure: left panel, log(odds ratio) per genus, high in recoverers versus high in non-recoverers; right panel, proportion in resistome for recoverers and non-recoverers. Genera shown: Tannerella*, Bacteroides, Alistipes, Enterobacter, Riemerella, Clostridium, Streptococcus, Enterococcus, Escherichia, Klebsiella, Corynebacterium.]
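For a single genus, the odds ratio and χ² test mentioned in the caption of Supplementary fig. 5 reduce to a 2×2 table. The sketch below is illustrative: the table layout, the natural logarithm, the uncorrected Pearson statistic, and the read counts are assumptions, and it does not handle zero cells.

```python
import math

def odds_ratio_and_chi2(a, b, c, d):
    """2x2 table (assumed layout):
         a = genus reads in recoverers,     b = other resistome reads in recoverers
         c = genus reads in non-recoverers, d = other resistome reads in non-recoverers
    Returns (log odds ratio, Pearson chi-square without continuity correction, p-value)."""
    n = a + b + c + d
    log_or = math.log((a * d) / (b * c))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return log_or, chi2, p_value

# Hypothetical read counts
log_or, chi2, p = odds_ratio_and_chi2(30, 70, 10, 90)
print(log_or, chi2, p)
```

A positive log odds ratio corresponds to "high in recoverers" under this layout, and a negative one to "high in non-recoverers".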

Supplementary fig. 6: RABs have distinct preferences for carbohydrate active enzyme families. Copy numbers of CAZymes having specificities for different categories of carbohydrates were counted for various bacterial species and then range normalized across species. Four distinct large clusters could be observed among the species based on their carbohydrate degradation specificities. RABs were observed to be significantly enriched in cluster 1 (Fisher’s exact test; p-value<0.001) that is defined by an abundance of enzymes that are specific to plant/animal carbohydrates as well as mucin.

[Figure: species clusters labeled Cluster 1 to Cluster 5, with RABs and non-RABs marked.]
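Two steps in the caption of Supplementary fig. 6 lend themselves to a short sketch: range normalization of copy numbers across species, and a Fisher's exact test for enrichment of RABs in one cluster. The function names, the one-sided form of the test, and the toy numbers are assumptions, not details taken from the study.

```python
from math import comb

def range_normalize(values):
    """Min-max scale one CAZyme category's copy numbers across species to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fisher_enrichment_p(k, cluster_size, n_rabs, n_total):
    """One-sided Fisher's exact test (assumed direction): probability of seeing
    at least k RABs in a cluster of cluster_size species when n_rabs of the
    n_total species are RABs."""
    upper = min(cluster_size, n_rabs)
    tail = sum(comb(n_rabs, i) * comb(n_total - n_rabs, cluster_size - i)
               for i in range(k, upper + 1))
    return tail / comb(n_total, cluster_size)

# Hypothetical copy numbers for one carbohydrate category across three species
print(range_normalize([2, 4, 6]))
# Hypothetical counts: 2 of 4 species are RABs, and both fall in a cluster of size 2
print(fisher_enrichment_p(2, 2, 2, 4))
```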

Supplementary fig. 7: Key metabolic interactions between RABs. Directed lines indicate RAB species with high metabolic support to other RAB species (top 10% of MSI values). Node sizes reflect the number of incoming edges and the red edge marks the interaction between B. thetaiotaomicron and B. adolescentis which was evaluated further in an in vivo model for microbiome recovery.
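The network construction described in the caption of Supplementary fig. 7 (keep the top 10% of MSI values as directed edges, size nodes by incoming edges) might be sketched like this. The edge orientation (supporting species to supported species), the tie handling, the function names, and the MSI values are all hypothetical.

```python
from collections import Counter

def top_fraction_edges(msi, fraction=0.10):
    """Keep the directed edges whose MSI value ranks in the top `fraction`.
    `msi` maps (supporter, supported) pairs to an MSI value; ties at the
    cutoff are broken by sort order, which is a simplification."""
    ranked = sorted(msi.items(), key=lambda kv: kv[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

def incoming_edge_counts(edges):
    """Number of incoming edges per node, which would drive node size."""
    return Counter(target for (source, target), _ in edges)

# Hypothetical MSI values between placeholder species
msi = {("sp1", "sp2"): 0.9, ("sp3", "sp2"): 0.8,
       ("sp2", "sp1"): 0.1, ("sp2", "sp3"): 0.05}
edges = top_fraction_edges(msi, fraction=0.5)
print(incoming_edge_counts(edges))
```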

Supplementary fig. 8: Microbiome recovery profiles across treatment groups. (a) Microbial biomass (median ± 1 MAD) values obtained after normalizing by host reads reveal similar trajectories as plant normalized values (Figure 4b). Stars (‘**’) indicate timepoints where the Bt and Bt+Ba groups were significantly different from other groups (one-sided Wilcoxon test p-value < 0.01). (b) Median Bray-Curtis distance of species level taxonomic profiles compared to day 0 profiles, in different treatment groups and across time (median ± 1 MAD). Stars (‘**’) indicate timepoints where the Bt group was significantly different from other groups (one-sided Wilcoxon test p-value < 0.01). For all subfigures, vehicle: n=5, Ba: n=6, Bt: n=2, and Bt+Ba: n=2, where n represents cage units.

[Figure: (a) microbial biomass normalised by host (axis ticks 10, 100, 1000) against days since start of experiment; (b) divergence from initial state (axis ticks 0.5, 0.7, 0.9) against days since start of experiment. Stars (‘**’) mark significant timepoints.]
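The two summaries used in Supplementary fig. 8, Bray-Curtis distance to the day 0 profile and median ± 1 MAD across cage units, can be sketched as follows. MAD is taken here to mean the unscaled median absolute deviation, which is an assumption, as are the function names and the toy numbers.

```python
from statistics import median

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors over the same taxa."""
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator

def median_and_mad(values):
    """Median and (unscaled) median absolute deviation of a list of values."""
    values = list(values)
    med = median(values)
    return med, median(abs(x - med) for x in values)

# Hypothetical species-level profiles for one cage: day 0 and a later day
day0 = [0.6, 0.3, 0.1]
later = [0.2, 0.3, 0.5]
print(bray_curtis(day0, later))
# Hypothetical per-cage distances at one timepoint
print(median_and_mad([0.5, 0.6, 0.7, 0.8, 0.9]))
```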

Supplementary fig. 9: Successful colonization of B. thetaiotaomicron in the mouse gut microbiome post gavage. Boxplots show a high number of B. thetaiotaomicron metagenomic reads in mouse stool after Bt gavage, but no comparable number of Bacillus spp. reads after Bacillus gavage (Bc group), indicating successful colonization specific to Bt. Boxplots are represented with center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range. Ba: n=18 (pre-gavage), 36 (post-gavage) samples; Bt: n=6 (pre-gavage), 12 (post-gavage) samples; Bt+Ba: n=6 (pre-gavage), 12 (post-gavage) samples; Bc: n=18 (pre-gavage), 36 (post-gavage) samples.

[Figure: boxplots of number of reads (# Reads) for the Ba, Bt, Bt + Ba and Bc groups.]

Supplementary fig. 10: Placement of RABs in the food web at different thresholds. Heatmap showing that at different thresholds (±50% from the threshold of 0.01 used for results in Figure 3a), the position of RABs as primary, secondary and tertiary species in the food-web is retained.

Supplementary fig. 11: Establishing validity of microbial biomass estimation using host normalized microbial read counts. (a) 16S rRNA qPCR demonstrates that the fold change in 16S rRNA copies is directly proportional to fold change in microbial biomass (CFUs), as expected. (b) Metagenomic analysis demonstrates that the fold change in host-normalized microbial reads is directly proportional to fold change in microbial biomass (CFUs). DNA from cultures of Klebsiella pneumoniae and Enterococcus faecium was mixed in equal CFU ratio, and mouse stool DNA samples were spiked in at various dilutions (1:1 to 1:1000) to achieve a wide range of fold changes. Data shown are for two stool samples (biological replicates).

[Figure: (a) fold change in 16S rRNA (qPCR quantitated) against fold change in CFUs (biomass); (b) fold change in host-normalized microbial reads against fold change in CFUs (biomass). Both axes span 1 to 1000.]
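The proportionality claim in Supplementary fig. 11b can be illustrated with a small calculation: normalize microbial reads by host reads, express each dilution as a fold change, and check that the slope on a log-log scale is close to 1. The read counts, the choice of the first dilution as reference, and the function names are hypothetical.

```python
import math
from statistics import mean

def host_normalized(microbial_reads, host_reads):
    """Microbial read count divided by host read count, per sample."""
    return [m / h for m, h in zip(microbial_reads, host_reads)]

def fold_changes(values):
    """Fold change of each value relative to the first (reference) value."""
    return [v / values[0] for v in values]

def loglog_slope(x, y):
    """Least-squares slope of log10(y) on log10(x); a slope near 1 indicates
    direct proportionality."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx, my = mean(lx), mean(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))

# Hypothetical dilution series
cfu_fold = [1, 10, 100, 1000]
microbial = [2_000, 20_000, 200_000, 2_000_000]
host = [1_000, 1_000, 1_000, 1_000]
read_fold = fold_changes(host_normalized(microbial, host))
print(loglog_slope(cfu_fold, read_fold))
```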

Barcode adapter, double stranded
  1st strand: 5'P-GATCGGAAGAGCACACGTCT
  2nd strand: 5'ACACTCTTTCCCTACACGACGCTCTTCCGATCT
PE 1.0: 5'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T
Index Primer: 5'CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
16S Forward: 5'ACTCCTACGGGAGGCAGC
16S Reverse: 5'TTACCGCGGCTGCTGGCAC
gBLOCK: 5'GGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCGGCAATGGACGGAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTAAGAGAAGAACGAGTGTGAGAGTGGAAAGTTCACACTGTGACGGTATCTTACCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTCCCGAG

Supplementary Table 1: Primers and adapter sequences used in this study.
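As a consistency check on the sequences in Supplementary Table 1, the sketch below tests whether the 16S forward primer and the reverse complement of the 16S reverse primer both occur in the gBLOCK sequence, and extracts the region they flank. How the gBLOCK was used experimentally is not stated in this table; the helper names are hypothetical and the check is purely in silico.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

FORWARD = "ACTCCTACGGGAGGCAGC"
REVERSE = "TTACCGCGGCTGCTGGCAC"
GBLOCK = (
    "GGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTC"
    "GGCAATGGACGGAAGTCTGACCGAGCAACGCCGCGTGAG"
    "TGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTAAGAGAA"
    "GAACGAGTGTGAGAGTGGAAAGTTCACACTGTGACGGTAT"
    "CTTACCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGC"
    "GGTAATACGTAGGTCCCGAG"
)

def amplicon(template, forward, reverse):
    """Region of `template` from the forward primer site through the
    reverse primer site (exact matches only), or None if either is missing."""
    start = template.find(forward)
    if start < 0:
        return None
    rc = reverse_complement(reverse)
    rc_pos = template.find(rc, start + len(forward))
    if rc_pos < 0:
        return None
    return template[start:rc_pos + len(rc)]

product = amplicon(GBLOCK, FORWARD, REVERSE)
print(len(product) if product else "primers do not flank a region of the template")
```

Exact string matching ignores mismatches and degenerate bases, so a missing hit would not by itself mean a primer fails to anneal.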

Supplementary Note 1: Machine learning models to predict recovery status

To test the ability to infer recovery status using microbial abundances before antibiotic treatment (with and without cohort labels; only microbes with mean relative abundance > 0.5% were used), we attempted to build a classifier with various machine learning models, including random forest (R package “randomForest”), linear discriminant analysis (R package “MASS”), sparse logistic regression (R package “glmnet”), and conditional inference tree (R package “ctree”). The models were evaluated with default parameters using leave-one-out cross validation (R package “caret”) and the accuracy for the best model (conditional inference tree) was reported.
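The evaluation scheme in this note (abundance filter at 0.5% mean relative abundance, then leave-one-out cross validation) can be sketched in plain Python. The classifier below is a nearest-centroid stand-in chosen only to keep the example self-contained; it is not one of the R models named above, and the function names and toy data are hypothetical.

```python
from statistics import mean

def filter_features(X, min_mean=0.005):
    """Indices of microbes whose mean relative abundance exceeds 0.5%."""
    return [j for j in range(len(X[0])) if mean(row[j] for row in X) > min_mean]

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest class centroid (squared distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and report accuracy."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

# Hypothetical pre-treatment relative abundances (rows: subjects, columns: microbes)
X = [[0.6, 0.399, 0.001], [0.7, 0.298, 0.002], [0.2, 0.799, 0.001], [0.1, 0.9, 0.0]]
y = ["recoverer", "recoverer", "non-recoverer", "non-recoverer"]
kept = filter_features(X)
X_kept = [[row[j] for j in kept] for row in X]
print(kept, loocv_accuracy(X_kept, y))
```

Any model with a fit-and-predict interface can be passed in place of the stand-in via the `predict` argument, which is how the comparison across several models would be organized.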
