Computational metagenomics and the human microbiome




Curtis Huttenhower

01-21-11

Harvard School of Public Health

Department of Biostatistics

2

What to do with your metagenome?

(x10^10)

• Diagnostic or prognostic biomarker for host disease
• Public health tool monitoring population health and interactions
• Comprehensive snapshot of microbial ecology and evolution
• Reservoir of gene and protein functional information

Who’s there?

What are they doing?

What do functional genomic data tell us about microbiomes?

What can our microbiomes tell us about us?*

*Using terabases of sequence and thousands of experimental results

3

The Human Microbiome Project

2007 - ongoing

• 300 “normal” adults, 18-40

• 16S rDNA + WGS
• 5 sites/18 samples + blood
  • Oral cavity: saliva, tongue, palate, buccal mucosa, gingiva, tonsils, throat, teeth
  • Skin: ears, inner elbows
  • Nasal cavity
  • Gut: stool
  • Vagina: introitus, mid, fornix

• Reference genomes (~200+800)

All healthy subjects; follow-up projects in psoriasis, Crohn’s, colitis, obesity, acne, cancer, antibiotic-resistant infection…

Hamady, 2009

Kolenbrander, 2010

4

HMP Organisms: Everyone and everywhere is different

[Figure: organisms (taxa) versus body sites + individuals. Site labels: ear, gut, nose, mouth, vagina, arm, mucosa, palate, gingiva, tonsils, saliva, sub. plaq., sup. plaq., throat, tongue]

Every microbiome is surprisingly different

Most organisms are rare in most places

Even common organisms vary tremendously in abundance among individuals

Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants

There are few organismal biotypes in health

5

HUMAnN: Community metabolic and functional reconstruction

WGS reads

Pathways/modules

Genes (KOs)

Pathways (KEGGs)

Functional seq.: KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS…

BLAST → Genes

[Equation: gene abundance c(g), with terms in p and |g|; the formula is not recoverable from the extracted slide text]

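Genes → Pathways: MinPath (Ye 2009)

Smoothing: Witten-Bell

[Equation: piecewise definition of the smoothed c(g) in terms of N, T and V, with an "otherwise" case; the formula is not recoverable from the extracted slide text]

Gap filling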

c(g) = max( c(g), median )
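The smoothing and gap-filling steps named above can be sketched in a few lines. This is a minimal illustration, not HUMAnN's code. The Witten-Bell function uses the textbook form of the estimator (N total counts, T gene families observed at least once, V the size of an assumed gene catalog), which may differ from the variant on the slide. The gap-filling function assumes the median is taken over the genes of a single pathway. Function names and KO-style identifiers are hypothetical.

```python
from statistics import median


def witten_bell(counts, catalog):
    """Textbook Witten-Bell estimate for every gene family in `catalog`.

    counts  -- dict mapping gene family -> observed count (assumed input)
    catalog -- list of all gene families considered (size V)
    """
    n = sum(counts.get(g, 0) for g in catalog)          # N: total counts
    t = sum(1 for g in catalog if counts.get(g, 0) > 0)  # T: observed families
    z = len(catalog) - t                                  # V - T: unobserved
    if n + t == 0:
        return {g: 0.0 for g in catalog}
    smoothed = {}
    for g in catalog:
        c = counts.get(g, 0)
        if c > 0:
            smoothed[g] = c / (n + t)
        else:
            # Mass T / (N + T) is shared equally by the unobserved families.
            smoothed[g] = t / (z * (n + t))
    return smoothed


def gap_fill(abundances):
    """c(g) = max(c(g), median), with the median taken over this pathway."""
    m = median(abundances.values())
    return {g: max(c, m) for g, c in abundances.items()}


if __name__ == "__main__":
    probs = witten_bell({"K1": 3, "K2": 1}, ["K1", "K2", "K3", "K4"])
    print(probs)
    print(gap_fill({"K1": 0.0, "K2": 4.0, "K3": 10.0}))
```

Both functions return new dictionaries and leave their inputs untouched, so they can be chained or skipped independently.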

300 subjects, 1-3 visits/subject, ~6 body sites/visit

10-200M reads/sample, 100bp reads

BLAST

Taxonomic limitation: Rem. paths in taxa < ave.

Xipe: Distinguish zero/low (Rodriguez-Mueller in review)

HMP Unified Metabolic Analysis Network

6

HUMAnN: Community metabolic and functional reconstruction

[Figure panels: Pathway coverage; Pathway abundance]

7

HUMAnN: Validating gene and pathway abundances on synthetic data

Validated on individual genes, module coverage + abundance

• False negatives: short genes (<100bp), taxonomically rare pathways
• False positives: large and multicopy (not many in bacteria)
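Validation against a synthetic community comes down to comparing what was called against what was known to be present. The sketch below tallies false negatives and false positives for pathway presence calls. It is only an illustration of the bookkeeping: the function name and pathway labels are hypothetical, and the slide does not say how the calls were actually scored.

```python
def presence_errors(truth, called):
    """Compare called pathways against the known synthetic composition.

    truth  -- pathways known to be present in the synthetic community
    called -- pathways reported as present by the reconstruction
    """
    truth, called = set(truth), set(called)
    return {
        "true_positives": sorted(truth & called),
        "false_negatives": sorted(truth - called),  # present but missed
        "false_positives": sorted(called - truth),  # reported but absent
    }


if __name__ == "__main__":
    result = presence_errors(
        truth=["pathway_A", "pathway_B", "pathway_C"],
        called=["pathway_A", "pathway_C", "pathway_D"],
    )
    for kind, names in result.items():
        print(kind, names)
```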

8

HUMAnN: The steps that didn’t make the cut

Abundance

Coverage

9

Functional modules in 741 HMP samples

[Figure: pathways versus samples, panels Coverage and Abundance. Sample labels: AN, O(BM), PF, O(SP), S, RC, O(TD)]

• Zero microbes (of ~1,000) are core among body sites
• Zero microbes are core among individuals
• 19 (of ~220) pathways are present in every sample
• 53 pathways are present in 90%+ samples
• Only 31 (of 1,110) pathways are present/absent from exactly one body site
• 263 pathways are differentially abundant in exactly one body site
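Counts such as "present in every sample" and "present in 90%+ samples" follow from a per-pathway prevalence calculation. A minimal sketch is below, assuming each sample has already been reduced to the set of pathways called present. The data layout, function names and sample/pathway labels are hypothetical.

```python
def pathway_prevalence(samples):
    """Fraction of samples in which each pathway is present.

    samples -- dict mapping sample id -> iterable of pathways called present
    """
    n = len(samples)
    counts = {}
    for present in samples.values():
        for pathway in set(present):
            counts[pathway] = counts.get(pathway, 0) + 1
    return {pathway: c / n for pathway, c in counts.items()}


def pathways_at_or_above(prevalence, threshold):
    """Pathways whose prevalence meets the threshold (1.0 means every sample)."""
    return sorted(p for p, f in prevalence.items() if f >= threshold)


if __name__ == "__main__":
    samples = {
        "s1": {"A", "B"},
        "s2": {"A"},
        "s3": {"A", "B", "C"},
    }
    prev = pathway_prevalence(samples)
    print("core:", pathways_at_or_above(prev, 1.0))
    print("in at least 60% of samples:", pathways_at_or_above(prev, 0.6))
```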

10

Microbial environment trumps host environment (in health)

[Figure panels: HMP stool, colored by BMI; MetaHIT stool, colored by IBD. Axes: microbes, pathways. Annotations: Aerobic body sites; Gastrointestinal body sites; Pathways in all body sites (“core”)]

• Human microbiome structure dictated primarily by microbial niche, not host (in health)
• Huge variation in who’s there; small variation in what they’re doing
• Note: definitely variation in how these functions are implemented
• Does not yet speak to environment (diet!), genetics, or disease

11

Metagenomic biomarker discovery

Gene expression, SNP genotypes

Healthy/IBD, BMI, Diet

Taxa & pathways

Batch effects? Population structure?

Niches & Phylogeny

Test for correlates

Multiple hypothesis correction

Feature selection

p >> n

Confounds/stratification/environment

Cross-validate

Biological story?

Independent sample

Intervention/perturbation
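The multiple hypothesis correction step in the workflow above matters because thousands of taxa and pathways are tested at once. The slide does not name a specific procedure. As one common choice, the sketch below implements the Benjamini-Hochberg false discovery rate adjustment in plain Python. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  adjusted={q:.4f}")
```

Features whose adjusted value falls below the chosen false discovery rate would then go forward to feature selection and cross-validation.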

12

LEfSe: Metagenomic class comparison and explanation

LEfSe

http://huttenhower.sph.harvard.edu/lefse

Nicola Segata

LDA + Effect Size

13

LEfSe: Evaluation on synthetic data

14

Microbes characteristic of the oral and gut microbiota

Aerobic, microaerobic and anaerobic communities

• High oxygen: skin, nasal
• Mid oxygen: vaginal, oral
• Low oxygen: gut

16

LEfSe: The TRUC murine colitis microbiota

With Wendy Garrett

17

MetaHIT: The gut microbiome and IBD

WGS reads

Pathways/modules

124 subjects: 99 healthy, 21 UC + 4 CD

ReBLASTed against KEGG since published data obfuscates read counts

Taxa: Phymm (Brady 2009)

Genes (KOs)

Pathways (KEGGs)

Qin 2010

With Ramnik Xavier, Joshua Korzenik

18

MetaHIT: Taxonomic CD biomarkers

Firmicutes

Enterobacteriaceae

Up in CD

Down in CD

UC

19

MetaHIT: Functional CD biomarkers

Motility Transporters Sugar metabolism

Down in CD

Up in CD

Subset of enriched modules in CD patients

Subset of enriched pathways in CD patients

Growth/replication

20

Sleipnir: Software for scalable functional genomics

Massive datasets require efficient algorithms and implementations.

• Sleipnir C++ library for computational functional genomics
• Data types for biological entities
• Microarray data, interaction data, genes and gene sets, functional catalogs, etc.
• Network communication, parallelization
• Efficient machine learning algorithms
• Generative (Bayesian) and discriminative (SVM)
• And it’s fully documented!

It’s also speedy: microbial data integration computation takes <3hrs.

http://huttenhower.sph.harvard.edu/sleipnir
http://huttenhower.sph.harvard.edu/lefse
http://huttenhower.sph.harvard.edu/humann

21

Thanks!

Jacques Izard, Wendy Garrett

Pinaki Sarder, Nicola Segata

Levi Waldron, Larisa Miropolsky

Interested? We’re recruiting students and postdocs!

Human Microbiome Project

HMP Metabolic Reconstruction

George Weinstock, Jennifer Wortman, Owen White, Makedonka Mitreva, Erica Sodergren, Vivien Bonazzi, Jane Peterson, Lita Proctor

Sahar Abubucker, Yuzhen Ye

Beltran Rodriguez-Mueller, Jeremy Zucker, Qiandong Zeng

Mathangi Thiagarajan, Brandi Cantarel

Maria Rivera, Barbara Methe

Bill Klimke, Daniel Haft

Ramnik Xavier, Dirk Gevers

Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi

Sarah Fortune

http://huttenhower.sph.harvard.edu/

23

The LEfSe algorithm

Statistical consistency

Biological consistency

Overall effect size
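As a rough illustration of the first of these steps, the sketch below computes a Kruskal-Wallis H statistic for one feature across classes. The slide does not name the test. The published description of LEfSe uses a Kruskal-Wallis rank-sum test at this stage, so treat the mapping as context rather than something stated here. The function is a plain-Python sketch, not LEfSe's code: it averages ranks for ties but omits the tie correction factor, assumes every group is non-empty, and uses invented abundance values.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of non-empty groups of values.

    Tied values receive the average of the ranks they span; the tie
    correction factor is deliberately omitted to keep the sketch short.
    """
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    # Hypothetical relative abundances of one taxon in two classes.
    class_a = [1.0, 2.0, 3.0]
    class_b = [4.0, 5.0, 6.0]
    print(kruskal_wallis_h([class_a, class_b]))
```

A larger H indicates a stronger separation of the classes by rank. Converting H to a p-value requires a chi-square distribution with (number of groups - 1) degrees of freedom, which is left out here.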

24

HMP: Metabolism, host-microbiome interactions, and microbial taxa

>3200 gene families differential in the mucosa

>1500 upregulated outside the mucosa and not in any Actinobacterial genome

16S

WGS
