PacBio Meets the Microbiome

George Weinstock
PacBio Users Group Meeting
September 18, 2013

Diverse interest in medical metagenomics

• Acne
• Antibiotics, gut microbiome, and obesity
• Antibiotic resistance
• Asthma, allergies
  • Acute RSV infection
  • Vitamin D
• Bacterial vaginosis
• Cancer microbiomes
• Conjunctiva – trachoma microbiome
• Crohn's disease
• Cystic fibrosis
• Diabetes
  • Oral microbiome
  • Skin microbiome
• Dietary effects on gut microbiome
• Fecal transplant
• HIV and lung microbiome
• Infection control
  • C. difficile
  • VRE
  • MRSA
  • E. coli O157:H7
  • NICU bacteremia
• Intestinal fat uptake
• Necrotizing enterocolitis
• Non-Alcoholic Fatty Liver Disease
• Oral microbiome
  • Periodontitis
  • Caries
• Parasitic infection and the microbiome
• Post-transplant Lymphoproliferative Disorders
• Pre-term birth
  • Maternal microbiome
  • Vitamin D
• Respiratory microbiome
  • Influenza infection
  • Pre-term babies
  • Childhood vaccination
• Sepsis
  • ICU
  • NICU
• Short-bowel syndromes
• Urethritis
• Virus discovery
  • Kawasaki Disease
  • Fever of unknown origin in children
  • Transplantation: CMV, BK
  • Immuno-suppression/-compromised

Approaches to study the microbiome

Microbial community: bacteria, viruses, eukaryotes

• Targeted sequencing (16S rRNA)
  • Bacterial census
  • Taxa & abundances
• Shotgun sequencing
  • All microbes
  • Taxa & genes
  • Bacteria, viruses, fungi, yeasts, protists, enzymes

Describe communities in many samples: “average” community and variations

Major “enterotypes” of the stool biomes

Figure legend: women, men; St. Louis, Houston; BMI < 25, 25 <= BMI < 30, BMI >= 30; Hispanic/Latino/Spanish, not Hispanic/Latino/Spanish; NA

Studying communities - 16S rRNA genes

Each row a different sample

Histograms of genera in each sample

Bacteroides

Prevotella

Ruminococcus
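
The per-sample genus histograms amount to relative abundances computed from genus-level read assignments. A minimal sketch in Python, assuming each 16S read has already been assigned a genus by some classifier. The function name and the toy read labels are hypothetical and are not data from the study:

```python
from collections import Counter

def genus_relative_abundance(read_genera):
    """Return {genus: fraction of reads} for one sample."""
    counts = Counter(read_genera)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: n / total for genus, n in counts.items()}

# Toy input (made up for illustration): one genus label per 16S read.
sample_reads = ["Bacteroides"] * 6 + ["Prevotella"] * 3 + ["Ruminococcus"] * 1
profile = genus_relative_abundance(sample_reads)

# Print genera from most to least abundant, one row of the "histogram".
for genus, fraction in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {fraction:.0%}")
```

Running the same function over every sample gives one row per sample, which is the layout the slide describes.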

Some Metagenomic Effects

• Community structure: e.g. content; ecological parameters (biodiversity)
• Specific organism: e.g. C. difficile
• Multiple specific organisms: beneficial ↓, detrimental ↑
• Genes or pathways: e.g. lactic acid

Community, organism, or ensemble properties
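
The slide mentions biodiversity as an ecological parameter without naming a specific measure. One common choice is the Shannon index, sketched below as an illustration. The choice of index and the toy abundance vectors are assumptions, not something the talk specifies:

```python
import math

def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h

# Toy communities (made up): same number of taxa, different evenness.
even = shannon_index([25, 25, 25, 25])
skewed = shannon_index([97, 1, 1, 1])
print(f"even community:   H = {even:.3f}")
print(f"skewed community: H = {skewed:.3f}")
```

A community dominated by one organism scores lower than an even one with the same number of taxa, which is the kind of community-structure shift the slide refers to.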

Metagenomic pathogen detection in clinical samples

• Patient samples
  • Hospital microbiology lab
  • Metagenomic sequencing
• Compare results

Alexis Elward, David Haslam, Greg Storch, Rana Elfeghaly, Yanjiao Zhou, Kristine Wylie

Patients with/without hospital acquired diarrhea

A. C.diff + high TcdB
B. NC (SE meds)
C. C.diff + high TcdB
D. NC IBD
E. C.diff + low TcdB
F. NC Campy
G. C.diff + low TcdB
H. NC
I. NC Salmonella
J. C.diff + average TcdB

Diagnostic lab results

16S analysis of clinical samples for C. difficile

Pathogen relative abundance in clinical samples (16S read abundance)

Subject  Clinical findings                   C. difficile  Campylobacter  Salmonella
A        C dif + high TcdB + Noro II CT 24   7.15          0.64           0.00
B        NC (SE meds?)                       0.02          0.00           0.00
C        C dif + high TcdB + Sapo CT 35      45.40         0.00           0.00
D        NC IBD                              0.00          0.00           0.01
E        C dif + low TcdB                    0.90          0.00           0.00
F        NC Campy + Sapo CT 34               0.00          6.24           0.00
G        C dif + low TcdB                    0.10          0.00           0.01
H        NC ?                                0.00          0.05           0.00
I        NC Salmonella                       0.05          0.00           5.02
J        C dif + average TcdB                2.12          0.02           2.58

NC: various negative controls
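
To make the comparison with the diagnostic lab concrete, the abundances in the table can be turned into per-subject pathogen calls with a simple cutoff. This is a sketch only: the talk does not state a detection threshold, so `THRESHOLD` below is a hypothetical value chosen for illustration, and the function name is made up.

```python
# 16S read abundances copied from the table above, in the table's own units.
abundance = {
    "A": {"C. difficile": 7.15, "Campylobacter": 0.64, "Salmonella": 0.00},
    "B": {"C. difficile": 0.02, "Campylobacter": 0.00, "Salmonella": 0.00},
    "C": {"C. difficile": 45.40, "Campylobacter": 0.00, "Salmonella": 0.00},
    "D": {"C. difficile": 0.00, "Campylobacter": 0.00, "Salmonella": 0.01},
    "E": {"C. difficile": 0.90, "Campylobacter": 0.00, "Salmonella": 0.00},
    "F": {"C. difficile": 0.00, "Campylobacter": 6.24, "Salmonella": 0.00},
    "G": {"C. difficile": 0.10, "Campylobacter": 0.00, "Salmonella": 0.01},
    "H": {"C. difficile": 0.00, "Campylobacter": 0.05, "Salmonella": 0.00},
    "I": {"C. difficile": 0.05, "Campylobacter": 0.00, "Salmonella": 5.02},
    "J": {"C. difficile": 2.12, "Campylobacter": 0.02, "Salmonella": 2.58},
}

THRESHOLD = 0.5  # hypothetical cutoff, NOT taken from the talk

def detected(profile, threshold=THRESHOLD):
    """Return the pathogens whose abundance meets or exceeds the cutoff."""
    return sorted(name for name, value in profile.items() if value >= threshold)

calls = {subject: detected(profile) for subject, profile in abundance.items()}
for subject, found in calls.items():
    print(subject, ", ".join(found) if found else "none above cutoff")
```

With this made-up cutoff, subject G (0.10) falls below the threshold even though the clinical lab reported C. difficile, so any real threshold would need to be calibrated against the diagnostic results.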

The bacterial 16S rRNA gene (ssu)

Evaluation of 16S rDNA-based community profiling for human microbiome research. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. PLoS One. 2012;7(6):e39315.

Trends in 16S rRNA gene sequencing

• Full-length Sanger sequencing
  • PCR => clone => sequence
  • All 9 hypervariable regions
  • Expensive, time-consuming, accurate taxa ID
• 1/3-length 454 sequencing
  • PCR 500bp regions => sequence
  • 2-4 hypervariable regions
  • Inexpensive, high-throughput, less accurate taxa ID
• 1/10-length Illumina sequencing
  • PCR 500bp regions => sequence
  • 1 hypervariable region
  • Very cheap, very high-throughput, less accurate taxa ID
• Full-length PB
  • PCR => sequence
  • 9 hypervariable regions

Full-length 16S CCS sequencing on single organisms

Organism                  Length (reference or cluster)  % identity
Enterococcus faecalis     1543                           99.6
Staphylococcus aureus     1537                           99.9
Escherichia coli          1528                           99.7
Rhodobacter sphaeroides   1456                           99.9
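
As a rough guide to what these percentages mean per read, the identity values can be converted into an approximate number of differing positions. The sketch below assumes the % identity was computed over the full listed length, which the slide does not state, and the function name is hypothetical:

```python
# (length, % identity) pairs copied from the table above.
ccs_results = {
    "Enterococcus faecalis": (1543, 99.6),
    "Staphylococcus aureus": (1537, 99.9),
    "Escherichia coli": (1528, 99.7),
    "Rhodobacter sphaeroides": (1456, 99.9),
}

def approx_differences(length, percent_identity):
    """Approximate count of non-identical positions, assuming identity
    was measured over the whole sequence length (an assumption)."""
    return round(length * (100.0 - percent_identity) / 100.0)

differences = {
    organism: approx_differences(length, identity)
    for organism, (length, identity) in ccs_results.items()
}
for organism, n in differences.items():
    print(f"{organism}: about {n} differing positions")
```

Under that assumption, each full-length CCS sequence differs from its reference or cluster at only a handful of positions out of roughly 1500.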

Large-scale single isolate typing

• Have ~8000 isolates (microtiter plates) from hospital
  • Looking for unsequenced species from humans
• Need FL 16S in order to make a species call for typing
  • 400 base reads from 454 do not give enough specificity
• Each well has one strain
  • Sanger seq’ing of FL PCR products => single sequence w/o cloning
• Can PacBio compete: cheaper, higher throughput?
• Goal:
  • Find what species these isolates are
  • Choose novel isolates
  • Perform WG sequencing

Large-scale single isolate typing with PacBio

• Sanger: do not see alleles of multiple 16S genes/strain
  PacBio: can see different alleles since single molecule
• Hospital isolates (82):
  • 70 samples agree between Sanger and PacBio
  • 4 samples have minor species seen with both platforms
  • 5 samples have strain differences seen with PacBio, not with Sanger
  • 2 samples failed with Sanger, not with PacBio
  • 1 sample disagreement
• 7 DNA sample controls agree between platforms
• 4 known culture sample controls agree between platforms
• PacBio: can see low level contaminants
• 99% agreement between Sanger and PacBio (90/91)
  • Only 1 disagreement between the platforms
  • More information from PacBio
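
The 90/91 figure can be reproduced from the counts on the slide. The sketch below shows one reading that is consistent with those numbers: the 2 samples that failed with Sanger are excluded from the comparison, and every remaining sample except the 1 disagreement counts as agreement. That interpretation, and the dictionary keys, are assumptions; the slide does not spell out the calculation.

```python
# Counts copied from the slide.
hospital = {
    "agree": 70,
    "minor_species_both": 4,
    "strain_differences_pacbio_only": 5,
    "failed_sanger_only": 2,
    "disagree": 1,
}
controls = {"dna": 7, "culture": 4}

total_hospital = sum(hospital.values())          # hospital isolates
total = total_hospital + sum(controls.values())  # isolates plus controls

# Assumption: samples with no Sanger result cannot be compared.
comparable = total - hospital["failed_sanger_only"]
concordant = comparable - hospital["disagree"]
rate = concordant / comparable

print(f"hospital isolates: {total_hospital}")
print(f"agreement: {concordant}/{comparable} = {rate:.1%}")
```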

Cost is an issue

• With 96 samples/1 SMRT cell, the fully loaded cost of PacBio is about 2x Sanger.
  • SMRT cell
  • Sequencing reagents
  • Library kit and labor
  • Instrument
  • Computation (storage, labor, cpu)
• Would need to pool more samples/SMRT cell
  • Need more bar codes
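
How much extra pooling would be needed depends on how much of the PacBio cost is incurred per SMRT cell rather than per sample. The toy model below is purely illustrative: the only number taken from the talk is the 2x ratio at 96 samples per cell, while the two-part cost split and the `fixed_fraction` parameter are hypothetical.

```python
def cost_ratio(samples_per_cell, fixed_fraction,
               baseline_samples=96, baseline_ratio=2.0):
    """PacBio/Sanger per-sample cost ratio under a toy two-part model.

    fixed_fraction is the (assumed) share of the per-sample PacBio cost at
    the baseline that is really a per-cell cost and so shrinks with pooling.
    """
    if not 0.0 <= fixed_fraction <= 1.0:
        raise ValueError("fixed_fraction must be between 0 and 1")
    if samples_per_cell <= 0:
        raise ValueError("samples_per_cell must be positive")
    per_cell_part = fixed_fraction * baseline_samples / samples_per_cell
    per_sample_part = 1.0 - fixed_fraction
    return baseline_ratio * (per_cell_part + per_sample_part)

# Explore a few hypothetical scenarios; each sample needs its own barcode.
for fraction in (1.0, 0.75, 0.5):
    for n in (96, 192, 384):
        print(f"fixed={fraction:.2f} samples/cell={n}: "
              f"ratio={cost_ratio(n, fraction):.2f}x Sanger")
```

In this model parity with Sanger is reachable only if more than half of the baseline cost is per-cell, which is why both deeper pooling and a larger barcode set matter.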

Sequencing communities of microbes en masse

• 16S rRNA gene sequencing for community profiling
  • Full-length gives species-level definition
  • 454 500bp reads give genus-level definition
• Shotgun sequencing
  • Longer reads give better assembly (of unknown uncultured)
  • Bacteria, viruses, fungi and other eukaryotes described

Simulated community 16S sequencing

• A mock community of 24 species
• Only 22 amplified with the primers used
• Organisms range over 300-fold in abundance
• Make 4 different batches
• Aim for 5000 sequences/sample (454 protocol)

                          Pool 1  Pool 2  Pool 3  Pool 4
Reads after filtering     3557    5055    10331   9798
Species found             20      21      22      22
% reads hitting species   99.9    99.9    92.0    90.8
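
The table can be summarized as the fraction of amplifiable species recovered in each pool and the approximate number of reads that hit a mock-community species. A small sketch using only the numbers above; the variable and function names are hypothetical:

```python
AMPLIFIABLE_SPECIES = 22  # 24 in the mock community, 22 amplify with the primers

# Values copied from the table above.
pools = {
    "Pool 1": {"reads": 3557, "species_found": 20, "pct_reads_hitting": 99.9},
    "Pool 2": {"reads": 5055, "species_found": 21, "pct_reads_hitting": 99.9},
    "Pool 3": {"reads": 10331, "species_found": 22, "pct_reads_hitting": 92.0},
    "Pool 4": {"reads": 9798, "species_found": 22, "pct_reads_hitting": 90.8},
}

def summarize(pool):
    """Return (fraction of amplifiable species found, approx. reads hitting)."""
    recovery = pool["species_found"] / AMPLIFIABLE_SPECIES
    reads_hitting = round(pool["reads"] * pool["pct_reads_hitting"] / 100)
    return recovery, reads_hitting

summary = {name: summarize(pool) for name, pool in pools.items()}
for name, (recovery, reads_hitting) in summary.items():
    print(f"{name}: {recovery:.0%} of amplifiable species, "
          f"about {reads_hitting} reads hitting")
```

The two pools with roughly 10,000 filtered reads recover all 22 amplifiable species, while the shallower pools miss one or two.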

Mock community analysis with Sanger, 454

Evaluation of 16S rDNA-based community profiling for human microbiome research. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. PLoS One. 2012;7(6):e39315.

Consistent recognition of an organism in the pool for 4 replicate 16S amplifications:

• Actinomyces_odontolyticus
• Bacteroides_plebeius
• Bifidobacterium_dentium
• Citrobacter_youngae
• Clostridium_nexile
• Collinsella_stercoris
• Dialister_invisus
• Eikenella_corrodens
• Enterobacter_cancerogenus
• Eubacterium_siraeum
• Fusobacterium_periodonticum
• Gemella_haemolysans
• Holdemania_filiformis
• Kingella_oralis
• Methanobrevibacter_smithii
• Mitsuokella_multacida
• Neisseria_elongata
• Parabacteroides_johnsonii
• Parabacteroides_merdae
• Prevotella_tannerae
• Providencia_alcalifaciens
• Ruminococcus_gnavus
• Selenomonas_sputigena
• Shuttleworthia_satelles

Figure axis: 0% to 100%. Legend: Pool 1, Pool 2, Pool 3, Pool 4.

300-fold difference in prevalence of 16S genes for separate organisms in the pool.
Methanobrevibacter (an archaeon) and Collinsella do not amplify with the 16S primers used.

Replace culture-based analysis with metagenomic analysis

INFECTION sample, analyzed two ways:

• Traditional culture-based analysis: culture single species
• Metagenomic analysis (culture-independent)

Diagram labels (methods): 16S; WGS; assembly, annotation; alignment
Diagram labels (results): species present; variants of a species; strains/subspecies; strains/subspecies based on SNP/indel content; strains/subspecies based on gene content; genes of interest

Acknowledgments

Clinical
• Greg Storch, WU
• Susan Haake, UCLA
• Phil Tarr, WU
• Martin Blaser, NYU
• Barb Warner, WU
• Richard Hotchkiss, WU
• J. Dennis Fortenberry, Indiana U
• Scott Weiss, Harvard
• Ellen Li, SUNY-Stony Brook
• Katherine Gregory, Harvard
• Huiying Li, UCLA
• Catherine O’Brien, Toronto
• Brad Warner, WU
• Homer Twigg, Indiana U
• Many others

Washington University Genome Institute:
• Makedonka Mitreva
• Erica Sodergren
• Sahar Abubucker
• Karthik Kota
• John Martin
• Bruce Rosa
• Yanjiao Zhou
• Kristine Wylie
• Kathie Mihindukulasuriya
• Hongyu Gao
• Bill Shannon
• Patricio La Rosa
• Great Production & Informatics Teams

Peer Bork Group
• Siegfried Schloissnig
• Manimozhiyan Arumugam
• Shinichi Sunagawa
• Julien Tap
• Ana Zhu
• Alison S. Waller
• Daniel R. Mende
• Shamil R. Sunyaev

Funding: NIH, Gates Foundation

Thank you to the subjects and their families
