PacBio Meets the Microbiome
George Weinstock
PacBio Users Group Meeting
September 18, 2013
Diverse interest in medical metagenomics
• Acne
• Antibiotics, gut microbiome, and obesity
• Antibiotic resistance
• Asthma, allergies
  • Acute RSV infection
  • Vitamin D
• Bacterial vaginosis
• Cancer microbiomes
• Conjunctiva – trachoma microbiome
• Crohn's disease
• Cystic fibrosis
• Diabetes
  • Oral microbiome
  • Skin microbiome
• Dietary effects on gut microbiome
• Fecal transplant
• HIV and lung microbiome
• Infection control
  • C. difficile
  • VRE
  • MRSA
  • E. coli O157:H7
  • NICU bacteremia
• Intestinal fat uptake
• Necrotizing enterocolitis
• Non-Alcoholic Fatty Liver Disease
• Oral microbiome
  • Periodontitis
  • Caries
• Parasitic infection and the microbiome
• Post-transplant Lymphoproliferative Disorders
• Pre-term birth
  • Maternal microbiome
  • Vitamin D
• Respiratory microbiome
  • Influenza infection
  • Pre-term babies
  • Childhood vaccination
• Sepsis
  • ICU
  • NICU
• Short-bowel syndromes
• Urethritis
• Virus discovery
  • Kawasaki Disease
  • Fever of unknown origin in children
  • Transplantation: CMV, BK
  • Immunosuppressed/immunocompromised patients
Approaches to study the microbiome
Microbial community: bacteria, viruses, eukaryotes
• Targeted sequencing (16S rRNA): bacterial census, i.e. taxa and abundances
• Shotgun sequencing: all microbes, i.e. taxa and genes (bacteria, viruses, fungi, yeasts, protists; enzymes)
• Describe communities in many samples: the "average" community and its variations
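Targeted 16S sequencing yields a bacterial census: per-taxon read counts, which are usually converted to relative abundances before samples are compared. A minimal Python sketch, assuming reads have already been classified to genus; the function name and the counts are hypothetical, not study data.

```python
from collections import Counter


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon 16S read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical genus-level classifications for one stool sample.
classified_reads = ["Bacteroides"] * 600 + ["Prevotella"] * 150 + ["Ruminococcus"] * 250
sample = Counter(classified_reads)
profile = relative_abundance(sample)
for genus, frac in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {frac:.1%}")
```

Normalizing to fractions lets samples sequenced to different depths be placed on the same scale.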
Major “enterotypes” of the stool biomes
[Figure: samples annotated by sex (women, men), site (St. Louis, Houston), BMI (<25, 25 <= BMI < 30, >=30, NA) and ethnicity (Hispanic/Latino/Spanish, not Hispanic/Latino/Spanish)]
Studying communities: 16S rRNA genes
[Figure: histograms of the genera in each sample, one row per sample; labeled genera: Bacteroides, Prevotella, Ruminococcus]
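The labeled genera (Bacteroides, Prevotella, Ruminococcus) are the ones commonly used to name stool enterotypes. Published enterotyping clusters complete genus profiles (for example with Jensen-Shannon distance and partitioning around medoids); the sketch below is a deliberately simplified stand-in that labels a sample by whichever of these three driver genera is most abundant. The function name and example profiles are hypothetical.

```python
DRIVER_GENERA = ("Bacteroides", "Prevotella", "Ruminococcus")


def dominant_driver(profile: dict[str, float]) -> str:
    """Return the driver genus with the highest relative abundance.

    Simplification: real enterotyping clusters whole profiles; this only
    compares the three driver genera named in the figure.
    """
    return max(DRIVER_GENERA, key=lambda genus: profile.get(genus, 0.0))


# Hypothetical relative-abundance profiles (fractions of 16S reads).
samples = {
    "sample_1": {"Bacteroides": 0.42, "Prevotella": 0.03, "Ruminococcus": 0.10},
    "sample_2": {"Bacteroides": 0.05, "Prevotella": 0.38, "Ruminococcus": 0.07},
    "sample_3": {"Bacteroides": 0.12, "Ruminococcus": 0.21},
}
for name, profile in samples.items():
    print(name, dominant_driver(profile))
```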
Some Metagenomic Effects
• Community structure, e.g. content; ecological parameters (biodiversity)
• Specific organism, e.g. C. difficile
• Multiple specific organisms: beneficial ↓, detrimental ↑
• Genes or pathways, e.g. lactic acid
Community, organism, or ensemble properties
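Biodiversity is one of the ecological parameters of community structure listed above. A common summary is the Shannon index, H = -Σ p_i ln p_i over the relative abundances p_i of the taxa. A minimal Python sketch; the choice of index and the example counts are assumptions for illustration, not the measure used in the talk.

```python
import math


def shannon_index(counts: list[int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one read")
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


even = [250, 250, 250, 250]   # four equally abundant taxa
skewed = [970, 10, 10, 10]    # one taxon dominates the community
print(round(shannon_index(even), 3))   # 1.386, i.e. ln(4)
print(round(shannon_index(skewed), 3))
```

The same four taxa give a much lower H when one of them dominates, which is why diversity can shift even when community content (which taxa are present) does not.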
Metagenomic pathogen detection in clinical samples
• Patient samples → hospital microbiology lab
• Patient samples → metagenomic sequencing
• Compare results
Alexis Elward, David Haslam, Greg Storch, Rana Elfeghaly, Yanjiao Zhou, Kristine Wylie
16S analysis of clinical samples for C. difficile
Patients with/without hospital-acquired diarrhea; diagnostic lab results:
A. C. diff+, high TcdB
B. NC (SE meds)
C. C. diff+, high TcdB
D. NC, IBD
E. C. diff+, low TcdB
F. NC, Campy
G. C. diff+, low TcdB
H. NC
I. NC, Salmonella
J. C. diff+, average TcdB
NC: various negative controls
Pathogen relative abundance in clinical samples (16S read abundance)

Subject  Clinical findings                      C. difficile  Campylobacter  Salmonella
A        C. diff+, high TcdB; Noro II+ (CT 24)  7.15          0.64           0.00
B        NC (SE meds?)                          0.02          0.00           0.00
C        C. diff+, high TcdB; Sapo+ (CT 35)     45.40         0.00           0.00
D        NC, IBD                                0.00          0.00           0.01
E        C. diff+, low TcdB                     0.90          0.00           0.00
F        NC; Campy+, Sapo+ (CT 34)              0.00          6.24           0.00
G        C. diff+, low TcdB                     0.10          0.00           0.01
H        NC, ?                                  0.00          0.05           0.00
I        NC, Salmonella                         0.05          0.00           5.02
J        C. diff+, average TcdB                 2.12          0.02           2.58
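The "compare results" step can be sketched as flagging each sample whose 16S relative abundance for a pathogen exceeds a cutoff, then setting those calls beside the clinical findings. The abundance values below are copied from the table; the 1.0 cutoff and the function name are hypothetical choices for illustration, not a validated diagnostic threshold.

```python
# 16S relative abundances from the table: (C. difficile, Campylobacter, Salmonella)
ABUNDANCE = {
    "A": (7.15, 0.64, 0.00),
    "B": (0.02, 0.00, 0.00),
    "C": (45.40, 0.00, 0.00),
    "D": (0.00, 0.00, 0.01),
    "E": (0.90, 0.00, 0.00),
    "F": (0.00, 6.24, 0.00),
    "G": (0.10, 0.00, 0.01),
    "H": (0.00, 0.05, 0.00),
    "I": (0.05, 0.00, 5.02),
    "J": (2.12, 0.02, 2.58),
}
PATHOGENS = ("C. difficile", "Campylobacter", "Salmonella")


def flag_pathogens(cutoff: float = 1.0) -> dict[str, list[str]]:
    """Map each subject to the pathogens whose abundance exceeds the (hypothetical) cutoff."""
    return {
        subject: [name for name, value in zip(PATHOGENS, values) if value > cutoff]
        for subject, values in ABUNDANCE.items()
    }


for subject, hits in flag_pathogens().items():
    print(subject, ", ".join(hits) or "none")
```

With this cutoff the low-TcdB C. difficile-positive samples E and G are not flagged, and sample J is flagged for Salmonella as well as C. difficile, which shows why any cutoff would have to be chosen against the laboratory results.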
The bacterial 16S rRNA gene (SSU)
Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS One. 2012;7(6):e39315.
Trends in 16S rRNA gene sequencing
• Full-length Sanger sequencing: PCR => clone => sequence; all 9 hypervariable regions
  Expensive, time-consuming, accurate taxa ID
• 1/3-length 454 sequencing: PCR 500 bp regions => sequence; 2-4 hypervariable regions
  Inexpensive, high-throughput, less accurate taxa ID
• 1/10-length Illumina sequencing: PCR 500 bp regions => sequence; 1 hypervariable region
  Very cheap, very high-throughput, less accurate taxa ID
• Full-length PacBio: PCR => sequence; all 9 hypervariable regions
Full-length 16S CCS sequencing on single organisms

Organism                  Length, bp (reference or cluster)  % identity
Enterococcus faecalis     1543                               99.6
Staphylococcus aureus     1537                               99.9
Escherichia coli          1528                               99.7
Rhodobacter sphaeroides   1456                               99.9
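Percent identity, as reported in the table above, is usually computed from a pairwise alignment of the read (or cluster consensus) against the reference. Definitions vary, in particular in whether gap columns count in the denominator. The sketch below assumes the two sequences are already aligned with "-" for gaps and divides matching columns by all alignment columns; the function name and sequences are illustrative.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Matches / alignment columns * 100, for two equal-length gapped strings.

    Assumption: gap columns ('-') count as mismatches; other tools may
    exclude them or divide by the shorter ungapped length instead.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_query)
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q == r and q != "-"
    )
    return 100.0 * matches / columns


query = "AGAGTTTGATCCTGGCTCAG"
ref = "AGAGTTTGATC-TGGCTCAG"   # one gap in the reference
print(f"{percent_identity(query, ref):.1f}%")  # 95.0%
```

For scale: at 99.6% identity over roughly 1,543 positions, about 6 positions differ (1543 × 0.004 ≈ 6.2).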
Large-scale single isolate typing
• Have ~8000 isolates (in microtiter plates) from the hospital
• Looking for unsequenced species from humans
• Need full-length (FL) 16S to make a species call for typing
• 400-base reads from 454 do not give enough specificity
• Each well has one strain
• Sanger sequencing of FL PCR products => single sequence without cloning
• Can PacBio compete: cheaper, higher throughput?
• Goal:
  • Find what species these isolates are
  • Choose novel isolates
  • Perform whole-genome (WG) sequencing
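One way to turn a full-length 16S best hit into a typing call, and to pick out candidate novel isolates, is to compare its percent identity with fixed cutoffs. This sketch assumes the widely cited 98.7% (species) and 94.5% (genus) 16S identity thresholds proposed by Yarza et al. (2014); the talk does not say which cutoffs were used, and the function name and well IDs are hypothetical.

```python
SPECIES_CUTOFF = 98.7  # assumed species-level 16S identity threshold (%)
GENUS_CUTOFF = 94.5    # assumed genus-level 16S identity threshold (%)


def typing_call(best_hit_species: str, identity: float) -> str:
    """Classify an isolate from its best full-length 16S database hit."""
    genus = best_hit_species.split()[0]
    if identity >= SPECIES_CUTOFF:
        return f"species: {best_hit_species}"
    if identity >= GENUS_CUTOFF:
        return f"candidate novel species in {genus}"
    return "candidate novel genus"


# Hypothetical wells with (best hit, % identity to that hit).
isolates = {
    "plate1_A01": ("Enterococcus faecalis", 99.6),
    "plate1_A02": ("Staphylococcus aureus", 97.2),
    "plate1_A03": ("Escherichia coli", 91.0),
}
for well, (hit, pct) in isolates.items():
    print(well, typing_call(hit, pct))
```

Isolates that fall below the species cutoff are the natural shortlist for whole-genome sequencing.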
Large-scale single isolate typing with PacBio
• Sanger does not see the alleles of multiple 16S genes per strain; PacBio can see different alleles because it sequences single molecules
• Hospital isolates (82):
  • 70 samples agree between Sanger and PacBio
  • 4 samples have minor species seen with both platforms
  • 5 samples have strain differences seen with PacBio, not with Sanger
  • 2 samples failed with Sanger, not with PacBio
  • 1 sample disagreement
• 7 DNA sample controls agree between platforms
• 4 known culture sample controls agree between platforms
• PacBio can see low-level contaminants
• 99% agreement between Sanger and PacBio (90/91)
• Only 1 disagreement between the platforms
• More information from PacBio
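The 90/91 figure can be reconstructed from the counts above under assumptions the slide does not state: the 2 isolates that failed with Sanger are excluded as not comparable, isolates with minor species or strain differences count as agreeing, and the 7 DNA and 4 culture controls are included.

```python
# Counts from the slide; which categories count as "agreeing" is an assumption.
hospital_isolates = 82
breakdown = {
    "agree": 70,
    "minor species, both platforms": 4,
    "strain differences, PacBio only": 5,
    "Sanger failed": 2,
    "disagree": 1,
}
assert sum(breakdown.values()) == hospital_isolates  # 70 + 4 + 5 + 2 + 1

dna_controls, culture_controls = 7, 4
comparable = hospital_isolates - breakdown["Sanger failed"] + dna_controls + culture_controls
agreeing = comparable - breakdown["disagree"]
print(f"{agreeing}/{comparable} = {agreeing / comparable:.1%}")  # 90/91 = 98.9%
```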
Cost is an issue
• With 96 samples per SMRT cell, the fully loaded cost of PacBio is about 2x Sanger. The fully loaded cost includes:
  • SMRT cell
  • Sequencing reagents
  • Library kit and labor
  • Instrument
  • Computation (storage, labor, CPU)
• Would need to pool more samples per SMRT cell
  • Need more barcodes
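Pooling more samples per SMRT cell helps because most of the listed costs are incurred per cell and are shared by every sample on it. A toy model, assuming costs split into a per-cell part and a per-sample part; the split and the unit costs are hypothetical, since the slide gives only the 2x ratio at 96 samples.

```python
import math


def pacbio_cost_per_sample(per_cell: float, per_sample: float, n_samples: int) -> float:
    """Per-cell costs are shared by the pool; per-sample costs are not."""
    return per_cell / n_samples + per_sample


def samples_for_parity(per_cell: float, per_sample: float, sanger: float) -> int:
    """Smallest pool size at which PacBio costs no more per sample than Sanger."""
    if per_sample >= sanger:
        raise ValueError("no pool size reaches parity")
    return math.ceil(per_cell / (sanger - per_sample))


SANGER = 1.0  # normalize the Sanger cost per sample to 1 unit
# Hypothetical split chosen so that 96 samples/cell costs 2x Sanger: 144/96 + 0.5 = 2.0
PER_CELL, PER_SAMPLE = 144.0, 0.5
print(pacbio_cost_per_sample(PER_CELL, PER_SAMPLE, 96))  # 2.0
print(samples_for_parity(PER_CELL, PER_SAMPLE, SANGER))  # 288
```

If every cost were per cell, parity would need exactly twice as many samples (192); any per-sample component pushes that number higher, which is why more barcodes would be needed.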
Sequencing communities of microbes en masse
• 16S rRNA gene sequencing for community profiling
  • Full-length sequences give species-level definition
  • 454 500 bp reads give genus-level definition
• Shotgun sequencing
  • Longer reads give better assembly (of unknown, uncultured organisms)
  • Bacteria, viruses, fungi and other eukaryotes described
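"Better assembly" is commonly quantified with contig N50: the length L such that contigs of length at least L contain at least half of the assembled bases. A minimal sketch; N50 is a standard metric chosen here for illustration, not one reported in the talk, and the contig lengths are made up.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs >= L cover at least half the total assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    half = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


# Two hypothetical assemblies of the same 20 kb of sequence.
short_read_assembly = [5_000, 4_000, 3_000, 2_000, 2_000, 1_000, 1_000, 1_000, 1_000]
long_read_assembly = [12_000, 5_000, 2_000, 1_000]
print(n50(short_read_assembly), n50(long_read_assembly))  # 3000 12000
```

Both assemblies contain the same number of bases; the longer-read one packs them into fewer, longer contigs, which raises N50.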
Simulated community 16S sequencing
• A mock community of 24 species
• Only 22 amplified with the primers used
• Organisms range over 300-fold in abundance
• Make 4 different batches
• Aim for 5000 sequences/sample (454 protocol)

                         Pool 1  Pool 2  Pool 3  Pool 4
Reads after filtering    3557    5055    10331   9798
Species found            20      21      22      22
% reads hitting species  99.9    99.9    92.0    90.8
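Two quantities follow directly from the table: the fraction of the 22 amplifiable species recovered in each pool, and the approximate number of filtered reads that did not hit a mock-community species. A short Python check using the table values:

```python
AMPLIFIABLE_SPECIES = 22  # of the 24 species in the mock community

pools = {
    # pool: (reads after filtering, species found, % reads hitting species)
    "Pool 1": (3557, 20, 99.9),
    "Pool 2": (5055, 21, 99.9),
    "Pool 3": (10331, 22, 92.0),
    "Pool 4": (9798, 22, 90.8),
}

for pool, (reads, found, pct_on_target) in pools.items():
    recovery = found / AMPLIFIABLE_SPECIES
    off_target = round(reads * (100 - pct_on_target) / 100)
    print(f"{pool}: {recovery:.0%} of species recovered, ~{off_target} off-target reads")
```

The two pools with the most filtered reads recover all 22 species but also carry the largest off-target fraction.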
Mock community analysis with Sanger, 454
Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS One. 2012;7(6):e39315.
[Figure: consistent recognition of each organism in the pool across 4 replicate 16S amplifications (Pools 1-4), x-axis 0% to 100%. Organisms: Actinomyces odontolyticus, Bacteroides plebeius, Bifidobacterium dentium, Citrobacter youngae, Clostridium nexile, Collinsella stercoris, Dialister invisus, Eikenella corrodens, Enterobacter cancerogenus, Eubacterium siraeum, Fusobacterium periodonticum, Gemella haemolysans, Holdemania filiformis, Kingella oralis, Methanobrevibacter smithii, Mitsuokella multacida, Neisseria elongata, Parabacteroides johnsonii, Parabacteroides merdae, Prevotella tannerae, Providencia alcalifaciens, Ruminococcus gnavus, Selenomonas sputigena, Shuttleworthia satelles]
There is a 300-fold difference in the prevalence of 16S genes for separate organisms in the pool. Methanobrevibacter (an archaeon) and Collinsella do not amplify with the 16S primers used.
Replace culture-based analysis with metagenomic analysis
[Figure: an infection sample analyzed either by traditional culture-based analysis (culture a single species) or by culture-independent metagenomic analysis. Labeled steps include 16S sequencing, WGS, assembly and annotation, and alignment; labeled outputs include species present, genes of interest, variants of a species, and strains/subspecies based on gene content or on SNP/indel content.]
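Distinguishing strains or subspecies "based on gene content" amounts to comparing the sets of genes annotated in each assembly. One simple, commonly used summary is the Jaccard index of the two gene sets; the sketch is illustrative only, and the gene identifiers are hypothetical placeholders rather than real annotations.

```python
def jaccard(genes_a: set[str], genes_b: set[str]) -> float:
    """Shared genes divided by genes present in either strain."""
    union = genes_a | genes_b
    if not union:
        return 1.0  # two empty annotations are trivially identical
    return len(genes_a & genes_b) / len(union)


# Hypothetical gene annotations for two isolates of the same species.
strain_1 = {"gene_001", "gene_002", "gene_003", "gene_004"}
strain_2 = {"gene_001", "gene_002", "gene_003", "gene_005"}

print(f"Jaccard similarity: {jaccard(strain_1, strain_2):.2f}")  # 0.60
print("only in strain 1:", sorted(strain_1 - strain_2))
print("only in strain 2:", sorted(strain_2 - strain_1))
```

The set differences, not just the score, are usually what matters clinically, since they point to the strain-specific genes of interest.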
Acknowledgments
Clinical:
• Greg Storch, WU
• Susan Haake, UCLA
• Phil Tarr, WU
• Martin Blaser, NYU
• Barb Warner, WU
• Richard Hotchkiss, WU
• J. Dennis Fortenberry, Indiana U
• Scott Weiss, Harvard
• Ellen Li, SUNY-Stony Brook
• Katherine Gregory, Harvard
• Huiying Li, UCLA
• Catherine O’Brien, Toronto
• Brad Warner, WU
• Homer Twigg, Indiana U
• Many others
Washington University Genome Institute:
• Makedonka Mitreva
• Erica Sodergren
• Sahar Abubucker
• Karthik Kota
• John Martin
• Bruce Rosa
• Yanjiao Zhou
• Kristine Wylie
• Kathie Mihindukulasuriya
• Hongyu Gao
• Bill Shannon
• Patricio La Rosa
• Great Production & Informatics Teams
Funding: NIH, Gates Foundation
Peer Bork Group:
• Siegfried Schloissnig
• Manimozhiyan Arumugam
• Shinichi Sunagawa
• Julien Tap
• Ana Zhu
• Alison S. Waller
• Daniel R. Mende
• Shamil R. Sunyaev
Thank you to the subjects and their families