Genomics in food security: 100K Pathogen Genome Project
Bart Weimer, Ph.D.
Professor, UC Davis - School of Veterinary Medicine
Director, BGI@UCDavis
Changing world & food safety challenges
• World population predicted to reach 9.2 billion by 2050
• ~25 megacities around the world
• Increased density
• Increased distance from food supply
• Predicted increase in worldwide food-related disease outbreaks
Industrialized / Developing
(modified from Z_punkt)
• Example food-associated outbreaks:
• 2013 – Pet food, turtles, hedgehogs, tea, fruit, chicken, beef, pork, & Salmonella
• 2012 – Pet food, fruit, nuts, peanut butter, hamburger, tuna, chicken & Salmonella
• 2011 – Veggies and re-emergence of E. coli O104 (genomics resolved)
• 2010 – Eggs and S. Enteritidis
• 2008 – Peanut butter, S. Typhimurium
• 2006 – Tomatoes, peppers, S. Typhimurium
Foodborne Pathogens
• Salmonella particularly persistent:
• High serotype diversity
• High mobile element diversity
• Frequent horizontal gene transfer
• Emerging stable hypervirulence
• Heithoff et al., PLoS Pathog '12 8(4):e1002647
• Large genomic diversity within serotype
• E. coli O104 example of NGS & solutions
Total FBI 2011 (pie chart, share by pathogen)
Norovirus                      61%
Salmonella (non-typhoid)       12%
Clostridium perfringens        11%
Campylobacter spp.              9%
Staph aureus                    3%
E. coli O157:H7                 2%
Toxoplasma gondii               1%
Listeria monocytogenes          1%
http://www.cdc.gov/foodborneburden/PDFs/FACTSHEET_A_FINDINGS_updated4-13.pdf
FBI deaths 2011 (pie chart, share by pathogen)
Salmonella (non-typhoid)       30%
Toxoplasma gondii              26%
Listeria monocytogenes         21%
Norovirus                      12%
Campylobacter spp.              7%
E. coli O157:H7                 4%
http://www.cdc.gov/foodborneburden/PDFs/FACTSHEET_A_FINDINGS_updated4-13.pdf
Non-typhoidal Salmonella
Moran et al., 2011. Gut 60:1412-1425
EPITHELIAL CELLS
bacterial adhesion and exposes other sugar residues to hydrolysis enables association with the cell membrane for invasion
Egberts et al., 1984. Veterinary Quarterly 6:4:186-199.
Salmonella Diversity
Salmonella species Number of serovars
S. enterica 2,557
S. enterica subsp. enterica 1,531
S. enterica subsp. salamae 505
S. enterica subsp. arizonae 99
S. enterica subsp. diarizonae 336
S. enterica subsp. houtenae 73
S. enterica subsp. indica 13
S. bongori 22
Total (genus Salmonella) 2,579
1,630 serotypes important in food animals
• Approximately 50-60 serotypes are most common to cause FBI in humans
• New foods becoming associated with new serotypes
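The serovar counts in the table above can be cross-checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the slide, and the variable names are illustrative, not part of the original deck.

```python
# Serovar counts per S. enterica subspecies, copied from the table above.
enterica_subspecies = {
    "enterica": 1531,
    "salamae": 505,
    "arizonae": 99,
    "diarizonae": 336,
    "houtenae": 73,
    "indica": 13,
}
bongori = 22  # S. bongori serovars, from the same table

s_enterica_total = sum(enterica_subspecies.values())
genus_total = s_enterica_total + bongori

# Share of all serovars represented by the ~50-60 most common in human FBI.
common_share_pct = [round(100 * n / genus_total, 1) for n in (50, 60)]

print(s_enterica_total, genus_total, common_share_pct)
```

Under these figures, the serotypes most often seen in human illness are only a small percentage of the known serovars, which is the point the slide makes about diversity.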
Salmonella Phylogenomics
16S Alignment
SNPs in 16S rDNA
Whole Genome Alignment
Entire genome reflects all similarities
Pathogen Evolution
• Vibrio evolution rapid
• Example for all enteric bacteria
• Also shown with environmental organisms
• Enterobacteria genome evolution
• HGT more common than appreciated
• Genome rearrangements influenced by biogeography & other bacterial community members
• Evidence for local pressure to induce population genome evolution
• Biogeography differences
• Likely to find footprints of geographical origin
• Requires large number of genomes to estimate
• Creates chimeric genomes
• Stress
• Induces SNPs
• Induces new virulence and drug resistance
• Mutations in DNA repair genes lead to SNPs
• Recombination events
• SNPs
• Large segments
• HGT
Shapiro et al., ‘12 Science; Denef & Banfield, ’12 Science
Genomics Paradigm to enrich biomarker
Specific gene (PCR)
Genome (NGS, multiplex)
Sequencing discovers new genes
Pan-genome increases with each isolate sequenced
Variable Salmonella genome
Micro. Ecol. 2011
New gene families
Core gene families
Increasingly complex genomic diversity
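The idea that the pan-genome grows while the core shrinks as isolates are added can be sketched with plain set operations. This is a toy illustration, not the project's pipeline: the function name and the gene-family labels are made up, and each genome is assumed to be already reduced to a set of gene-family identifiers.

```python
def pan_core_curve(genomes):
    """Return (pan-genome size, core-genome size) after each genome is added.

    `genomes` is an ordered list of sets of gene-family identifiers.
    Pan-genome = union of all families seen so far.
    Core genome = families present in every genome seen so far.
    """
    pan = set()
    core = None
    curve = []
    for families in genomes:
        pan |= families
        core = set(families) if core is None else core & families
        curve.append((len(pan), len(core)))
    return curve


# Hypothetical gene-family sets for three isolates (labels are invented).
isolates = [
    {"famA", "famB", "famC"},
    {"famA", "famB", "famD"},
    {"famA", "famC", "famE", "famF"},
]
curve = pan_core_curve(isolates)
for n, (pan_size, core_size) in enumerate(curve, start=1):
    print(f"{n} genomes: pan={pan_size} core={core_size}")
```

In this toy input each new isolate adds families to the pan-genome and removes some from the core, which is the pattern the slide describes for Salmonella.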
100K Pathogen genome project
Increase food safety using microbe systems biology
Discover the genetic constituents that are robust to be predictive biomarkers for specific traits
• Rapid ID and tracking
• Understand evolution to build more robust detection systems
• New isolate emergence and persistence
• Integration into current practices
http://100kgenome.vetmed.ucdavis.edu
2012 HHSInnovate Secretary’s Choice Awardee
Integration of 100K Project
• Produce a database of phylogenomic diversity of important FBI
• Industry representative genomes important
• Background organisms
M. Allard
100K Consortium
• Founding Members, Executive committee:
• Agilent Technologies
• UC Davis (Weimer lab)
• FDA
• Additional Steering Committee members:
• NIH (NCBI)
• CDC
• USDA
• Mars, Inc.
• Pacific Biosciences
• CLCbio
• Steering committee provides guidance for overall project direction and goals
• Affiliate Members:
• UC Davis - Food Science, PMI, PS, PHR
• California Veterinary Diagnostic Lab
• Salisbury University (USA)
• DoD - Walter Reed Hospital
• Mass General - Harvard hospital system
• RIVM (Netherlands)
• DTU (Denmark)
• MEFOSA (Lebanon)
• Sydney Technical University (Australia)
• Rajiv Gandhi Biotechnology Institute (India)
• Institute of Environmental Science & Research (NZ)
• Oak Ridge National Laboratory (ORNL)
• ANSIS (France)
• Additional negotiations in process with groups from Asia, Africa, Europe
• Corporate Affiliates:
• OpGen
• Kapa Biosystems
• PerkinElmer
• BGI@UCDavis
ADDITIONAL PARTICIPANTS WELCOME
100K project Web site
http://100kgenome.vetmed.ucdavis.edu
Organisms of Interest
• Salmonella
• Listeria
• Campylobacter
• Vibrio
• E. coli
• Shigella
• Yersinia
• Clostridium
• Enterococcus
• Cronobacter
• Norovirus
• Hepatitis A
• Enteroviruses
• World-wide representation to capture genomic diversity to represent pan-genome for the most important organisms
Initial focus
Logistical challenges
• Large number of isolates to authenticate & bank
• DNA isolation from many types of bacteria (i.e. lysis)
• DNA quality on large scale
• Large number of draft genomes of variable quality
• Sequencing strategy to meet the needs of the user community:
• Government
• Industry
• Public health
• Informatics to derive actionable information quickly
• Biomarker discovery and implementation for daily use
Isolate bank
Authenticated & banked isolates (~3,500 isolates)
Pending authentication & banking (~15,000 isolates)
Isolates by region
Submission Logistics
• Isolate submission
• Isolate agreement
• MTA
• Timing and specific isolates
• International isolates
• All permits in place
• Timing negotiated
• Sequencing:
• BGI@UCDavis – short reads
• UC Davis and others – long reads
• Return data to submitter via 100K bioproject page
• 12 months for review
• Release data at NCBI for public access
• Data return & analysis
• Publication & public database release
100K Sequencing Process
Bank culture
Isolate DNA
Library construction (300-600 libraries/day)
Make library
Sequence library
Short read technologies
Long read technologies
Whole Genome Mapping
Genome DB & feature ID
SE SNP Analysis
[Figure: yearly SNP evolution. Isolate labels: elephant seal, lung; sea lion, liver; elephant seal, kidney; equine feces; '08 sea lion, liver; '98 otter, abdomen; '05 sea lion, uterus; 1993; '02 rodent feces; '11 elephant seal urine; '11 elephant seal brain]
Finished genomes & the analytical future
• Sea mammal outbreak (SNP) (Deng et al. 2013)
• Food outbreaks (SNP)
• Data analysis post sequencing
• Comparison for SNPs
• Gene content = annotation
• Content comparison = forensics
• Gene
• Protein
• COGs/GO use in statistical enrichment
• Going beyond SNPs
• Guided biomarker discovery
• Genomics – limited based on sequences available
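The "comparison for SNPs" step can be illustrated with a minimal pairwise count over aligned sequences. This is a sketch under stated assumptions, not the method used in the project: it assumes the genomes are already aligned to equal length, it skips gaps and ambiguous bases, and the isolate names and sequences are invented.

```python
def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned sequences differ.

    Positions with a gap or a non-ACGT symbol in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a in valid and b in valid and a != b
    ]


def snp_distances(aligned):
    """Return {(name_x, name_y): SNP count} for every unordered pair."""
    names = sorted(aligned)
    return {
        (x, y): len(snp_positions(aligned[x], aligned[y]))
        for x in names
        for y in names
        if x < y
    }


# Invented toy alignment; real inputs would be whole-genome alignments.
toy = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGAAC",
    "isolate_3": "ACCTACGA-C",
}
distances = snp_distances(toy)
for pair, count in distances.items():
    print(pair, count)
```

Pairwise counts like these are the raw material for the outbreak comparisons listed above; gene-content and annotation comparisons need additional steps that this sketch does not cover.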
Active Outbreaks & Complete Genomes
• Salmonella enterica subsp. enterica serovar Javiana
• Common in fresh-cut produce
• Only one previously sequenced genome (JCVI, 2008), 19 contigs
• Isolate CFSAN001992_73:
• Clinical Arizona isolate from produce-related 2012 outbreak
• Complete process from isolate to finished genomic sequence <1 week
• 1 chromosome; 2 plasmids containing never-seen sequence
Collaboration with M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muruvanda, S. Musser (FDA), B. Weimer (UC Davis), Jonas Korlach (PacBio)
Initial Salmonella Genomes - PacBio
Strain                    Sequencing       Genome size     Additional genomic elements
S. Bareilly (SAL2881)     8 SMRT Cells     4,730,611 bp    78,193 bp
S. Heidelberg (318_04)    8 SMRT Cells     4,793,478 bp    117,929 bp; 35,296 bp; 3,969 bp
S. Heidelberg (2069)      8 SMRT Cells     4,783,941 bp    110,345 bp; 37,704 bp
S. Typhimurium (2048)     8 SMRT Cells     4,967,892 bp    142,804 bp; 48,532 bp
S. Javiana (1992_73)      8 SMRT Cells     4,629,444 bp    24,013 bp; 17,094 bp
S. Cubana (2050)          12 SMRT Cells    4,977,480 bp    166,668 bp; 122,863 bp
S. St. Paul SP3           8 SMRT Cells     4,730,130 bp    none
S. St. Paul SP48          8 SMRT Cells     4,940,224 bp    44,606 bp; 40,801 bp
Collaboration with Pacific Biosciences, M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muruvanda, S. Musser (FDA), R. Roberts (NEB), B. Weimer (UC Davis)
Salmonella Strain Epigenomes
complete methylation
partial methylation
Bareilly (SAL2881)
Heidelberg (CFSAN000318_04)
Javiana (CFSAN001992_73)
Typhimurium (CFSAN001921_01)
St Paul (SP3)
5’-GATC-3’/3’-CTAG-5’
5’-CAGAG-3’/3’-GTCTC-5’
5’-ATGCAT-3’/3’-TACGTA-5’
5'-CAGCTG-3'/3'-GTCGAC-5'
5'-GATCAG-3'/3'-CTAGTC-5'
5’-ACCANCC-3’/3’-TGGTNGG-5’
5’-CCGAN5GTC-3’/3’-GGCTN5CAG-5’
5’-GAGN6RTAYG-3’/3’-CTCN6YATRC-5’
5’-GN2TAYN5RTGG-3’/3’-CN2ATRN5YACC-5’
5’-GpsAAC-3’/3’-CTTpsG-5’
Collaboration with M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muruvanda, S. Musser (FDA), B. Weimer (UC Davis), Jonas Korlach (PacBio)
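The motif notation above pairs each recognition sequence with its complementary strand, using IUPAC ambiguity codes and run lengths such as N6. The pairing can be reproduced with a small helper. This is an illustrative sketch: the function names are invented, and the phosphorothioate motif (GpsAAC) is left out because "ps" marks a backbone modification rather than a base.

```python
import re

# IUPAC nucleotide codes and their complements.
COMPLEMENT = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "S": "S", "W": "W", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def complement(motif):
    """Complement each base, keeping run lengths (e.g. N6) in place."""
    return re.sub(r"[A-Z]", lambda m: COMPLEMENT[m.group(0)], motif)


def expand(motif):
    """Expand run lengths, e.g. 'GAGN6RTAYG' -> 'GAGNNNNNNRTAYG'."""
    return re.sub(r"([A-Z])(\d+)", lambda m: m.group(1) * int(m.group(2)), motif)


def is_palindromic(motif):
    """True when the motif equals its own reverse complement."""
    full = expand(motif)
    return complement(full)[::-1] == full


for motif in ["GATC", "CAGAG", "CCGAN5GTC", "GAGN6RTAYG", "GN2TAYN5RTGG"]:
    print(f"5'-{motif}-3' / 3'-{complement(motif)}-5'  palindromic={is_palindromic(motif)}")
```

Palindromic motifs such as GATC read the same on both strands, so one methyltransferase specificity covers both; non-palindromic motifs such as CAGAG are strand-specific.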
SP48 Mobile Element Analysis
• Using PHAST (http://phast.wishartlab.com)
• “Phage_Gifsy_2” is chromosomal (#3) and plasmid encoded (#6)
• Fels and Gifsy - virulence
SP48 Epigenome Determination
• Methyltransferase and Phosphorothioation (PT)-mediating systems specificities:
• 3 active methyltransferases and 1 active phosphorothioation system detected
• PT = unknown function
PT
Vibrio Mobile Element Analysis
• Using PHAST (http://phast.wishartlab.com)
• 2 putative methyltransferases encoded within Phage elements
• Contains incomplete phage elements associated with shiga toxin production (stx)
Listeria monocytogenes
PLoS ONE 8: e67511, June 25 2013
Listeria monocytogenes De Novo Assemblies
Den Bakker et al. (2013):
• 4 kb plasmid library (Sanger)
• 10 kb plasmid library (Sanger)
• 40 kb fosmid library (Sanger)
• 454 library
• Illumina library

Serotype    Strain                   Contigs    Assembly size
1/2a        10403S                   1          2,903,106
1/2a        FSL N3-165               39         2.88 x 10^6
1/2b        FSL N1-017               79         3.14 x 10^6
1/2b        FSL R2-503               55         2.99 x 10^6
1/2c        FSL R2-561               1          2,973,801
3a          Finland1998              1          2,874,431
4b          Aureli1997 (HPB2262)     79         2.99 x 10^6
4c          FSL J2-071               53         2.85 x 10^6

This study* - 10 kb PacBio library

Serotype    Strain    Contigs    Assembly size
1/2a        861       1          2,988,947
1/2a        878       1          2,981,886
1/2a        899       1          2,958,908
1/2a        1846      1          2,947,474
1/2a        2074      1          2,897,140
1/2a        2625      1          2,896,587
1/2a        2626      1          2,907,056
1/2a        2676      1          2,947,293
1/2b        859       2          3,034,043 (chr); 57,557 (plasmid)
1/2b        867       1          2,943,218
1/2b        911       2          3,094,342 (chr); 148,959 (plasmid)
1/2b        2624      1          2,932,495
1/2b        G4599     1          3,011,693
4b          1493      2          2,953,719 (chr); 55,804 (plasmid)
4b          1494      2          2,953,716 (chr); 55,804 (plasmid)
4b          1495      2          2,953,708 (chr); 55,803 (plasmid)
*Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)
Whole-Genome Organization Comparison
Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)
[Figure: whole-genome organization comparison of strains 861, 878, 899, 1846, 2074, 2625, 2626, 2676, 859, 867, 911, 2624, G4599, 1493, 1494, 1495, grouped by serotype (1/2a, 1/2b, 4b)]
Mobile Elements

Serotype    Strain    Intact    Incomplete    Questionable
1/2a        861       2         2             0
1/2a        878       0         1             1
1/2a        899       1         1             1
1/2a        1846      1         1             1
1/2a        2074      0         0             1
1/2a        2625      1         1             0
1/2a        2626      1         1             0
1/2a        2676      2         1             0
1/2b        859       1         3             1
1/2b        867       1         1             0
1/2b        911       2         3             1
1/2b        2624      0         1             0
1/2b        G4599     1         1             1
4b          1493      0         1             1
4b          1494      0         1             1
4b          1495      0         1             1

[Figure labels: L2625, L2676]
Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)
http://phast.wishartlab.com/
Strains by serotype: 1/2a (861, 878, 899, 1846, 2074, 2625, 2626, 2676); 1/2b (859, 867, 911, 2624, G4599); 4b (1493, 1494, 1495)

Methyltransferase specificity            Modified base    Reported values (strain columns not recoverable)
5'-GATC-3'/3'-CTAG-5'                    m6A              99.5; 98.9
5'-GATC-3'/3'-CTAG-5'                    X                8.4; 12.2; 15.8
5'-GACN5GGT-3'/3'-CTGN5CCA-5'            m6A              98.9; 98.9
5'-GAN6TGCG-3'/3'-CTN6ACGC-5'            m6A              99.7; 99.6; 100; 99.8; 99.8; 99.9
5'-TACBN6GTNG-3'/3'-ATGVN6CANC-5'        m6A              99.7; 99.8
5'-TAGRAG-3'/3'-ATCYTC-5'                m6A              99.3
5'-GTATCC-3'/3'-CATAGG-5'                m6A              99.6; 99.7; 99.1; 98.8; 99.2; 98.0
L. monocytogenes Epigenomes
same epidemic clone (ECVII)
Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)
Strains by serotype: 1/2a (861, 878, 899, 1846, 2074, 2625, 2626, 2676); 1/2b (859, 867, 911, 2624, G4599); 4b (1493, 1494, 1495)

Methyltransferase specificity            Modified base
5'-GATC-3'/3'-CTAG-5'                    m6A
5'-GATC-3'/3'-CTAG-5'                    X
5'-GACN5GGT-3'/3'-CTGN5CCA-5'            m6A
5'-GAN6TGCG-3'/3'-CTN6ACGC-5'            m6A
5'-TACN7GTNG-3'/3'-ATGN7CANC-5'          m6A
5'-TAGRAG-3'/3'-ATCYTC-5'                m6A
5'-GTATCC-3'/3'-CATAGG-5'                m6A
L. monocytogenes Epigenomes
5’-GAxTC-3’ 5’-Gm6ATC-3’
Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)
100K Bioproject
http://www.ncbi.nlm.nih.gov/bioproject/186441
Project status
• Year 1:
• Focus on the top 50 Salmonella outbreak serotypes
• Banked ~3500 isolates
• Developing worldwide partnerships
• Automate sequence library construction
• Sequence 1500 isolates
• NCBI 100K Bioproject
• Year 2-5:
• Bank additional isolates
• Automated, routine library construction
• Sequence ~25K genomes/year
• Finish 1000 genomes to a single closed genome
• Generate epigenomic data
• Define high resolution map assemblies for small set
• Define need for additional bioinformatics
Initial public release of draft genomes: May 2013
Second public release of 20 closed genomes: July 2013
Third public release of 1500 genomes: Fall 2013
Reduce time to result
Traditional - Enrichment
Timeline: T0, T24, T36, T48, T60, T72+
Collect sample
Ship to lab; Pre-enrich
Log sample; Prep; Enrich
Plate; Selective enrich; Presumptive
Examine plate; Confirm; Bank
Examine plate; Confirm; Bank
ID; Characterize; Genomics
Next generation – Culture independent
Timeline: T0, T0.5, T1, T2, T3, T4+
Collect sample
Capture & concentrate
Presumptive; Relative amt.
Directed plate; Directed enrich; DNA prep
Multiplex PCR; qPCR; Sequencing prep
Confirm PCR; Sequence; Bank DNA
ID; Characterize; In/out of event
Molecular salmonella testing in 2 hours
Lyse cells
Add primers
PCR (45 mins)
• Colony
• Enrichment broth
• Lab medium
• Capture/concentration
Action based on:
1. Salmonella detection
2. Serotype determination
Report Salmonella, serogroup & serotype
Nano electrophoresis (30 mins)
Molecular detection validation
• Validation Approach
• ~1750 isolates tested
• Designed to detect ~30 most common serotypes
• Multiple matrices
• Verification & Validation
• Validated by 3 independent labs
• 100% accurate in Salmonella identification with 6 independent blinded panels
• 98% accurate for serotype determination
Workflow innovation
NGS of entire genome
Library construction
NGS sequencing
Bioinformatics & statistics
Total time: 1.5 hours; 6 hours; 48 hours; ~24-130 hours*
Enrichment & colony isolation
Result
Nano Electrophoresis
Colony multiplex PCR
Broth multiplex PCR
Culture independent capture/concentration
Presumptive ID with solid phase ELISA
Nano Electrophoresis
Critical Needs
• Robust biomarkers
• Fast, actionable answers
• Novel sequence analysis strategies
Acknowledgements
Weimer Lab
• Dr. Yi Xie
• Dr. Richard Jeannotte
• Dr. Holly Ganz
• Dr. Marie Forquin
• Dr. Prerak Desai
• Dr. Jigna Shah
• Ms. Nugget Dao
• Ms. Mai Lee Yang
• Ms. Kao Thao
• Ms. Winnie Ng
• Ms. Carol Huang
Thanks to the sponsors:
• FDA/CFSAN
• USDA
• DARPA
• US Air Force
• Agilent Technologies
• CA Dairy industry
• Pacific Biosciences
• Mars, Inc.
• UCD Wildlife Health Center
cBio
• Dr. Kumar Hari
• Dr. Ravi Jane
UCD/CAHFS/SVM
• Dr. Kris Clothier
• Dr. Barb Byrne
• Dr. Woutrina Miller
• Dr. Linda Harris
• Dr. Maria Marco
Agilent Technologies
• Dr. Rudi Grimm
• Dr. Lenore Kelly
• Dr. Steffan Müeller
• Dr. Steve Royce
• Dr. Paul Zavitsanos
PacBio
• Jonas Korlach
• Luke Hickey
UC Santa Barbara
• Dr. Mike Mahan
OpGen
• Dr. Erin Newburn
CFSAN
• Dr. Marc Allard
• Dr. Eric Brown
Thank You…
Bart Weimer Professor, UC Davis
Director, BGI@UC Davis
Director, 100K Genome Project
Director, Integration Core, NIH-West Coast Metabolomics Center
530.754.0109
Questions?