Genomics in food security:
100K Pathogen genome
Project
Bart Weimer, Ph.D. Professor
UC Davis - School of Veterinary Medicine
Director BGI@UCDavis
The Agricultural System
Biosecurity & Safety
Agriculture
Environment
Health
Changing world &
food safety challenges
• World population predicted to reach 9.2 billion by 2050
• Increased urbanization
• Developing countries increase ~1.5 to 2.0
• ~25 mega-cities around the world
• Increased density
• Increased distance from food supply
• Predicted increase in world-wide food-related disease outbreaks
• Intensive agriculture needed to feed the world
Industrialized = Aging
Developing = Baby boom
[Figure: Population (billions), 1990 to 2050 (modified from Z_punkt)]
Food Safety & Quality
• Example outbreaks associated with food:
• 2013 – Pet food, turtles, hedgehogs, tea, chicken, beef, pork, & Salmonella
• 2012 – Pet food, fruit, nuts, peanut butter, hamburger, tuna, chicken & Salmonella
• 2011 – Veggies and Re-emergence of E. coli O104 (genomics required to solve)
• 2010 – Eggs and S. Enteritidis
• 2008 – Peanut butter and S. Typhimurium
• 2006 – Tomatoes, peppers and S. Typhimurium
• Estimated economic impact in the EU >€5 billion from Campylobacter & Salmonella (EFSA, 2012)
Food Safety News Jan 2012
Total FBI 2011
Norovirus: 61%
Salmonella (non-typhoid): 12%
Clostridium perfringens: 11%
Campylobacter spp.: 9%
Staph aureus: 3%
Toxoplasma gondii: 1%
E. coli O157:H7: 2%
Listeria monocytogenes: 1%

FBI deaths 2011
Norovirus: 12%
Salmonella (non-typhoid): 30%
Clostridium perfringens: 0%
Campylobacter spp.: 7%
Staphylococcus aureus: 0%
Toxoplasma gondii: 26%
E. coli O157:H7: 4%
Listeria monocytogenes: 21%

http://www.cdc.gov/foodborneburden/PDFs/FACTSHEET_A_FINDINGS_updated4-13.pdf
Foodborne Pathogens
• Salmonella particularly devastating:
• High serotype diversity
• High mobile element diversity
• Frequent horizontal gene transfer
• Emerging stable hypervirulence
• Heithoff et al. PLoS Pathog 8(4):e1002647
• Large genomic diversity within serotype
• E. coli O104 example of NGS & solutions
Salmonella Diversity
Salmonella species                   Number of serovars
S. enterica                          2,557
  S. enterica subsp. enterica        1,531
  S. enterica subsp. salamae           505
  S. enterica subsp. arizonae           99
  S. enterica subsp. diarizonae        336
  S. enterica subsp. houtenae           73
  S. enterica subsp. indica             13
S. bongori                              22
Total (genus Salmonella)             2,579
1,630 serotypes important in food animals
• Approximately 50-60 serotypes are most common to cause FBI in humans
• New foods becoming associated with new serotypes
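The serovar counts listed above can be checked with a few lines of arithmetic. This is only a sketch that re-adds the slide's own numbers; the variable names are illustrative and not part of the original talk.

```python
# Serovar counts per S. enterica subspecies, as listed on the slide.
enterica_subspecies = {
    "enterica": 1531,
    "salamae": 505,
    "arizonae": 99,
    "diarizonae": 336,
    "houtenae": 73,
    "indica": 13,
}
bongori = 22  # S. bongori serovars, as listed on the slide

# The subspecies counts should add up to the S. enterica row,
# and adding S. bongori should give the genus total.
enterica_total = sum(enterica_subspecies.values())
genus_total = enterica_total + bongori

print(f"S. enterica: {enterica_total:,}")
print(f"Genus Salmonella: {genus_total:,}")
```

The two printed totals match the "S. enterica" and "Total (genus Salmonella)" rows.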
Dynamic Microbial
Communities & Disease
FoodNet
Javiana
Newport
Typhimurium
Heidelberg
Enteritidis
Montevideo
Relative rate (log change)
As one serotype declines others increase
Salmonella
Phylogenomics
16s Alignment
SNPs in 16s rDNA
Whole Genome Alignment
Entire genome reflects all similarities
Active Outbreaks &
Complete Genomes
• Salmonella enterica subsp. enterica serovar Javiana
• Common in fresh-cut produce
• Only one previously sequenced genome (JCVI, 2008), 19 contigs
• Isolate CFSAN001992_73:
• Clinical Arizona isolate from produce-related 2012 outbreak
• Complete process from isolate to finished genomic sequence <1 week
• 1 chromosome; 2 plasmids containing never-seen sequence:
Collaboration with M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muravanda,
S. Musser (FDA), B. Weimer (UC Davis), Jonas Korlach (PacBio)
Sequencing
discovers new genes
Pan-genome increases with each
isolate sequenced
Variable Salmonella genome
Micro. Ecol. 2011
New gene families
Core gene families
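The idea that the pan-genome grows with each isolate while the core shrinks or stays flat can be illustrated with plain set operations. This is a minimal sketch under stated assumptions: each genome is reduced to a set of gene-family labels, and the three toy isolates and their gene contents are invented for illustration, not taken from the project data.

```python
def pan_and_core(genomes):
    """Return (pan_sizes, core_sizes) after each genome is added in order.

    pan  = union of all gene families seen so far
    core = intersection of gene families shared by every genome so far
    """
    pan, core = set(), None
    pan_sizes, core_sizes = [], []
    for genes in genomes:
        pan |= genes
        core = set(genes) if core is None else core & genes
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


# Hypothetical gene-family content for three isolates (illustrative only).
isolates = [
    {"gyrB", "rpoB", "invA", "fliC"},
    {"gyrB", "rpoB", "invA", "sopE"},
    {"gyrB", "rpoB", "fliC", "tetA", "blaTEM"},
]

pan_sizes, core_sizes = pan_and_core(isolates)
print("pan-genome size per isolate added: ", pan_sizes)
print("core-genome size per isolate added:", core_sizes)
```

In this toy input the pan-genome grows from 4 to 7 families while the core drops from 4 to 2, which is the same qualitative pattern the slide describes.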
• We know <1% of the Earth’s microbiome
• Horizontal gene transfer is wide-spread and frequent
• High-quality, finished genomes are the starting point for:
• Functional genomic studies
• Comparative genomics
• Forensics
• Metagenomics
The Value of
Microbial Genomes
Chain et al. (2009) Science 326: 236-237 Fraser et al. (2002) J Bacteriology 184: 6403-6405
Pathogen Evolution
• Vibrio evolution rapid
• Example for all enteric bacteria
• Also shown with environmental organisms
• Enterobacteria genome evolution
• HGT more common than appreciated
• Genome rearrangements influenced by biogeography & other bacterial community members
• Evidence for local pressure to induce population genome evolution
• Biogeography differences
• Likely to find footprints of geographical origin
• Requires large number of genomes to estimate
• Creates chimeric genomes
• Stress
• Induces SNPs
• Induces new virulence and drug resistance
• Mutations in DNA repair genes lead to SNPs
• Recombination events
• SNPs
• Large segments
• HGT
Shapiro et al., ‘12 Science; Denef & Banfield, ’12 Science
New Detection Paradigm
Specific gene
(PCR) Genome
(NGS, multiplex)
Microbial Communities
& Health
Community structure
Host association
Growth & metabolism
Who’s there?
What are they
doing?
Host changes Microbe changes
Kawamoto et al., ‘12 Science
Olszak et al., ‘12 Science
Desai & Weimer, ‘09
Bacteria Testing in
Food
Collect Sample
(25 or 325 g)
Detection
method
Pre-enrich
sample in 4 L
of broth
Protein
Immune
inhibitors
Fat
Non-
culturable
bacteria
Fermentation
product
inhibitors
Project aims to
combine 2 of 3 steps
Faster response: < 30 h
More informative results
Detection Genus, species
qPCR, etc. 8-30 h
Traditional: 3-5 days
Serotyping Serogroup, Serovar
Serology: 4-5 days
Strain Typing Genovar, Genome
PFGE, MLST, genotype
5-10 days
Aim: reduce time to result with a low-cost, sensitive, accurate method
Too little genomic information = difficult molecular methods
Sensitivity Specificity
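The time ranges quoted above can be added up to compare the traditional workflow with the < 30 h target. This sketch assumes the three traditional steps run one after another, which the slide implies but does not state; the dictionary keys are illustrative labels.

```python
# (min_days, max_days) for each traditional step, as quoted on the slide.
# Assumption: the steps are performed sequentially.
steps = {
    "detection (traditional culture)": (3, 5),
    "serotyping (serology)": (4, 5),
    "strain typing (PFGE, MLST, genotype)": (5, 10),
}

low = sum(lo for lo, _ in steps.values())
high = sum(hi for _, hi in steps.values())

# Project target from the slide: faster response, under 30 hours.
target_days = 30 / 24

print(f"sequential traditional workflow: {low}-{high} days")
print(f"project target: < {target_days:.2f} days")
```

Under that assumption the traditional path takes 12 to 20 days from sample to strain type, against a target of about 1.25 days.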
Surveillance
Outbreak response
Outbreak investigation
Detection timing
Reduce time to result
Traditional - Enrichment
Collect
sample T0
T24 T36 T48 T60 T72+
Ship to lab
Pre-enrich
Log sample
Prep
Enrich
Plate
Selective enrich
Presumptive
Examine plate
Confirm
Bank
Examine plate
Confirm
Bank
ID
Characterize
Genomics
Next generation –
Culture independent
Collect
sample T0
T0.5 T1 T2 T3 T4+
Capture &
concentrate
Presumptive
Relative amt.
Directed plate
Directed enrich
DNA prep
Multiplex PCR
qPCR
Sequencing prep
Confirm PCR
Sequence
Bank DNA
ID
Characterize
In/out of event
NGS Costs are Falling
(DeWitt et al., 2011)
Genomics & Rapid
detection
• German outbreak
• Chimeric genome – deleted genes = False negative
• Eae deletion – main PCR marker missing = False negative
• FSMA
• International reach with new regulations
• Open to investigate alternate detection methods - CID
• International trade now firmly in the FDA plan
• Public health effort
• Using whole genome sequencing (WGS) to investigate outbreaks
• Installing Illumina sequencers in field offices
• More robust biomarkers are needed for routine testing
2013 Poultry Sci 92:562–572
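The false-negative problem noted for the German outbreak can be sketched as a comparison between a single-marker call and a multi-marker panel. This is a toy illustration, not a validated assay: the isolate's gene content, the panel composition, and the function names are assumptions chosen only to show why a deleted marker defeats a single-gene test.

```python
def single_marker_call(genes_present, marker="eae"):
    """Positive only if the one target gene is present."""
    return marker in genes_present


def panel_call(genes_present, panel=("eae", "stx2", "aggR"), min_hits=1):
    """Positive if at least `min_hits` panel markers are present.

    Returns (call, list of markers detected).
    """
    hits = [m for m in panel if m in genes_present]
    return len(hits) >= min_hits, hits


# Hypothetical chimeric isolate: the eae marker is deleted,
# but other virulence-associated genes remain.
chimeric_isolate = {"stx2", "aggR"}

print("single-marker (eae):", single_marker_call(chimeric_isolate))
print("marker panel:       ", panel_call(chimeric_isolate))
```

The single-marker test reports a negative for this isolate, while the panel still flags it, which is the motivation for more robust, genome-informed biomarkers.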
100K Pathogen
genome project
Increase food safety using microbe systems biology
Discover the genetic constituents that are robust to be predictive biomarkers for specific traits
Rapid ID and tracking
Understand evolution to build more robust detection systems
New isolate emergence and persistence
Integration into current practices http://100kgenome.vetmed.ucdavis.edu
2012 HHSInnovate Secretary’s Choice Awardee
integration of
100K Project
• Produce a database of phylogenomic diversity of important FBI
• Industry representative genomes important
• Background organisms
M. Allard
100K Consortium
• Founding Members, Executive committee
• Agilent Technologies
• UC Davis (Weimer lab)
• FDA
• Additional Steering Committee members
• NIH (NCBI)
• CDC
• USDA
• Mars, Inc.
• Pacific Biosciences
• Steering committee provides guidance for overall project direction and goals
• Affiliate Members
• UC Davis Food Science; Veterinary Diagnostic Lab
• Salisbury University (US)
• DoD - Walter Reed Hospital
• Mass General - Harvard hospital system
• RIVM (Netherlands)
• DTU (Denmark)
• MEFOSA (Lebanon)
• Sydney Technical University (Australia)
• Rajiv Gandhi Biotechnology Institute (India)
• Institute of Environmental Science & Research (NZ)
• Oak Ridge National Laboratory (ORNL)
• Additional negotiations in process with groups from Asia, Africa, Europe
• Corporate Affiliates
• Pacific Biosciences
• cBio
• OpGen
• Kapa Biosystems
• BGI@UCDavis
ADDITIONAL
PARTICIPANTS WELCOME
Process & outcome
Release
genomes to
public
Common
traits
Diagnostic
features Ecology
features
Genome
evolution
Infection
features
Traceability
features
Sequence genome
Collect
metadata
Analyze
genome for
actionable
features
Validate &
verify
information
Validate &
verify
sequence
Merge
genome
sequence &
metadata
Collect isolate
Submission Logistics
• Affiliates
• Many options
• In-kind
• Funding
• Isolates
• Analysis
• Data hosting
• Various levels of commitment available
• Groups providing funding & linking sequencing to important isolates
• Isolate submission
• Isolate agreement
• MTA
• Timing and specific isolates
• Submit isolates & metadata
• Authentication
• Bank isolates for DNA isolation & library construction
• Sequence –
• BGI@UCDavis
• Return data to submitter
• 12 months for review
• Deposit in NCBI for public access
• Data return & analysis
• Publication
Organisms of Interest
• Salmonella
• Listeria
• Campylobacter
• Vibrio
• E. coli
• Shigella
• Yersinia
• Clostridium
• Enterococcus
• Cronobacter
• Norovirus
• Hepatitis A
• Enteroviruses
• Short-read sequencing followed by long-read technologies for a subset of isolates to complete genomes
• Optical mapping will be used for a selected set to ensure genome quality
• Long read technology will be used to close 1,000 genomes
• Capture genomic diversity to represent pan-genome for the most important organisms
• World-wide representation
Initial focus
Isolate bank
Bacillus,3
Brenneria,1
Campylobacter,100
Citrobacter,1
Cronobacter,3
Erwinia,1
Escherichia,203
Enterobacter,4
Klebsiella,2
Listeria,89
Moraxella,1
Proteus,1
Rummeliibacillus,1
Salmonella,1350
Shigella,4
Streptococcus,1
Vibrio,149
Authenticated & banked isolates (~3,500 isolates)
Genus, isolates, share
Bacillus, 8, 0%
Brenneria, 1, 0%
Campylobacter, 110, 5%
Carnobacterium, 1, 0%
Citrobacter, 1, 0%
Cronobacter, 3, 0%
Erwinia, 2, 0%
Escherichia, 203, 9%
Enterococcus, 210, 9%
Enterobacter, 4, 0%
Exiguobacterium, 1, 0%
Klebsiella, 2, 0%
Lactococcus, 1, 0%
Listeria, 89, 4%
Leuconostoc, 3, 0%
Moraxella, 1, 0%
Proteus, 1, 0%
Rummeliibacillus, 1, 0%
Salmonella, 1342, 59%
Shigella, 4, 0%
Staphylococcus, 1, 0%
Streptococcus, 2, 0%
Vibrio, 287, 13%
Weissella, 1, 0%
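The percentage labels in the genus breakdown above can be recomputed from the listed counts. The sketch below uses only the counts shown in that chart and rounds to whole percents; the variable names are illustrative.

```python
# Isolate counts per genus, as listed in the chart above.
counts = {
    "Bacillus": 8, "Brenneria": 1, "Campylobacter": 110, "Carnobacterium": 1,
    "Citrobacter": 1, "Cronobacter": 3, "Erwinia": 2, "Escherichia": 203,
    "Enterococcus": 210, "Enterobacter": 4, "Exiguobacterium": 1,
    "Klebsiella": 2, "Lactococcus": 1, "Listeria": 89, "Leuconostoc": 3,
    "Moraxella": 1, "Proteus": 1, "Rummeliibacillus": 1, "Salmonella": 1342,
    "Shigella": 4, "Staphylococcus": 1, "Streptococcus": 2, "Vibrio": 287,
    "Weissella": 1,
}

total = sum(counts.values())
# Whole-percent share of each genus among the listed isolates.
shares = {genus: round(100 * n / total) for genus, n in counts.items()}

print(f"listed isolates: {total}")
for genus, pct in sorted(shares.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{genus}: {counts[genus]} ({pct}%)")
```

The recomputed shares for the largest groups (Salmonella, Vibrio, Enterococcus, Escherichia, Campylobacter) agree with the chart labels.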
Pending authentication & banking (~15,000 isolates)
Genus, isolates, share
Bacillus, 8, 0%
Brenneria, 1, 0%
Campylobacter, 3295, 55%
Carnobacterium, 1, 0%
Citrobacter, 6, 0%
Cronobacter, 3, 0%
Erwinia, 2, 0%
Escherichia, 203, 3%
Enterococcus, 210, 3%
Enterobacter, 5, 0%
Exiguobacterium, 1, 0%
Klebsiella, 4, 0%
Lactococcus, 1, 0%
Listeria, 281, 5%
Leuconostoc, 3, 0%
Moraxella, 1, 0%
Proteus, 2, 0%
Rummeliibacillus, 1, 0%
Salmonella, 1674, 28%
Shigella, 4, 0%
Staphylococcus, 1, 0%
Streptococcus, 2, 0%
Vibrio, 305, 5%
Weissella, 1, 0%
Isolates by region
Number of isolates by region
North America: 74%
South America: 1%
Europe: 7%
Asia: 4%
Australia: 0%
Africa: 3%
Middle East: 1%
Unknown: 10%
100K Sequencing
Process
Bank culture
Isolate DNA
Library automation
Make library
Short read technologies
Long read technologies
Optical mapping
Sequence library Genome DB
Project progress
• Year 1:
• Focus on the top 50 Salmonella outbreak serotypes
• Banked ~3500 isolates
• Developing world-wide partnerships
• Automation of sequence library construction
• Sequence 1800 isolates
• Year 2-5
• Bank additional isolates
• Automated, routine library construction
• Sequence ~25K genomes/year
• Finish 1000 genomes to a single closed genome
• Generate epigenomic data
• Define high resolution map assemblies for small set
• Define need for additional bioinformatics
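The sequencing plan above implies a simple throughput budget. This sketch multiplies out the slide's own figures, treating "Year 2-5" as four years at the stated approximate rate; it is a back-of-the-envelope check, not a project schedule.

```python
year1_genomes = 1_800          # "Sequence 1800 isolates" in year 1
genomes_per_year = 25_000      # "~25K genomes/year" in years 2-5
later_years = 4                # years 2, 3, 4 and 5

later_total = genomes_per_year * later_years
project_total = year1_genomes + later_total

# Share of genomes planned to be finished as single closed genomes,
# relative to the 100K project goal.
closed_fraction = 1_000 / 100_000

print(f"years 2-5: {later_total:,} genomes")
print(f"five-year total: {project_total:,} genomes")
print(f"closed genomes: {closed_fraction:.0%} of the 100K goal")
```

At the stated rate, years 2 to 5 alone reach 100,000 genomes, with 1% of the goal finished to a closed genome.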
100K project Web site
http://100kgenome.vetmed.ucdavis.edu
Public health
Outcomes
• Federal agencies embracing NGS for outbreak investigation, tracebacks, and monitoring
• PFGE vs NGS
• Implementation is not a simple path
• Pan-genome value
• Discover new sets of genes that are >50% of the genome that we ignore today
• New robust testing methods that allow routine in-plant testing
• Human and animal public health
• 100K database enables a new era of diagnostics tools
• Definition of virulence, antibiotic resistance, source, insights for mitigation, and a window into emerging strain differences as sentinels
• Host adaptation, zoonotic movement, supply chain changes
Innovation in
Methods
• Culture independent methods (CIM) to capture & concentrate
• Enrich
• Detect
• ID
• Coupling genomics & biomarkers with existing methods and CIM
• Increase speed
• Increase information diversity
• Serotypes
• Pathogens
Microbial
Isolation & detection
• Classical approach – grow what we can…
• BAM/ISO & growth
• Enrichment and ELISA
• Limited by those we know how to grow
• We know how to grow ~1% of bacteria
• Non-culturable bacteria common
• Viruses often don’t grow in vitro – limits detection approaches
• Strain diversity and continued outbreaks creating demand for new, next generation methods
• Rapid methods – looking for bacterial signatures…
• Finding new organisms without growth
• Enables customized approaches for screening
• Molecular methods
• Provides bacterial community structure information
• Individual genome sequencing
• Provides bacterial metabolism capability
• Can be linked to food characteristics
Examples of
molecular tools
• Data analysis post sequencing
• Comparison for SNPs
• Gene content = annotation
• Content comparison = forensics
• Gene
• Protein
• COGs/GO use in statistical enrichment
• Sea mammal outbreak (SNP)
• Food outbreak (SNP)
• Going beyond SNPs
• Biomarker hunting
• Genomics – limited based on sequences available
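The "comparison for SNPs" step can be sketched as a pairwise count of differing positions between aligned sequences. This is a minimal illustration that assumes the sequences are already aligned to equal length; the isolate names and sequences are invented, and real pipelines work on whole-genome alignments or variant calls rather than ten-base strings.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two aligned sequences differ.

    Positions with gaps or ambiguous bases (anything outside ACGT)
    are ignored rather than counted as SNPs.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


# Hypothetical aligned fragments from three isolates (illustrative only).
aligned = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACCTACGAAT",
}

distances = {
    (p, q): snp_distance(aligned[p], aligned[q])
    for p, q in combinations(aligned, 2)
}
for pair, d in distances.items():
    print(pair, d)
```

Isolates separated by fewer SNPs are candidates for belonging to the same event, which is how the SNP comparison feeds the "in/out of event" decision in an outbreak investigation.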
Rapid Detection
Technologies to enhance food safety & security
Eliminate enrichment
Fast
Sensitive
Reliable
Food, water & environment
E. coli (B. Findley)
Existing Limitations
• Molecular assays are limited by too few genomes to develop robust biomarker genes quickly
• Genome evolution more complex than previously appreciated
• Lack of information to create robust assays for improved detection using PCR
• Genome sequencing will enable technological advances and increase reliability of PCR assays
Rapid detection
Capture
with fluidized bed
(15 mL to 40 L)
Presumptive ID (ELISA, PCR)
0 min
30–35 min
5 min
Detailed DNA/RNA analysis
Confirm ID, molecular serotype,
sequence, toxin production,
community analysis
40 min
120-135 min
& beyond…
microbe
bead
Blake & Weimer, ‘97 AEM
Weimer et al., ’00 JBBM
Weimer et al., ’01 AEM
Walsh et al., ’01 JBBM
Desai et al., ‘08 AEM
Maga et al., ‘12 AEM
Detection in Food &
Enrichment Broth
Organism
ImmunoFlow Commercial
Immunoppt. Food
E. coli
O157
Apple juice 0 0
Hamburger 80 40 30
Beer 0 0
Sprouts 100 100
Halibut 40 30 40
Chicken 60 0 20
In Food Enrich Broth
LM
SE
% positive
GlycoBind Detection
• Replaces Ab–based tests
• Capture ligands are cell
receptors
• Broad organism capture
surface
• Used in static & flow
setting
• Detection with RT-PCR
• WGS has arrived...
Myxobolus cerebralis
Food Type      Aerobic Plate Count before        Lower Detection Limit
               E. coli O157:H7 spike (cfu/gm)    (cfu/gm)
Water          0                                 4
Apple juice    8                                 4
Spinach        577                               4
Hamburger      24856                             400
Salami         9350000                           40000
Milk           0                                 40000
No enrichment
3 hour total time
4 cell sensitivity
<5% variation
Desai et al., 2008.
AEM 74:2254-2258
Molecular Salmonella
testing in 2 hours
Report
Salmonella,
serogroup &
serotype
Nano
electrophoresis
(30 mins)
Lyse cells Add template PCR
(45 mins)
• Colony
• Enrichment broth
• Lab medium
• Bead capture
Action based on
1. Salmonella detection
2. serotype determination
Molecular
detection validation
• Validation Approach
• ~1750 isolates tested
• Designed to detect ~40 most common serotypes
• Multiple matrices done
• Verification & Validation
• Used by 3 independent labs
• 100% accurate in 6 independent blinded panels for Salmonella identification
• 98% accurate in correct serotype determination
Complex implementation
• Total DNA/RNA extraction from food
• Possible and being done for PCR and qRT-PCR
• Metagenomics – HARD & UNCLEAR reliability
• Rapid detection strategies
• Genomics and systems biology
• Reduction in time to result
• Robust & accurate genomic tests
• Requires access to genomes/genotypes
• The longest step is enrichment
• Eliminate pre-enrichment step for direct detection?
• PulseNet & genomics
• NGS is fast and direct
• Data rich
• Work flow remains unclear for best implementation
Workflow innovation
NGS of
entire genome
Library
construction
NGS
Sequencing
Bioinformatics &
statistics
1.5 hours Total time 6 hours 48 hours
~24-130 hours*
Enrichment
& colony isolation
Result
Nano Electrophoresis
Colony
multiplex
PCR
Broth
multiplex
PCR
Culture independent
capture/concentration
Presumptive ID
with
Solid phase ELISA
Nano
Electrophoresis
Critical Needs
• Robust biomarkers
• Fast, actionable answers
• Novel sequencing
strategies
SE SNP Analysis
Multiple SE snp-types
‘93 sea lion & otter
in 3 tissues
‘93, ’02, ‘08, ’11,
rodent, equine, sea
mammal
‘98, ‘05, ’08, sea
mammal
Yearly SNP evolution
elephant seal, lung
sea lion, liver
elephant seal, kidney
equine feces
‘08, sea lion, liver
’98 otter, abdomen
’05 sea lion, uterus
1993
‘02 rodent feces ’11 elephant seal urine ’11 elephant seal brain
Open questions
• Can individual food components modify the microbiome to enrich or exclude zoonotic pathogens?
• Genomic variation & outbreaks
• Host adaptation
• Transmission
• Virulence
• Antibiotic resistance
• Robust biomarkers & detection reliability
• Serotype and genotype
• Detection methods
• BAM –
• Sample prep –
• capture concentration coupled to trusted methods (ELISA)
• Molecular assays – PCR, robust biomarkers, mass spec
• Genomics – sequencing, metagenomics, directed sequencing
• Routine use vs outbreak investigation vs traceback
Acknowledgements
Weimer Lab
• Dr. Yi Xie
• Dr. Richard Jeannotte
• Dr. Holly Ganz
• Dr. Marie Forquin
• Dr. Prerak Desai
• Dr. Jigna Shah
• Ms. Nugget Dao
• Ms. Mai Lee Yang
• Ms. Kao Thao
• Ms. Winnie Ng
Thanks to the sponsors:
FDA
USDA
DARPA
US Air Force
Agilent Technologies
CA Dairy industry
Pacific BioSciences
Mars, Inc.
UCD Wildlife Health Center
cBio
• Dr. Kumar Hari
• Dr. Ravi Jane
UCD/CAHFS/SVM
• Dr. Kris Clothier
• Dr. Barb Byrne
• Dr. Woutrina Miller
• Dr. Linda Harris
• Dr. Maria Marco
Agilent Technologies
• Dr. Rudi Grimm
• Dr. Lenore Kelly
• Dr. Steffan Müeller
• Dr. Steve Royce
• Dr. Paul Zavitsanos
PacBio
• Jonas Korlach
• Luke Hickey
Thank You…
Bart Weimer Professor, UC Davis
Director, BGI@UC Davis
Director, 100K Genome Project
Director, Integration Core, NIH-West Coast Metabolomics Center
530.754.0109
Questions?