Genomics in food security: 100K Pathogen genome Project Bart Weimer, Ph.D. Professor UC Davis - School of Veterinary Medicine Director BGI@UCDavis


Changing world & food safety challenges

• World population predicted to reach 9.2 billion by 2050

• ~25 megacities around the world
• Increased density
• Increased distance from food supply
• Predicted increase in worldwide food-related disease outbreaks

Industrialized / Developing (modified from Z_punkt)

• Example food-associated outbreaks:
• 2013 – Pet food, turtles, hedgehogs, tea, fruit, chicken, beef, pork & Salmonella

• 2012 – Pet food, fruit, nuts, peanut butter, hamburger, tuna, chicken & Salmonella

• 2011 – Veggies and re-emergence of E. coli O104 (genomics resolved)

• 2010 – Eggs and S. Enteritidis
• 2008 – Peanut butter and S. Typhimurium
• 2006 – Tomatoes, peppers and S. Typhimurium

Foodborne Pathogens

• Salmonella particularly persistent:

• High serotype diversity

• High mobile element diversity

• Frequent horizontal gene transfer

• Emerging stable hypervirulence (Heithoff et al., PLoS Pathog 2012, 8(4):e1002647)

• Large genomic diversity within serotype

• E. coli O104 example of NGS & solutions

Total FBI 2011
Norovirus 61%
Salmonella (non-typhoidal) 12%
Clostridium perfringens 11%
Campylobacter spp. 9%
Staph aureus 3%
E. coli O157:H7 2%
Toxoplasma gondii 1%
Listeria monocytogenes 1%
http://www.cdc.gov/foodborneburden/PDFs/FACTSHEET_A_FINDINGS_updated4-13.pdf

FBI deaths 2011
Salmonella (non-typhoidal) 30%
Toxoplasma gondii 26%
Listeria monocytogenes 21%
Norovirus 12%
Campylobacter spp. 7%
E. coli O157:H7 4%
http://www.cdc.gov/foodborneburden/PDFs/FACTSHEET_A_FINDINGS_updated4-13.pdf
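The two charts are easier to compare side by side. The sketch below, using only the percentages shown on the slides, divides each pathogen's share of deaths by its share of illnesses. This ratio is a rough, illustrative indicator and is not a statistic reported by the CDC; pathogens that appear in only one chart are left out.

```python
# Shares (percent) copied from the two charts; only pathogens present in both.
illness_share = {
    "Norovirus": 61,
    "Salmonella (non-typhoidal)": 12,
    "Campylobacter spp.": 9,
    "E. coli O157:H7": 2,
    "Toxoplasma gondii": 1,
    "Listeria monocytogenes": 1,
}
death_share = {
    "Norovirus": 12,
    "Salmonella (non-typhoidal)": 30,
    "Campylobacter spp.": 7,
    "E. coli O157:H7": 4,
    "Toxoplasma gondii": 26,
    "Listeria monocytogenes": 21,
}

# Ratio > 1 means the pathogen is over-represented among deaths
# relative to its share of illnesses.
ratio = {name: death_share[name] / illness_share[name] for name in death_share}
ranked = sorted(ratio, key=ratio.get, reverse=True)

for name in ranked:
    print(f"{name}: {ratio[name]:.2f}")
```

On these figures, norovirus dominates illnesses while Salmonella, Toxoplasma and Listeria dominate deaths.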

Non-typhoidal Salmonella

Moran et al., 2011. Gut 60:1412-1425

EPITHELIAL CELLS

bacterial adhesion and exposes other sugar residues to hydrolysis enables association with the cell membrane for invasion

Egberts et al., 1984. Veterinary Quarterly 6:4:186-199.

Salmonella Diversity

Salmonella species Number of serovars

S. enterica 2,557

S. enterica subsp. enterica 1,531

S. enterica subsp. salamae 505

S. enterica subsp. arizonae 99

S. enterica subsp. diarizonae 336

S. enterica subsp. houtenae 73

S. enterica subsp. indica 13

S. bongori 22

Total (genus Salmonella) 2,579

1,630 serotypes important in food animals

• Approximately 50-60 serotypes most commonly cause FBI in humans

• New foods becoming associated with new serotypes
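The serovar counts in the table are internally consistent, which a few lines of Python can confirm. This is only an arithmetic check of the numbers on the slide; the variable names are illustrative.

```python
# Serovar counts per S. enterica subspecies, as listed on the slide.
subspecies_serovars = {
    "enterica": 1531,
    "salamae": 505,
    "arizonae": 99,
    "diarizonae": 336,
    "houtenae": 73,
    "indica": 13,
}
S_BONGORI_SEROVARS = 22

s_enterica_total = sum(subspecies_serovars.values())
genus_total = s_enterica_total + S_BONGORI_SEROVARS

# Fraction of all Salmonella serovars that belong to subsp. enterica.
share_enterica = subspecies_serovars["enterica"] / genus_total

print(s_enterica_total, genus_total, round(share_enterica, 2))
```

The subspecies counts add up to the 2,557 given for S. enterica, and adding S. bongori gives the genus total of 2,579.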

Salmonella Phylogenomics

16S Alignment

SNPs in 16S rDNA

Whole Genome Alignment

Entire genome reflects all similarities

Pathogen Evolution

• Vibrio evolution rapid
• Example for all enteric bacteria
• Also shown with environmental organisms
• Enterobacteria genome evolution
• HGT more common than appreciated
• Genome rearrangements influenced by biogeography & other bacterial community members
• Evidence for local pressure to induce population genome evolution
• Biogeography differences
• Likely to find footprints of geographical origin
• Requires large number of genomes to estimate
• Creates chimeric genomes
• Stress
• Induces SNPs
• Induces new virulence and drug resistance
• Mutations in DNA repair genes lead to SNPs
• Recombination events
• SNPs
• Large segments
• HGT

Shapiro et al., 2012, Science; Denef & Banfield, 2012, Science

Genomics Paradigm to enrich biomarker

Specific gene (PCR)

Genome (NGS, multiplex)

Sequencing discovers new genes

Pan-genome increases with each isolate sequenced

Variable Salmonella genome

Micro. Ecol. 2011

New gene families

Core gene families

Increasingly complex genomic diversity
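The pan-genome growing with each sequenced isolate while the core gene families shrink or level off can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the three toy genomes and the family names are hypothetical and do not come from the 100K data.

```python
def pan_and_core_curve(genomes):
    """Track pan-genome and core-genome sizes as isolates are added in order.

    genomes: iterable of sets, each holding the gene-family IDs of one isolate.
    Returns two lists: pan-genome size and core-genome size after each isolate.
    """
    pan = set()
    core = None
    pan_sizes, core_sizes = [], []
    for families in genomes:
        pan |= families                      # union: every family seen so far
        core = set(families) if core is None else core & families  # shared by all
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


# Hypothetical isolates, for illustration only.
toy_genomes = [
    {"famA", "famB", "famC"},
    {"famA", "famB", "famD"},
    {"famA", "famC", "famE", "famF"},
]
pan_sizes, core_sizes = pan_and_core_curve(toy_genomes)
print(pan_sizes, core_sizes)
```

In real data the result depends on the order in which isolates are added, so such curves are usually averaged over many random orderings.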

100K Pathogen genome project

Increase food safety using microbe systems biology

Discover genetic constituents robust enough to serve as predictive biomarkers for specific traits

• Rapid ID and tracking
• Understand evolution to build more robust detection systems
• New isolate emergence and persistence
• Integration into current practices

http://100kgenome.vetmed.ucdavis.edu

2012 HHSInnovate Secretary’s Choice Awardee

Integration of 100K Project

• Produce a database of phylogenomic diversity of important FBI

• Industry representative genomes important

• Background organisms

M. Allard

100K Consortium

• Founding Members, Executive committee

• Agilent Technologies

• UC Davis (Weimer lab)

• FDA

• Additional Steering Committee members

• NIH (NCBI)

• CDC

• USDA

• Mars, Inc.

• Pacific Biosciences

• CLCbio

• Steering committee provides guidance for overall project direction and goals

• Affiliate Members

• UC Davis - Food Science, PMI, PS, PHR

• California Veterinary Diagnostic Lab

• Salisbury University (USA)

• DoD - Walter Reed Hospital

• Mass General - Harvard hospital system

• RIVM (Netherlands)

• DTU (Denmark)

• MEFOSA (Lebanon)

• Sydney Technical University (Australia)

• Rajiv Gandhi Biotechnology Institute (India)

• Institute of Environmental Science & Research (NZ)

• Oak Ridge National Laboratory (ORNL)

• ANSIS (France)

• Additional negotiations in process with groups from Asia, Africa, Europe

• Corporate Affiliates

• OpGen

• Kapa Biosystems

• PerkinElmer

• BGI@UCDavis

ADDITIONAL PARTICIPANTS WELCOME

100K project Web site

http://100kgenome.vetmed.ucdavis.edu

Organisms of Interest

• Salmonella

• Listeria

• Campylobacter

• Vibrio

• E. coli

• Shigella

• Yersinia

• Clostridium

• Enterococcus

• Cronobacter

• Norovirus

• Hepatitis A

• Enteroviruses

• World-wide representation to capture genomic diversity to represent pan-genome for the most important organisms

Initial focus

Logistical challenges

• Large number of isolates to authenticate & bank

• DNA isolation from many types of bacteria (i.e. lysis)

• DNA quality on large scale

• Large number of draft genomes of variable quality

• Sequencing strategy to meet the needs of the user community
• Government
• Industry
• Public health

• Informatics to derive actionable information quickly

• Biomarker discovery and implementation for daily use

Isolate bank

Authenticated & banked Isolates(~3,500 isolates)

Pending authentication & banking(~15,000 isolates)

Isolates by region

Submission Logistics
• Isolate submission
• Isolate agreement
• MTA
• Timing and specific isolates
• International isolates
• All permits in place
• Timing negotiated
• Sequencing
• BGI@UCDavis – short reads
• UC Davis and others – long reads
• Return data to submitter via 100K bioproject page
• 12 months for review
• Release data at NCBI for public access

• Data return & analysis

• Publication & public database release

100K Sequencing Process

Bank culture

Isolate DNA

Library construction (300-600 libraries/day)

Make library Genome DB & feature ID

Short read technologies

Long read technologies

Whole Genome Mapping

Sequence library

SE SNP Analysis

Yearly SNP evolution

[Figure labels (isolate year, host, source): elephant seal, lung; sea lion, liver; elephant seal, kidney; equine feces; '08 sea lion, liver; '98 otter, abdomen; '05 sea lion, uterus; 1993; '02 rodent feces; '11 elephant seal urine; '11 elephant seal brain]

Finished genomes & the analytical future

• Sea mammal outbreak (SNP) (Deng et al. 2013)

• Food outbreaks (SNP)

• Data analysis post sequencing
• Comparison for SNPs
• Gene content = annotation
• Content comparison = forensics
• Gene
• Protein
• COGs/GO use in statistical enrichment
• Going beyond SNPs
• Guided biomarker discovery
• Genomics – limited based on sequences available
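The "comparison for SNPs" step can be sketched as a pairwise count of differing positions between sequences that are already aligned to the same coordinates. This is a minimal illustration, not the pipeline used in the 100K project; the isolate names and sequences are hypothetical, and positions with an ambiguous base (anything other than A, C, G, T) are skipped.

```python
from itertools import combinations


def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned sequences differ.

    Positions where either sequence has a non-ACGT character (N, gap) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a in valid and b in valid
    ]


def snp_distance_matrix(aligned):
    """Return {(name_x, name_y): SNP count} for every pair of isolates."""
    return {
        (x, y): len(snp_positions(aligned[x], aligned[y]))
        for x, y in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments, for illustration only.
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACCTACGNAT",
}
distances = snp_distance_matrix(toy_alignment)
print(distances)
```

Isolates separated by few SNPs are candidates for belonging to the same outbreak, which is the forensic use the slide refers to.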

Active Outbreaks & Complete Genomes

• Salmonella enterica subsp. enterica serovar Javiana
• Common in fresh-cut produce
• Only one previously sequenced genome (JCVI, 2008), 19 contigs
• Isolate CFSAN001992_73:
• Clinical Arizona isolate from produce-related 2012 outbreak
• Complete process from isolate to finished genomic sequence <1 week
• 1 chromosome; 2 plasmids containing never-seen sequence:

Collaboration with M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muruvanda, S. Musser (FDA), B. Weimer (UC Davis), Jonas Korlach (PacBio)

Initial Salmonella Genomes - PacBio

Strain                   Sequencing      Genome size    Additional genomic elements
S. Bareilly (SAL2881)    8 SMRT Cells    4,730,611 bp   78,193 bp
S. Heidelberg (318_04)   8 SMRT Cells    4,793,478 bp   117,929 bp; 35,296 bp; 3969 bp
S. Heidelberg (2069)     8 SMRT Cells    4,783,941 bp   110,345 bp; 37,704 bp
S. Typhimurium (2048)    8 SMRT Cells    4,967,892 bp   142,804 bp; 48,532 bp
S. Javiana (1992_73)     8 SMRT Cells    4,629,444 bp   24,013 bp; 17,094 bp
S. Cubana (2050)         12 SMRT Cells   4,977,480 bp   166,668 bp; 122,863 bp
S. St. Paul SP3          8 SMRT Cells    4,730,130 bp   none
S. St. Paul SP48         8 SMRT Cells    4,940,224 bp   44,606 bp; 40,801 bp

Collaboration with Pacific BioSciences, M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muruvanda, S. Musser (FDA), R. Roberts (NEB), B. Weimer (UC Davis)

Salmonella Strain Epigenomes

complete methylation

partial methylation

Bareilly(SAL2881)

Heidelberg (CFSAN000318_04)

Javiana (CFSAN001992_73)

Typhimurium (CFSAN001921_01)

St Paul(SP3)

5’-GATC-3’/3’-CTAG-5’

5’-CAGAG-3’/3’-GTCTC-5’

5’-ATGCAT-3’/3’-TACGTA-5’

5'-CAGCTG-3'/3'-GTCGAC-5'

5'-GATCAG-3'/3'-CTAGTC-5'

5’-ACCANCC-3’/3’-TGGTNGG-5’

5’-CCGAN5GTC-3’/3’-GGCTN5CAG-5’

5’-GAGN6RTAYG-3’/3’-CTCN6YATRC-5’

5’-GN2TAYN5RTGG-3’/3’-CN2ATRN5YACC-5’

5’-GpsAAC-3’/3’-CTTpsG-5’

Collaboration with M. Allard, E. Brown, E. Strain, M. Hoffman, T. Muruvanda, S. Musser (FDA), B. Weimer (UC Davis), Jonas Korlach (PacBio)
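The methyltransferase motifs above use IUPAC degenerate codes (R = A or G, Y = C or T, N = any base) and a shorthand in which a letter followed by a number is repeated, so N5 means NNNNN. The sketch below, under that reading of the notation, turns such a motif into a regular expression and lists where it occurs on one strand of a sequence. The function names and test sequences are illustrative; the phosphorothioate motif (GpsAAC) is not a plain nucleotide pattern and is not handled.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Convert a motif such as 'CCGAN5GTC' into a regex string."""
    parts = []
    # Each token is one IUPAC letter optionally followed by a repeat count.
    for base, count in re.findall(r"([A-Z])(\d*)", motif.upper()):
        pattern = IUPAC[base]
        parts.append(pattern + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_sites(motif, sequence):
    """Return 0-based start positions of the motif, including overlapping hits."""
    # A lookahead keeps the match zero-width so overlapping sites are reported.
    regex = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(motif_to_regex("GAGN6RTAYG"))
print(find_sites("GATC", "TTGATCGGATCA"))
print(find_sites("CCGAN5GTC", "AACCGATTTTTGTCAA"))
```

To cover both strands, the same search would be repeated with the reverse-complement motif; palindromic motifs such as GATC give the same sites on either strand.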

SP48 Mobile Element Analysis

• Using PHAST (http://phast.wishartlab.com)

• “Phage_Gifsy_2” is chromosomal (#3) and plasmid encoded (#6)

• Fels and Gifsy - virulence


SP48 Epigenome Determination

• Methyltransferase and Phosphorothioation (PT)-mediating systems specificities:

• 3 active methyltransferases and 1 active phosphorothioation system detected

• PT = unknown function

PT

Vibrio Mobile Element Analysis

• Using PHAST (http://phast.wishartlab.com)

• 2 putative methyltransferases encoded within Phage elements

• Contains incomplete phage elements associated with shiga toxin production (stx)


Listeria monocytogenes

PLoS ONE 8: e67511, June 25 2013

Listeria monocytogenes De Novo Assemblies

Den Bakker et al. (2013):
• 4 kb plasmid library (Sanger)
• 10 kb plasmid library (Sanger)
• 40 kb fosmid library (Sanger)
• 454 library
• Illumina library

Serotype  Strain                 Contigs  Assembly size
1/2a      10403S                 1        2,903,106
1/2a      FSL N3-165             39       2.88 x 10^6
1/2b      FSL N1-017             79       3.14 x 10^6
1/2b      FSL R2-503             55       2.99 x 10^6
1/2c      FSL R2-561             1        2,973,801
3a        Finland1998            1        2,874,431
4b        Aureli1997 (HPB2262)   79       2.99 x 10^6
4c        FSL J2-071             53       2.85 x 10^6

This study* - 10 kb PacBio library

Serotype  Strain  Contigs  Assembly size
1/2a      861     1        2,988,947
1/2a      878     1        2,981,886
1/2a      899     1        2,958,908
1/2a      1846    1        2,947,474
1/2a      2074    1        2,897,140
1/2a      2625    1        2,896,587
1/2a      2626    1        2,907,056
1/2a      2676    1        2,947,293
1/2b      859     2        3,034,043 (chr); 57,557 (plasmid)
1/2b      867     1        2,943,218
1/2b      911     2        3,094,342 (chr); 148,959 (plasmid)
1/2b      2624    1        2,932,495
1/2b      G4599   1        3,011,693
4b        1493    2        2,953,719 (chr); 55,804 (plasmid)
4b        1494    2        2,953,716 (chr); 55,804 (plasmid)
4b        1495    2        2,953,708 (chr); 55,803 (plasmid)

*Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)

Whole-Genome Organization Comparison

Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)

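[Figure: whole-genome organization comparison of strains 861, 878, 899, 1846, 2074, 2625, 2626, 2676 (serotype 1/2a); 859, 867, 911, 2624, G4599 (serotype 1/2b); 1493, 1494, 1495 (serotype 4b)]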

Mobile Elements

Serotype  Strain  intact  incomplete  questionable
1/2a      861     2       2           0
1/2a      878     0       1           1
1/2a      899     1       1           1
1/2a      1846    1       1           1
1/2a      2074    0       0           1
1/2a      2625    1       1           0
1/2a      2626    1       1           0
1/2a      2676    2       1           0
1/2b      859     1       3           1
1/2b      867     1       1           0
1/2b      911     2       3           1
1/2b      2624    0       1           0
1/2b      G4599   1       1           1
4b        1493    0       1           1
4b        1494    0       1           1
4b        1495    0       1           1

[Figure labels: L2625, L2676]

Collaboration with C. Tarr (CDC), H. den Bakker (Cornell U), R. Roberts (NEB), B. Weimer (UC Davis)

http://phast.wishartlab.com/


L. monocytogenes Epigenomes

Strains by serotype: 1/2a (861, 878, 899, 1846, 2074, 2625, 2626, 2676); 1/2b (859, 867, 911, 2624, G4599); 4b (1493, 1494, 1495)

Methyltransferase specificity         Modified base  Values (in slide order)
5'-GATC-3'/3'-CTAG-5'                 m6A            99.5, 98.9
5'-GATC-3'/3'-CTAG-5'                 X              8.4, 12.2, 15.8
5'-GACN5GGT-3'/3'-CTGN5CCA-5'         m6A            98.9, 98.9
5'-GAN6TGCG-3'/3'-CTN6ACGC-5'         m6A            99.7, 99.6, 100, 99.8, 99.8, 99.9
5'-TACBN6GTNG-3'/3'-ATGVN6CANC-5'     m6A            99.7, 99.8
5'-TAGRAG-3'/3'-ATCYTC-5'             m6A            99.3
5'-GTATCC-3'/3'-CATAGG-5'             m6A            99.6, 99.7, 99.1, 98.8, 99.2, 98.0

same epidemic clone (ECVII)


L. monocytogenes Epigenomes

Strains by serotype: 1/2a (861, 878, 899, 1846, 2074, 2625, 2626, 2676); 1/2b (859, 867, 911, 2624, G4599); 4b (1493, 1494, 1495)

Methyltransferase specificity         Modified base
5'-GATC-3'/3'-CTAG-5'                 m6A
5'-GATC-3'/3'-CTAG-5'                 X
5'-GACN5GGT-3'/3'-CTGN5CCA-5'         m6A
5'-GAN6TGCG-3'/3'-CTN6ACGC-5'         m6A
5'-TACN7GTNG-3'/3'-ATGN7CANC-5'       m6A
5'-TAGRAG-3'/3'-ATCYTC-5'             m6A
5'-GTATCC-3'/3'-CATAGG-5'             m6A

5'-GAxTC-3'; 5'-Gm6ATC-3'


100K Bioproject

http://www.ncbi.nlm.nih.gov/bioproject/186441

Project status

• Year 1:
• Focus on the top 50 Salmonella outbreak serotypes
• Banked ~3500 isolates
• Developing worldwide partnerships
• Automate sequence library construction
• Sequence 1500 isolates
• NCBI 100K Bioproject

• Year 2-5
• Bank additional isolates
• Automated, routine library construction
• Sequence ~25K genomes/year
• Finish 1000 genomes to a single closed genome
• Generate epigenomic data
• Define high resolution map assemblies for small set
• Define need for additional bioinformatics

Initial public release of draft genomes: May 2013

Second public release of 20 closed genomes: July 2013

Third public release of 1500 genomes: Fall 2013
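The plan's numbers can be checked with simple arithmetic. The sketch below assumes one sequencing library per genome and treats "Year 2-5" as four full years at the stated rate; both are assumptions made for illustration, as is the use of the 300-600 libraries/day range quoted for library construction.

```python
import math

YEAR1_GENOMES = 1_500          # "Sequence 1500 isolates" in year 1
GENOMES_PER_YEAR = 25_000      # "~25K genomes/year" in years 2-5
LATER_YEARS = 4                # years 2, 3, 4 and 5
LIBRARIES_PER_DAY = (300, 600) # stated library construction throughput


def days_of_library_construction(genomes, libraries_per_day):
    """Whole days needed to build one library per genome at a given daily rate."""
    return math.ceil(genomes / libraries_per_day)


# Library-construction days needed per year at the low and high daily rates.
days_needed = {
    rate: days_of_library_construction(GENOMES_PER_YEAR, rate)
    for rate in LIBRARIES_PER_DAY
}
cumulative_by_year5 = YEAR1_GENOMES + LATER_YEARS * GENOMES_PER_YEAR

print(days_needed)
print(cumulative_by_year5)
```

Under these assumptions the yearly target needs roughly 42 to 84 days of library construction, and the cumulative count after year 5 slightly exceeds 100,000 genomes.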

Reduce time to result

Traditional - Enrichment

Collect sample

T0, T24, T36, T48, T60, T72+

Ship to lab; Pre-enrich

Log sample; Prep; Enrich

Plate; Selective enrich; Presumptive

Examine plate; Confirm; Bank

Examine plate; Confirm; Bank

ID; Characterize; Genomics

Next generation – Culture independent

Collect sample

T0, T0.5, T1, T2, T3, T4+

Capture & concentrate

Presumptive; Relative amt.

Directed plate; Directed enrich; DNA prep

Multiplex PCR; qPCR; Sequencing prep

Confirm PCR; Sequence; Bank DNA

ID; Characterize; In/out of event

Molecular Salmonella testing in 2 hours

Lyse cells

Add primers

PCR (45 mins)

• Colony
• Enrichment broth
• Lab medium
• Capture/concentration

Action based on:
1. Salmonella detection
2. Serotype determination

Report Salmonella, serogroup & serotype

Nano electrophoresis (30 mins)

Molecular detection validation

• Validation approach
• ~1750 isolates tested
• Designed to detect ~30 most common serotypes
• Multiple matrices

• Verification & validation
• Validated by 3 independent labs
• 100% accurate in Salmonella identification with 6 independent blinded panels
• 98% accurate for serotype determination

Workflow innovation

NGS of entire genome

Library construction

NGS sequencing

Bioinformatics & statistics

Total time: 1.5 hours; 6 hours; 48 hours; ~24-130 hours*

Enrichment & colony isolation

Result

Nano electrophoresis

Colony multiplex PCR

Broth multiplex PCR

Culture-independent capture/concentration

Presumptive ID with solid-phase ELISA

Nano electrophoresis

Critical Needs
• Robust biomarkers
• Fast, actionable answers
• Novel sequence analysis strategies

Acknowledgements

Weimer Lab

• Dr. Yi Xie

• Dr. Richard Jeannotte

• Dr. Holly Ganz

• Dr. Marie Forquin

• Dr. Prerak Desai

• Dr. Jigna Shah

• Ms. Nugget Dao

• Ms. Mai Lee Yang

• Ms. Kao Thao

• Ms. Winnie Ng

• Ms. Carol Huang

Thanks to the sponsors:
• FDA/CFSAN
• USDA
• DARPA
• US Air Force
• Agilent Technologies
• CA Dairy industry
• Pacific Biosciences
• Mars, Inc.
• UCD Wildlife Health Center

cBio
• Dr. Kumar Hari
• Dr. Ravi Jane

UCD/CAHFS/SVM
• Dr. Kris Clothier
• Dr. Barb Byrne
• Dr. Woutrina Miller
• Dr. Linda Harris
• Dr. Maria Marco

Agilent Technologies
• Dr. Rudi Grimm
• Dr. Lenore Kelly
• Dr. Steffan Müeller
• Dr. Steve Royce
• Dr. Paul Zavitsanos

PacBio
• Jonas Korlach
• Luke Hickey

UC Santa Barbara
• Dr. Mike Mahan

OpGen
• Dr. Erin Newburn

CFSAN
• Dr. Marc Allard
• Dr. Eric Brown

Thank You…

Bart Weimer Professor, UC Davis

Director, BGI@UC Davis

Director, 100K Genome Project

Director, Integration Core, NIH-West Coast Metabolomics Center


530.754.0109

Questions?
