

6/27/2016

1

Fostering Collaboration for Public Health: The Role of NCBI

William Klimke, APHL 2016

The Interagency Collaboration on Genomics and Food Safety (Gen‐FS) established by this Charter represents a substantial effort to strengthen collaboration and coordination of Federal public health and regulatory food safety responsibilities of the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH) of the US Department of Health and Human Services, and the Food Safety and Inspection Service (FSIS) of the US Department of Agriculture. 

Gen‐FS will strengthen Federal collaboration by addressing cross‐cutting priorities for molecular sequencing of foodborne and other pathogens causing human illness, and for data collection, analysis and use, as outlined in the key findings of the Report of the Real Time Whole Genome Sequencing Surveillance Multi‐Agency Collaboration Meeting, September 22‐23, 2014, Natcher Center, NIH, Bethesda, Maryland.

Interagency Collaboration on Genomics and Food Safety (Gen-FS)


FDA/CDC Real Time Listeria Project

FDA & CDC could leverage existing systems & work flows…

Could NCBI play a role?

Listeria Annual Stats (CDC)
• ~1600 cases
• ~260 deaths
• ~$230 million (USDA ERS)


Whole Genome Sequencing (WGS) Listeria Pilot Project

Started September 2013

Goal: Sequence all Listeria monocytogenes isolates

Near real‐time (<1 week for patient isolates)

Public Health Agency of Canada


http://www.globalmicrobialidentifier.org/

Vision and objectives

The vision is to develop a global system to aggregate, share, mine and use microbiological genomic data to address global public health and clinical challenges, a high impact area in need of focused effort. Such a system should be deployed in a manner which promotes equity in access and use of the current technology worldwide, enabling cost-effective improvements in plant, animal, environmental and human health.

Global Microbial Identifier


Minimal metadata

What
  sample_name
  organism
  strain/isolate
  Category (attribute_package)
    1a) Clinical/Host‐associated
      1a1) specific_host
      1a2) isolation_source
      1a3) host‐disease
    OR
    1b) Environmental/Food/Other
      1b1) isolation_source

When
  collection_date

Where
  Geographic location
    6a) geo_loc_name
    OR
    6b) lat_lon

Who
  collected by

NCBI Biosample – Pathogen Template (Foodborne Outbreaks)
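A submitter could check a record against this minimal field list before upload. The sketch below is illustrative only: the dictionary keys `strain_or_isolate`, `host_disease`, and `collected_by` are Python-friendly spellings of the slide's field names, the package labels are assumptions, and nothing here reflects how the NCBI Submission Portal itself validates records.

```python
from typing import Dict, List

# Fields every record needs, taken from the template outline above.
# Key spellings are assumptions made for this sketch.
REQUIRED = ["sample_name", "organism", "strain_or_isolate",
            "attribute_package", "collection_date", "collected_by"]

# Package-specific fields; the package labels are illustrative, not NCBI's exact values.
PACKAGE_FIELDS = {
    "clinical/host-associated": ["specific_host", "isolation_source", "host_disease"],
    "environmental/food/other": ["isolation_source"],
}


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the minimal-metadata fields that are absent or empty."""
    def present(name: str) -> bool:
        return bool(record.get(name, "").strip())

    missing = [f for f in REQUIRED if not present(f)]
    package = record.get("attribute_package", "").strip().lower()
    if package in PACKAGE_FIELDS:
        missing += [f for f in PACKAGE_FIELDS[package] if not present(f)]
    elif package:
        missing.append("attribute_package (unrecognized value)")
    # Location is satisfied by either a place name or coordinates.
    if not (present("geo_loc_name") or present("lat_lon")):
        missing.append("geo_loc_name or lat_lon")
    return missing


example = {
    "sample_name": "sample-001",
    "organism": "Listeria monocytogenes",
    "strain_or_isolate": "isolate-001",
    "attribute_package": "Environmental/Food/Other",
    "isolation_source": "cheese",
    "collection_date": "2016-05-01",
    "geo_loc_name": "USA",
    "collected_by": "example lab",
}
print(missing_fields(example))
```

Accepting either `geo_loc_name` or `lat_lon` mirrors the OR in the template; all values in `example` are made up.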


Type                                  Submissions
pathogen                              117406
pathogen: clinical/host‐associated    68458
pathogen: food/environmental/other    48948
with publicly available SRA data      83243
Salmonella                            48967
Listeria                              12116
Campylobacter                         2978
Escherichia and Shigella              13011
Other                                 40334

NCBI Biosample – Pathogen Template Total Submissions (May 2016)

Type              Submissions
Klebsiella        1815
Acinetobacter     1906
Enterobacter      822
Staphylococcus    1960
Streptococcus     4337
Legionella        296
Viruses           8589
Serratia          125
Pseudomonas       1133
Mycobacterium     6161
Vibrio            1149
Bordetella        205
Bacillus          332
Neisseria         985

NCBI Biosample – Pathogen Template Other pathogens (May 2016)


NCBI Pathogen Detection Pipeline: Submissions and Analysis

NCBI Submission Portal: BioSamples, SRA, GenBank, BioProject

NCBI Pathogen Pipeline: Kmer analysis, Genome Assembly, Genome Annotation, Genome Placement, Clustering, SNP analysis, Tree Construction, Reports, QC

[Figure: NCBI Pathogen Detection Pipeline submissions (Jan – May, 2016); series: USA, UK, Aus, Clinical]
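The pipeline lists k-mer analysis among its steps but the slides do not say how k-mer distances are computed. As general background, the sketch below uses the Jaccard distance between k-mer sets, a common choice that is an assumption here and not NCBI's stated method; the function names and the small `k` are illustrative.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a: str, seq_b: str, k: int = 4) -> float:
    """1 minus (shared k-mers / all k-mers); 0.0 means identical k-mer sets."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k; treat them as indistinguishable.
        return 0.0
    return 1.0 - len(a & b) / len(union)


print(jaccard_distance("ACGTACGTGA", "ACGTACGTGA"))
print(jaccard_distance("AAAAAAAA", "CCCCCCCC"))
```

Real genomes would call for a much larger `k` and a sketching scheme to keep the sets small; this version only shows the distance itself.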


Type                  Total targets in k‐mer tree    Targets in clusters (single linkage <= 50 SNPs)
Salmonella            45297                          38794
Listeria              9621                           8135
E. coli & Shigella    13144                          6046
Campylobacter         2234                           1569
Acinetobacter         2179                           1299
Elizabethkingia       89                             74
Serratia              336                            227
Klebsiella            1194                           677
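To compare organisms, the two columns can be turned into the share of targets that fall in a cluster. The short sketch below only does that arithmetic on the numbers from the table; the variable and function names are illustrative.

```python
# (total targets in k-mer tree, targets in clusters), copied from the table above.
TARGETS = {
    "Salmonella": (45297, 38794),
    "Listeria": (9621, 8135),
    "E. coli & Shigella": (13144, 6046),
    "Campylobacter": (2234, 1569),
    "Acinetobacter": (2179, 1299),
    "Elizabethkingia": (89, 74),
    "Serratia": (336, 227),
    "Klebsiella": (1194, 677),
}


def clustered_fraction(name: str) -> float:
    """Fraction of an organism's targets that sit in a cluster."""
    total, clustered = TARGETS[name]
    return clustered / total


for organism in sorted(TARGETS, key=clustered_fraction, reverse=True):
    print(f"{organism:20s} {100 * clustered_fraction(organism):5.1f}%")
```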

Contributions of enteric pathogens for food safety

http://www.ncbi.nlm.nih.gov/pathogens/contributors/


Contributions of clinical pathogens

http://www.ncbi.nlm.nih.gov/pathogens/

Results Available Now


NCBI’s Role in Combating Antibiotic Resistant Bacteria

“Create a repository of resistant bacterial strains (an “isolate bank”) and maintain a well‐curated reference database that describes the characteristics of these strains.”

“Develop and maintain a national sequence database of resistant pathogens.”


Clin Infect Dis. 2014 Aug 1;59(3):390‐7. doi: 10.1093/cid/ciu319. Epub 2014 May 1.

MBio. 2015 Jul 28;6(4):e01030. doi: 10.1128/mBio.01030‐15.


AMR efforts at NCBI

• With collaborators, build database of sequenced isolates with standardized AMR metadata (i.e. accept antibiograms): 2019 samples as of May 16 (http://www.ncbi.nlm.nih.gov/biosample/?term=antibiogram[filter])
  – Collaborators include: CDC, WRAIR, FDA, B&W
• Stable, up‐to‐date database of AMR genes with standardized nomenclature
  – Collaborators (CARD)
  – RefSeq set released by June 2016
• Implement and validate tools for identifying AMR genes in new isolates

Antibiogram Fields

• Fields designed to find balance between comprehensiveness and ease of submission
• Data dictionaries based on outside expertise (ASM, CLSI) standardize input and minimize ‘data drift’
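One way a data dictionary limits 'data drift' is by mapping free-text entries onto a fixed vocabulary and rejecting anything outside it. The sketch below illustrates that idea only: the field names `antibiotic` and `resistance_phenotype` and both vocabularies are assumptions made for the example, not the actual NCBI antibiogram specification.

```python
from typing import Dict

# Hypothetical controlled vocabularies standing in for real data dictionaries.
ANTIBIOTICS = {"ampicillin", "ciprofloxacin", "meropenem"}
PHENOTYPES = {"susceptible", "intermediate", "resistant"}


def normalize_row(row: Dict[str, str]) -> Dict[str, str]:
    """Lower-case and trim one antibiogram row, rejecting out-of-vocabulary terms."""
    antibiotic = row["antibiotic"].strip().lower()
    phenotype = row["resistance_phenotype"].strip().lower()
    if antibiotic not in ANTIBIOTICS:
        raise ValueError(f"unknown antibiotic: {antibiotic!r}")
    if phenotype not in PHENOTYPES:
        raise ValueError(f"unknown phenotype: {phenotype!r}")
    return {"antibiotic": antibiotic, "resistance_phenotype": phenotype}


print(normalize_row({"antibiotic": " Meropenem", "resistance_phenotype": "Resistant "}))
```

Rejecting unknown terms outright, instead of guessing the nearest match, keeps spelling variants from entering the database silently.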


NCBI Outputs

Kmer tree

ftp://ftp.ncbi.nlm.nih.gov/pathogen/Results/Listeria/latest/

• Genome Workbench
• full SNP reports
• Integrated web‐based interactive system*
• AMR reports*
• wgMLST*

Acknowledgements

Joshua Cherry, Michael DiCuccio, William Klimke, Aleksandr Morgulis, Eyal Mozes, Arjun Prasad, Kirill Rotmistrovsky, Alejandro Schaffer, Sergey Shiryev, Martin Shumway, Alexander Souvorov, Lukas Wagner, Alexander Zasypkin

CDC, FDA/CFSAN, USDA‐FSIS, PHE/FERA, NIAID, WRAIR, Broad, Wadsworth/MDH

pd‐[email protected]

This research was supported by the Intramural Research Program of the NIH, National Library of Medicine.
http://www.ncbi.nlm.nih.gov
National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA

David Lipman, James Ostell

SRA team, Systems group, Submission Portal


Automated Bacterial Assembly

• SRA reads, sample 1
• Trim reads (Ns, adaptor)
• Reference distance tree
• Find closest reference genome(s)
• Argo (reference‐assisted assembly)
• De novo assembly panel: SOAPdenovo, GS‐assembler (newbler), MaSuRCA, Celera Assembler, SPAdes
• ArgoCA (combined assembly)
• Reads remapped to combined assembly
• Contig FASTA, read placements (BAM), quality profile
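The slide names a read-trimming step (Ns, adaptor) without giving details. The sketch below shows one simple reading of it: strip N bases from both ends, then cut the read at the first exact adaptor match. The function name, the exact-match rule, and the adaptor string in the usage line are assumptions for illustration, not the pipeline's actual behavior.

```python
def trim_read(read: str, adaptor: str) -> str:
    """Strip flanking N bases, then remove the adaptor and everything after it."""
    trimmed = read.upper().strip("N")
    cut = trimmed.find(adaptor.upper())
    if cut != -1:
        trimmed = trimmed[:cut]
    # Cutting at the adaptor can expose new trailing Ns.
    return trimmed.rstrip("N")


# Illustrative read with a made-up adaptor fragment at its 3' end.
print(trim_read("NNACGTACGTAGATCGGAAGNN", "AGATCGGAAG"))
```

Production trimmers also tolerate mismatches and partial adaptors at the read end, which this exact-match version does not.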


NCBI Pathogen Detection SNP Pipeline Web viewer (coming soon): example 3 – Elizabethkingia outbreak

704 Salmonella Enteritidis

7102 columns (filtered)

Data plus “noise”: 7402 total columns

Add “noise”: 300 columns that had been removed by filtering

Compatibility

Parsimony

No changes to topology

Many changes and conflicts (47 + 43)

More conflicts (5 + 6 branches, of ~1100 total)

Few conflicts between topologies

R&D: Tree Building
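The slide contrasts compatibility and parsimony without describing either implementation. As background, the core test behind compatibility methods for two-state characters is pairwise: two columns can sit on the same tree without homoplasy unless all four state combinations occur. The sketch below assumes binary columns encoded as strings of "0" and "1" of equal length; the encoding and function names are illustrative.

```python
from itertools import combinations
from typing import List


def compatible(col_a: str, col_b: str) -> bool:
    """Two binary columns are compatible unless all four state pairs occur."""
    return len(set(zip(col_a, col_b))) < 4


def count_conflicts(columns: List[str]) -> int:
    """Number of column pairs that fail the pairwise compatibility test."""
    return sum(1 for a, b in combinations(columns, 2) if not compatible(a, b))


# Each string is one alignment column across four isolates.
columns = ["0011", "0001", "0101"]
print(count_conflicts(columns))
```

In `columns`, the first and third strings show all four combinations and so conflict, while the second is compatible with both.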


wgMLST approach

• Complementary to SNP analysis, e.g. as a consistency check
• Efficient for initial clustering of all isolates in species
• Generate loci using “essentially complete” RefSeq genomes

Organism         Number of loci    Genome in loci    Number of genomes    Major species
Acinetobacter    2420              58.25%            43/47                baumannii
Campylobacter    1257              68.36%            90/132               jejuni
Escherichia      2896              52.97%            159/165              coli
Klebsiella       4004              82.54%            67/82                pneumoniae
Listeria         2364              73.88%            73/81                monocytogenes
Salmonella       3469              66.98%            137/147              enterica

R&D: wgMLST

• Fast & relatively simple
• Epidemiologists are familiar with it
• Good for initial clustering
• Different heuristics
• Can use special markers for e.g. serovars
• Still need to deal with assembly errors
• Recombination can still be a problem…

wgMLST – a complementary method

Loci are not independent

R&D: wgMLST
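wgMLST compares isolates by the allele each carries at every locus. A minimal sketch of that comparison follows, assuming each isolate is a mapping from locus name to an integer allele identifier, with `None` for a locus that could not be called (for example, because of an assembly error). The profile format and the choice to skip uncalled loci are assumptions, not details taken from the slides.

```python
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele identifiers differ."""
    shared = [locus for locus in a
              if a[locus] is not None and b.get(locus) is not None]
    return sum(1 for locus in shared if a[locus] != b[locus])


# Hypothetical profiles over four loci.
isolate_1 = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None}
isolate_2 = {"locus1": 1, "locus2": 7, "locus3": 2, "locus4": 3}
print(allele_distance(isolate_1, isolate_2))
```

Because a single recombination event can change several neighboring loci at once, this count treats loci as independent when they are not, which is the caveat the slide raises.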


1. Initial partition of isolates within each species by kmer distances

2. Within each partition, blast comparison of all pairs of genomes

3. Single linkage clusters with at most 50 SNPs

4. Within clusters, SNPs with respect to one reference

5. Generate final SNP list and phylogenetic trees
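Step 3 can be illustrated with a small union-find sketch: any pair of isolates at most 50 SNPs apart is linked, and clusters are the connected groups that result. The input format (a dictionary of pairwise SNP distances) and the function name are assumptions; the slides do not describe the actual implementation.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(names: List[str],
                            distances: Dict[Tuple[str, str], int],
                            max_snps: int = 50) -> List[List[str]]:
    """Group isolates so that each is within max_snps of some other cluster member."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in distances.items():
        if snps <= max_snps:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise SNP distances among four isolates.
pairs = {("A", "B"): 10, ("B", "C"): 45, ("A", "C"): 55,
         ("A", "D"): 130, ("B", "D"): 125, ("C", "D"): 120}
print(single_linkage_clusters(["A", "B", "C", "D"], pairs))
```

In this example A and C are 55 SNPs apart yet share a cluster because both are within 50 SNPs of B; that chaining is characteristic of single linkage.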

Filtering:
• Base level
• Repeat
• Density

Problematic genomes are eliminated at various points along the way

SNP pipeline

High SNP density
Cumulative count of differences

Iterative density filtering (Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430‐4)
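The slides do not spell out the iterative density filter, so the sketch below substitutes a plain single-pass sliding-window filter to show the general idea: SNPs packed more tightly than expected, as can happen in recombinant regions, are set aside. The window size, the threshold, and the function names are hypothetical, and this is not the cited iterative method.

```python
from bisect import bisect_left
from typing import Iterable, List, Set


def dense_positions(positions: Iterable[int], window: int = 1000,
                    max_snps: int = 3) -> Set[int]:
    """Flag SNP positions lying in any window holding more than max_snps SNPs."""
    pos = sorted(positions)
    flagged: Set[int] = set()
    for i, start in enumerate(pos):
        # Index of the first SNP at or beyond the end of the window [start, start + window).
        end = bisect_left(pos, start + window)
        if end - i > max_snps:
            flagged.update(pos[i:end])
    return flagged


def filter_snps(positions: Iterable[int], window: int = 1000,
                max_snps: int = 3) -> List[int]:
    """Return the SNP positions that survive the density filter, in sorted order."""
    flagged = dense_positions(positions, window, max_snps)
    return [p for p in sorted(positions) if p not in flagged]


print(filter_snps([100, 150, 200, 250, 5000, 9000]))
```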


Number of RefSeq genomes with AMR hits

Carbapenem‐resistant beta lactamase alleles

Genomes    Organism                   GES    KPC    NDM    OXA     IMP    VIM    IMI
3221       Escherichia coli           0      74     32     2       6      0      0
1096       Acinetobacter baumannii    0      2      32     2861    6      0      0
1081       Pseudomonas aeruginosa     0      6      0      0       0      234    0
781        Klebsiella pneumoniae      2      930    96     10      6      0      0
314        Enterobacter cloacae       0      278    8      0       6      0      1
74         Enterobacter aerogenes     0      16     0      0       0      0      0
72         Klebsiella oxytoca         0      20     4      0       0      0      0
70         Serratia marcescens        0      2      0      0       0      0      0
30         Citrobacter                0      24     4      0       0      3      0

NCBI Pathogen Detection: Carbapenem resistant beta lactamase alleles found

Organism                  Submitter       Number of genomes with carbapenemases    KPC    NDM    OXA
Salmonella                CDC             2                                        1      1      0
Salmonella                PHE             12                                       0      1      11
Serratia marcescens       B&W Hospital    1                                        2      0      0
Pseudomonas aeruginosa    B&W Hospital    2                                        0      0      3
Escherichia coli          B&W Hospital    1                                        0      0      2
Klebsiella pneumoniae     B&W Hospital    10                                       10     0      0
Enterobacter cloacae      B&W Hospital    7                                        7      0      0
Acinetobacter             B&W Hospital    6                                        0      0      10