Bioinformatics Approaches to Supporting Outbreak Investigations: CFSAN-SNP and the Lyve-SET Pipeline
Kevin G. Libuit, M.S., Senior Informatics Scientist, Division of Consolidated Laboratory Services



Page 2

Inferring Genetic Relatedness from WGS

Clustering bacterial isolates to infer epidemiological associations
• Based on genetic relatedness inferred from whole-genome sequencing (WGS) data

Predominant approaches in public health bioinformatics:
• Multiple sequence alignment (MSA)
• Core/whole genome multilocus sequence typing (c/wgMLST)
• Single nucleotide polymorphism (SNP)

Page 3

Single Nucleotide Polymorphism (SNP)

SNP:
• Significant changes in single nucleotide positions, with respect to a reference genome
• Isolates clustered through a pairwise comparison of SNPs identified

Common processes of a SNP Pipeline

Page 4

Single Nucleotide Polymorphism (SNP)

[Figure: a reference genome and the SNP profiles of Genomes 1 to 4, leading to isolate clustering (dendrogram) with tip labels beginning "2014C-" and branch support values of 100.]

1. Read Mapping
2. SNP-Calling
3. Phylogenetic Inference
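The pairwise comparison that sits between SNP calling and phylogenetic inference can be sketched as follows. This is a minimal illustration, not the method of either pipeline: it assumes each isolate's SNP profile is already available as a mapping from reference position to called base, and the isolate names, positions, and helper functions (`snp_distance`, `pairwise_snp_distances`) are hypothetical.

```python
from itertools import combinations


def snp_distance(profile_a, profile_b):
    """Count reference positions where two isolates carry different bases.

    Each profile maps reference position -> called non-reference base.
    A position missing from a profile is assumed to match the reference.
    """
    positions = set(profile_a) | set(profile_b)
    return sum(1 for p in positions if profile_a.get(p) != profile_b.get(p))


def pairwise_snp_distances(profiles):
    """Build a distance table keyed by alphabetically ordered isolate pairs."""
    names = sorted(profiles)
    return {
        (a, b): snp_distance(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical SNP profiles (made-up positions and bases).
profiles = {
    "genome_1": {1200: "T", 58110: "A"},
    "genome_2": {1200: "T", 58110: "A", 90412: "G"},
    "genome_3": {1200: "T"},
    "genome_4": {301: "C", 7788: "G", 45002: "T", 90412: "A"},
}

matrix = pairwise_snp_distances(profiles)
for pair, distance in sorted(matrix.items()):
    print(pair, distance)
```

A table like this is the kind of input a clustering or tree-building step would consume; in this toy data, genomes 1 to 3 sit within a couple of SNPs of each other while genome 4 is more distant.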

Page 5

66 responses
22 different pipelines

Twitter Poll from Anita Schürch (UMC) on SNP Pipeline Popularity (conducted October 2017)

Presentation Notes: 17 responses for custom pipeline
Page 6

Identifying the Appropriate SNP Pipeline

Literature review
• Clustering bacterial isolates to infer epidemiological associations
• Microbial foodborne pathogens

Communicating with collaborators
• Other state and federal public health laboratories

CFSAN-SNP [1] & Lyve-SET Pipeline [2]

Presentation Notes: install and accessibility (screen shot). Peer reviewed and open source; what are
Page 7

CFSAN-SNP and the Lyve-SET Pipeline

                         CFSAN-SNP (FDA)    Lyve-SET (CDC)
Phage Masking            FALSE              TRUE
Read Mapping             BowTie2            Smalt
SNP Caller               VarScan            VarScan
Coverage                 8x                 20x
Consensus                60%                95%
Density Filtering        333bp              5bp
Phylogenetic Inference   FastTree v2.1*     RAxML v8

Presentation Notes: % masking
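To make the effect of the coverage, consensus, and density-filtering settings concrete, here is a rough sketch of how such thresholds might be applied to candidate SNP sites. It is an illustration under stated assumptions, not the code of either pipeline: the input format (position, read depth, reads supporting the variant), the `Thresholds` and `filter_sites` names, and the reading of density filtering as "discard SNPs that lie closer than a given number of base pairs to another passing SNP" are all assumptions. Only the numeric values come from the comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_coverage: int      # minimum read depth at the site
    min_consensus: float   # minimum fraction of reads supporting the variant
    min_spacing_bp: int    # SNPs closer together than this are discarded


# Numeric values from the comparison above; the names are hypothetical.
CFSAN_LIKE = Thresholds(min_coverage=8, min_consensus=0.60, min_spacing_bp=333)
LYVESET_LIKE = Thresholds(min_coverage=20, min_consensus=0.95, min_spacing_bp=5)


def filter_sites(sites, thresholds):
    """Return positions that pass coverage, consensus, and spacing filters.

    sites: iterable of (position, depth, variant_reads) tuples (assumed format).
    """
    passing = sorted(
        pos
        for pos, depth, variant_reads in sites
        if depth >= thresholds.min_coverage
        and variant_reads / depth >= thresholds.min_consensus
    )
    kept = []
    for i, pos in enumerate(passing):
        near_prev = i > 0 and pos - passing[i - 1] < thresholds.min_spacing_bp
        near_next = (
            i + 1 < len(passing)
            and passing[i + 1] - pos < thresholds.min_spacing_bp
        )
        if not (near_prev or near_next):
            kept.append(pos)
    return kept


# Made-up candidate sites: (position, depth, reads supporting the variant).
candidate_sites = [(100, 25, 25), (103, 30, 30), (1000, 10, 7), (5000, 40, 39)]

cfsan_like_result = filter_sites(candidate_sites, CFSAN_LIKE)
lyveset_like_result = filter_sites(candidate_sites, LYVESET_LIKE)
print(cfsan_like_result, lyveset_like_result)
```

With these made-up sites, the site at position 1000 (depth 10, 70% consensus) passes the looser settings but not the stricter ones, which is one way two pipelines can report different SNP distances for the same isolates.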
Page 8

Access and Operation of the SNP Pipeline

Download, installation, and usage:
• Operating system and compute specifications
• Graphical user interface or command-line interface
• Bioinformatics experience and background of personnel

Page 9

Validating SNP Pipelines

FDA and CDC curated benchmark dataset expectations
• Comparison of a group of isolates with known cluster profile
• Outbreak isolates should cluster separately from non-outbreak strains

[Figure: dendrogram of benchmark isolates; tip labels begin with "2014C-" and branch support values are 100.]
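The expectation that outbreak isolates cluster separately from non-outbreak strains can be approximated with a simple distance check. The sketch below is a simplified proxy, not the benchmark procedure itself: a real validation would compare tree topology against the known cluster profile, whereas this only tests whether every within-outbreak distance is smaller than every outbreak-to-background distance. The isolate names, distances, and the `clusters_separately` helper are hypothetical.

```python
from itertools import combinations


def clusters_separately(distances, outbreak, isolates):
    """True if the largest within-outbreak SNP distance is smaller than the
    smallest distance between an outbreak isolate and a background isolate.

    distances: dict keyed by isolate pairs (either order) -> SNP distance.
    """
    def lookup(a, b):
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    background = [name for name in isolates if name not in outbreak]
    within = [lookup(a, b) for a, b in combinations(sorted(outbreak), 2)]
    between = [lookup(a, b) for a in sorted(outbreak) for b in background]
    if not within or not between:
        raise ValueError("need at least two outbreak isolates and one background isolate")
    return max(within) < min(between)


# Made-up benchmark: A, B, C are the known outbreak isolates, X is background.
isolates = ["A", "B", "C", "X"]
known_outbreak = {"A", "B", "C"}
good_run = {
    ("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 1,
    ("A", "X"): 60, ("B", "X"): 58, ("C", "X"): 61,
}
bad_run = dict(good_run)
bad_run[("C", "X")] = 2  # background isolate falls inside the outbreak cluster

print(clusters_separately(good_run, known_outbreak, isolates))
print(clusters_separately(bad_run, known_outbreak, isolates))
```

A check like this can be run after a local installation to confirm that the pipeline reproduces the known grouping before it is used on real investigations.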

Page 10

Virginia’s SNP Analysis Workflow
• Active surveillance by PFGE
• Requests for SNP analysis when above baseline or temporal/geographic clustering observed
• CFSAN-SNP and Lyve-SET
• Trees and matrices assessed internally
• Line lists shared with state epidemiologists

Page 11

CFSAN-SNP and Lyve-SET Pipeline

In general:
• Topological agreement
• Minor discrepancies in SNP-distances

Occasional topological discrepancies with large SNP-distance discrepancies
• Multiple approaches allow for further investigation and troubleshooting, if necessary
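One way to spot where two pipelines disagree is to compare their pairwise SNP-distance matrices pair by pair. The following is a minimal sketch under assumptions: both matrices are dictionaries keyed by the same ordered isolate pairs, and the `compare_pipelines` helper, the isolate names, the distances, and the tolerance of 5 SNPs are all hypothetical rather than taken from either tool.

```python
def compare_pipelines(dist_a, dist_b, tolerance):
    """Return isolate pairs whose SNP distances differ by more than tolerance.

    dist_a, dist_b: dicts keyed by the same (isolate, isolate) tuples.
    The result maps each flagged pair to (distance_a, distance_b).
    """
    flagged = {}
    for pair in sorted(set(dist_a) & set(dist_b)):
        if abs(dist_a[pair] - dist_b[pair]) > tolerance:
            flagged[pair] = (dist_a[pair], dist_b[pair])
    return flagged


# Made-up distances from two hypothetical pipeline runs on the same isolates.
pipeline_a = {("iso1", "iso2"): 3, ("iso1", "iso3"): 55, ("iso2", "iso3"): 57}
pipeline_b = {("iso1", "iso2"): 2, ("iso1", "iso3"): 7, ("iso2", "iso3"): 6}

discrepancies = compare_pipelines(pipeline_a, pipeline_b, tolerance=5)
for pair, (a, b) in discrepancies.items():
    print(pair, a, b)
```

Pairs flagged this way are natural starting points for troubleshooting, for example by checking coverage or filtering settings for the isolates involved.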

Page 12

Salmonella enterica serovar Heidelberg outbreak investigation: CFSAN-SNP Output

24 isolates: 23 outbreak-associated + single outgroup
- Putative outbreak-clade of 22
- Outgroup at a minimum of 55 SNPs

Page 13

Salmonella enterica serovar Heidelberg outbreak investigation: Lyve-SET Output

24 isolates: 23 outbreak-associated + single outgroup
- Putative outbreak-clade of… 24 isolates (?)
- All isolates with SNP-distance <8

Page 14

Integrating Bioinformatics Solutions

Factors to consider:
• How bioinformatics and WGS are going to inform public health decisions
• Multiple approaches extant in the field
• Accessibility and operation of the bioinformatics tool
• Validate local functionality
• If possible, employ more than one approach

Presentation Notes: talk about how to access these tools
Page 15

Kevin G. Libuit, M.S.
Senior Informatics Scientist
Division of Consolidated Laboratory Services
Email: [email protected]

1. Davis S, Pettengill JB, Luo Y, Payne J, Shpuntoff A, Rand H, Strain E. (2015) CFSAN SNP Pipeline: an automated method for constructing SNP matrices from next-generation sequence data. PeerJ Computer Science 1:e20.

2. Katz LS, Griswold T, Williams-Newkirk AJ, Wagner D, Petkau A, et al. (2017) A Comparative Analysis of the Lyve-SET Phylogenomics Pipeline for Genomic Epidemiology of Foodborne Pathogens. Frontiers in Microbiology 8:375.

Page 17

Salmonella enterica serovar Enteritidis outbreak investigation: CFSAN-SNP Output

Page 18

Salmonella enterica serovar Enteritidis outbreak investigation: Lyve-SET Output