João André Carriço
Microbiology Institute and Mramirez Lab, Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon
[email protected]
twitter: @jacarrico
Bioinformatic Open Days 2017, Braga, 22 February 2017

Genomic Epidemiology: How High Throughput Sequencing changed our view on bacterial strain similarity


Page 2

John Snow, English physician (1813-1858)

Page 3

Total: 616 dead

31 Aug - 10 Sep: 500 dead

Page 4

“the study of what is upon the people”

the branch of medicine which deals with the incidence, distribution, and possible control of diseases and other factors relating to health.

It is the cornerstone of public health, and shapes policy decisions and evidence-based practice by identifying risk factors for disease and targets for preventive healthcare

“shoe-leather epidemiology” and lots of statistics

Page 5

EBOLA West African Ebola virus epidemic

Page 6

2011 Germany E. coli O104:H4 outbreak:

bloody diarrhea accompanied by hemolytic-uremic syndrome (HUS)

Genomic sequencing by BGI Shenzhen confirmed a 2001 finding that the O104:H4 serotype has some enteroaggregative E. coli (EAEC or EAggEC) properties, presumably acquired by horizontal gene transfer

On 8 June, the EU's E. coli O104:H4 outbreak was estimated to have cost ~2,690,000,000 EUR in human losses (such as sick leave), not counting material losses (such as dumped cucumbers: ~240M EUR in Spain alone)

Crowdsourcing event started at ABPHM2011

Page 7

Smith, K. F. et al. Global rise in human infectious disease outbreaks. Journal of The Royal Society Interface 11, 20140950 (2014).

Page 8

Flight paths across North America.

Outbreaks follow flight paths more closely than simple geographic distance.

Outbreaks have costs in mortality, mobility and other direct economic impacts (product recall)

Fast intervention can save lives and money

Slide credit: Fiona Brinkman

Page 9

Didelot, X., Bowden, R., Wilson, D. J., Peto, T. E. A. & Crook, D. W. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet 13, 601–612 (2012).

Page 10

“Not all bacteria are born equal”

Page 11

discriminating strains within a species/subspecies

Gel based: Pulsed-Field Gel Electrophoresis (PFGE), RAPD, AFLP

Phenotypic based: Colony morphology/color, Antibiogram, Serotype

Sequence based: MultiLocus Sequence Typing (MLST), emm typing, spa typing

ST   aroE  gdh  gki  recP  spi  xpt  ddl
156  7     11   10   1     6    8    1

Page 12

Bacterial Population Genetics

Pathogenesis and Natural History of Infection

Surveillance of Infectious Diseases

Outbreak Investigation and Control

Page 13

S. pneumoniae housekeeping genes: aroE, gdh, gki, recP, spi, xpt, ddl

PCR -> Sanger Sequencing -> 7 Sequences

Retrieve alleles and ST from http://pubmlst.org/spneumoniae/

ST   aroE  gdh  gki  recP  spi  xpt  ddl
156  7     11   10   1     6    8    1

Greatest advantages:
- Sequence reduced to allele ID
- Portable
- Easy to infer relationship
- Nomenclature
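The allele-to-ST lookup on this slide can be sketched in a few lines of Python. This is only an illustration: the `PROFILES` table is a hypothetical local stand-in for the profile definitions hosted at pubmlst.org, and only the ST 156 row comes from the slide.

```python
# Locus order follows the S. pneumoniae MLST scheme shown on the slide.
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")

# Hypothetical local profile table; only the ST 156 row comes from the slide.
PROFILES = {
    (7, 11, 10, 1, 6, 8, 1): 156,
}


def sequence_type(alleles):
    """Return the ST for a dict of locus -> allele ID, or None if the profile is unknown."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


isolate = {"aroE": 7, "gdh": 11, "gki": 10, "recP": 1, "spi": 6, "xpt": 8, "ddl": 1}
print(sequence_type(isolate))  # 156
```

Because each sequence is reduced to an integer allele ID, comparing two isolates becomes a comparison of seven small numbers, which is what makes the nomenclature portable.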

Page 14

Enterococcus faecium

Figure legend: Clinical, animal, NA, community, Hospital, Surv/Outb

More on this later…

Page 15

The Internet

Sequence-based Information

But only 7 target loci…

Page 16

HiSeq 2000

MiSeq

PacBio

Page 17

OXFORD NANOPORE MinION

https://nanoporetech.com/products/minion

Page 18

https://nanoporetech.com/products/smidgion

OXFORD NANOPORE SmidgION

Page 19

Page 20

Alikhan, N.-F. et al., 2011. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC genomics, 12, p.402.

Bacterial Draft Genomes:
- 1 circular chromosome
- 1.5-4 Mb (most of them)
- From a few to hundreds of contigs
- May contain plasmids
- Hundreds of thousands of bacterial read sets are already available on SRA/ENA
- Usually sequenced at 30-100x depth of coverage
- Cost: 70-150 EUR (Illumina) / 500-2000 EUR (PacBio/Nanopore)
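As a rough guide to what 30-100x means in practice, mean depth of coverage can be estimated as (number of reads x read length) / genome size. The sketch below applies that formula; the read count, read length and genome size are illustrative assumptions, not values from the slides.

```python
import math


def depth_of_coverage(n_reads, read_length, genome_size):
    """Mean depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


# Illustrative values: a 2 Mb genome sequenced with 250 bp reads.
print(depth_of_coverage(400_000, 250, 2_000_000))  # 50.0
print(reads_needed(30, 250, 2_000_000))            # 240000
```

This is an average only; real coverage is uneven along the genome, which is one reason draft assemblies end up as multiple contigs.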

Page 21

Sequencing & Bioinformatics
• Sequencing, Assembly Pipeline Parameters
• QA/QC Metrics
• Tree Construction Details

Sample Information
• Isolation source
• Food, Clinical, Environment
• Food category, Body Product
• Dates, Location

Clinical and Epi Details
• Demographics
• Host disease, Symptoms
• Lab Test Results
• Exposures

Slide credit: Will Hsiao

Page 22

investigations using integrated microbial genomic data & “metadata” (lab, epidemiological data)

… aiming to save lives, economies

Slide credit: Fiona Brinkman

Page 23

Chronicle of a Death Foretold

http://en.wikipedia.org/wiki/File:ChronicleOfADeathForetold.JPG

Game Changer for Microbial Typing

From the reads much more information can be extracted:
- gene-by-gene approaches: wgMLST, cgMLST
- SNP comparison approaches: comparison with reference strains
- k-mer based distances
- Ability to recover most of the present sequence-based typing information in a single experimental procedure
- Greatly increased discriminatory power
- Unifies genomics and typing
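To make "k-mer based distances" concrete, here is a minimal sketch that compares two sequences by the Jaccard distance between their k-mer sets. It is a simplified illustration, not any particular tool's method: it uses exact sets rather than sketches, ignores reverse complements, and the default k=4 is only suitable for toy strings.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer content."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


print(jaccard_distance("ACGTACGT", "ACGTACGT"))  # 0.0
print(jaccard_distance("AAAAAA", "CCCCCC"))      # 1.0
```

No alignment or reference genome is needed, which is why this family of distances scales well to large collections of read sets or assemblies.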

Page 24

The Ideal Scenario

Microbiological Sample -> Magic Box of NGS Wonders for Microbiology -> Completely characterized strain:
• Antibiotic resistance profile
• Multilocus Sequence Typing (MLST)
• Virulence factors present
• Other SBTM information, e.g.:
  • spa (S. aureus)
  • emm (Group A Streptococcus)

Desired end result:

Risk assessment of the strain and useful application of the data to clinical practice

Comparison between groups of strains

Page 25

Didelot, X., Bowden, R., Wilson, D. J., Peto, T. E. A. & Crook, D. W. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet 13, 601–612 (2012).

Page 26

sample -> HTS reads + Reference genome -> Read mapping software -> VCF/Fasta File with SNPs -> Phylogenetic/Minimum spanning Tree

• Uses a reference strain:
  • Outbreak determination
  • Comparative studies
  • Monomorphic (Clonal) species
• Recombination/Horizontal gene transfer must be detected and removed from phylogenetic analysis
• Difficult to create a nomenclature (due to different references)
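Once the SNP calls are available as an alignment (one sequence per isolate, all mapped against the same reference), the input to a tree is a matrix of pairwise SNP distances. A minimal sketch, assuming equal-length aligned sequences and treating any non-ACGT character (N, gaps) as missing; the isolate names and sequences are invented for illustration.

```python
from itertools import combinations

VALID = set("ACGT")


def snp_distance(a, b):
    """Count positions where both sequences have a called base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID and y in VALID and x != y
    )


def distance_matrix(alignment):
    """Map each unordered pair of isolate names to its SNP distance."""
    names = sorted(alignment)
    return {(p, q): snp_distance(alignment[p], alignment[q]) for p, q in combinations(names, 2)}


alignment = {
    "ref": "ACGTACGTAC",
    "s1": "ACGTACGTAT",
    "s2": "ACGAACGNAT",
}
print(distance_matrix(alignment))
# {('ref', 's1'): 1, ('ref', 's2'): 2, ('s1', 's2'): 1}
```

The distances are only comparable between studies that used the same reference, which is the nomenclature difficulty the slide points out.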

Page 27

61 Streptococcus dysgalactiae subspecies equisimilis isolates

Roary presence and absence matrix (10661 gene clusters)

Core (n or n-1 strains)

Soft-Core (n-2 or n-3 strains)

Shell (8 (?) to n-3 strains)

Cloud (<8 (?) strains)

Core genome: Core + Soft-Core

Accessory genome: Shell + Cloud
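The categories above can be applied to a presence/absence matrix with a few comparisons. This sketch uses the thresholds as written on the slide, with two assumptions of its own: the shell lower bound of 8 is the value the slide marks with "(?)", and where the slide's soft-core and shell ranges overlap at n-3 the cluster is counted as soft-core. It is not Roary's own implementation.

```python
from collections import Counter


def classify_cluster(n_present, n_strains, shell_min=8):
    """Assign a gene cluster to a pan-genome category from the number of strains carrying it."""
    if n_present >= n_strains - 1:
        return "core"        # n or n-1 strains
    if n_present >= n_strains - 3:
        return "soft-core"   # n-2 or n-3 strains
    if n_present >= shell_min:
        return "shell"       # assumed: shell_min up to n-4 strains
    return "cloud"


def summarize(matrix):
    """matrix maps gene cluster -> list of 0/1 flags, one per strain."""
    return Counter(classify_cluster(sum(row), len(row)) for row in matrix.values())


n = 61  # number of isolates on the slide; the gene rows below are invented
matrix = {
    "geneA": [1] * n,
    "geneB": [1] * 59 + [0] * 2,
    "geneC": [1] * 20 + [0] * 41,
    "geneD": [1] * 3 + [0] * 58,
}
print(summarize(matrix))
```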

Catarina Inês Mendes (as you already noticed, decorations were only for Xmas)

Page 28

Virulome

Core genome

Accessory genome

Mobilome

Page 29

sample -> HTS reads -> De novo assembly software -> contigs -> Output: Allelic Profile -> Phylogenetic/Minimum spanning Tree

Central nomenclature server: Schemas, Allele/Profile IDs

• Expansion of the MLST concepts to core/pan genome
• Buffers recombination effect
• Simpler to create a nomenclature
• Population structure of non-monomorphic species
• Easy to compare thousands of samples using thousands of loci
• Handling missing data is still an open problem
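Comparing allelic profiles comes down to counting loci with different allele IDs. The sketch below shows one common convention for missing data: skip any locus that was not called in either isolate and report how many loci were actually compared. This is just one possible choice, since the slide notes that missing data handling is still an open problem.

```python
MISSING = None  # marker for a locus with no allele call


def allelic_distance(profile_a, profile_b):
    """Return (differences, loci compared), ignoring loci missing in either profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must use the same schema")
    differences = compared = 0
    for a, b in zip(profile_a, profile_b):
        if a is MISSING or b is MISSING:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


print(allelic_distance([1, 2, 3, None], [1, 5, 3, 4]))  # (1, 3)
```

Returning the number of compared loci alongside the distance lets the caller spot pairs whose small distance is only an artefact of many missing loci.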

Page 30

This is Chewbacca ... He is chewBBACA’s cousin

Our approach to the problem:

Mickael Silva (He didn’t bring the glasses…)

Page 31

https://pmcvariety.files.wordpress.com/2014/06/eli-wallach-dead-good-bad-ugly.jpg?w=670&h=377&crop=1

Page 32

My Goals / Areas that I want to apply WGS to:
• Microbial population structure
• Microbial Evolution
• Microbial Genomics: gene structure, genome synteny, Mobile Genetic Elements detection

My toolbox is chosen based on my questions and what I want to do!

Trying to avoid: “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” - Abraham H. Maslow (1962), Toward a Psychology of Being

Page 33

Sequence QA/QC:
FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Adaptor and quality trimming:
Trimmomatic http://www.usadellab.org/cms/?page=trimmomatic

Assembly:
SPAdes http://bioinf.spbau.ru/spades
Velvet http://www.ebi.ac.uk/~zerbino/velvet/

Mapping:
Bowtie2 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Annotation:
Prokka http://www.vicbioinformatics.com/software.prokka.shtml

Whole genome comparison:
BRIG (BLAST Ring Image Generator) http://brig.sourceforge.net/
MAUVE http://darlinglab.org/mauve/mauve.html

Page 34

http://rugbyea.com/wp-content/uploads/2013/05/blast.jpg

http://www.ecohealthypets.com/writable/pet_report_photos/photo/480x/ball_python_2.jpg

Page 35

Page 36

- Perform the same analysis over tens, hundreds or thousands of strains: your own and publicly available

- Integrate multiple analyses in a single pipeline

- Pipelines = reproducibility (if not, something is very wrong)

http://www.ebi.ac.uk/ena

http://www.ncbi.nlm.nih.gov/sra
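One way to picture "multiple analyses in a single pipeline" is a function that produces the same ordered list of commands for every sample. The sketch below only builds the command lines and does not run them; it assumes FastQC, Trimmomatic, SPAdes and Prokka are installed, that `trimmomatic` is available as a wrapper script on the PATH, and the trimming settings and directory layout are illustrative choices rather than a recommended configuration.

```python
from pathlib import Path


def build_commands(sample, reads_1, reads_2, outdir):
    """Return the ordered command lines for QC, trimming, assembly and annotation."""
    out = Path(outdir) / sample
    trimmed = out / "trimmed"
    p1, u1 = trimmed / f"{sample}_1P.fastq.gz", trimmed / f"{sample}_1U.fastq.gz"
    p2, u2 = trimmed / f"{sample}_2P.fastq.gz", trimmed / f"{sample}_2U.fastq.gz"
    spades_dir = out / "spades"
    return [
        # Read quality report (the output directory must exist before running)
        ["fastqc", str(reads_1), str(reads_2), "-o", str(out / "fastqc")],
        # Paired-end quality trimming; settings are illustrative
        ["trimmomatic", "PE", str(reads_1), str(reads_2),
         str(p1), str(u1), str(p2), str(u2),
         "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # De novo assembly from the surviving read pairs
        ["spades.py", "-1", str(p1), "-2", str(p2), "-o", str(spades_dir)],
        # Annotation of the assembled contigs
        ["prokka", "--outdir", str(out / "prokka"), "--prefix", sample,
         str(spades_dir / "contigs.fasta")],
    ]


for command in build_commands("strainA", "strainA_1.fastq.gz", "strainA_2.fastq.gz", "results"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)`; keeping the command construction separate from execution makes it easy to log exactly what was run for every strain.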

Page 37

A Standardized Pipeline for Bacterial Genome Assembly and Quality Control

Miguel Machado

(not wearing his wolf suit today…I think...)

Page 38

Virulence Factor Databases:

VFDB (http://www.mgc.ac.cn/VFs/main.htm)

Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)

Victors (http://www.phidias.us/victors/)

PHI-Base (http://www.phi-base.org/)

MvirDB (http://mvirdb.llnl.gov/)

To know more: presentation in the session "Controversies in interpreting whole genome sequence data":

http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases

Page 39

Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)

Repository of Antibiotic resistance Cassettes (RAC) (http://rac.aihi.mq.edu.au/rac/)

Integrall: The integron database (http://integrall.bio.ua.pt/)

(…)

Page 40

“Formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts” – Wikipedia

Domain modeling: represents all the concepts involved in microbial typing by sequence-based methods

Provides a shared vocabulary, where the concepts should be unambiguous

Enables a machine-readable format that software and algorithms can use to interact automatically with multiple databases

Page 41

Existing DBs reuse each other's datasets without true database interoperability: need for common ontologies (controlled vocabularies already exist but are not used by all)

Ontologies and computer-readable data formats (JSON-LD or RDF) can allow for true database interoperability, allowing bioinformaticians to extract the targeted information from a single query reaching multiple databases
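As an illustration of what a computer-readable, ontology-backed record could look like, the sketch below builds a small JSON-LD document for an MLST result. The `@context`, `@id` and `@type` keys are standard JSON-LD keywords, but every IRI and term name under `http://example.org/` is a hypothetical placeholder and not taken from any existing typing ontology.

```python
import json

# Hypothetical vocabulary: maps short field names to (made-up) ontology IRIs.
CONTEXT = {
    "ex": "http://example.org/typing#",
    "species": "ex:species",
    "scheme": "ex:typingScheme",
    "sequenceType": "ex:sequenceType",
    "alleles": "ex:alleleCall",
}


def typing_record(isolate_id, species, st, alleles):
    """Build a JSON-LD document describing one MLST result."""
    return {
        "@context": CONTEXT,
        "@id": f"http://example.org/isolates/{isolate_id}",
        "@type": "ex:TypingResult",
        "species": species,
        "scheme": "MLST",
        "sequenceType": st,
        "alleles": alleles,
    }


doc = typing_record(
    "demo-001",
    "Streptococcus pneumoniae",
    156,
    ["aroE:7", "gdh:11", "gki:10", "recP:1", "spi:6", "xpt:8", "ddl:1"],
)
print(json.dumps(doc, indent=2))
```

If two databases agree on the IRIs behind `species` or `sequenceType`, a client can merge their records without custom field mapping, which is the interoperability the slide argues for.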

Page 42

Trends Microbiol 17, 279–285 (2009).

Page 43

GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields.

Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR

Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status

Reporting: Case/Investigation Status

GenEpiO (Genomic Epidemiology Application Ontology)

See draft version at https://github.com/Public-Health-Bioinformatics/IRIDA_ontology

Original slide from Emma Griffiths

Page 44

Public Health Surveillance

Case Cluster Analysis

Result Reporting

Infectious Disease Epidemiology (from case to Intervention)

Lab Surveillance (from sample to strain typing results)

Evidence Collection & Outbreak Investigation

Sample Collection & Processing

Sequence Data Generation & Processing

Bioinformatics Analysis

Result Reporting

Whole Genome Sequencing (SO, ERO, OBI etc)

Quality Control (OBI, ERO)

Anatomy (FMA)

Environment (Envo)

Food (FoodOn)

Clinical Sampling (OBI)

Custom LIMS

Quality Control (OBI, ERO)

AMR (ARO)

Virulence (PATO)

Phylogenetic Clustering (EDAM)

Mobile Elements (MobiO)

Quality Control (OBI, ERO)

AMR (ARO) LOINC

Surveillance (SurvO)

Demographics (SIO)

Patient History (SIO)

Symptoms (SYMP)

Exposures (ExO)

Source Attribution (IDO)

Travel (IDO)

Transmission (TRANS)

Food (FoodOn)

Geography (OMRSE)

Outbreak Protocols

Surveillance (SurvO)

Food (FoodOn)

Surveillance (SurvO)

Mobile Elements (MobiO)

Infectious Disease (IDO)

Typing (TypON)

Nomenclature & Taxonomy (NCBItaxon)

Original slide from Emma Griffiths /IRIDA

http://foodontology.github.io/foodon/

(pipeline) NGSOnto

Page 45

Available databases still lack interfaces for programmatic access: RESTful APIs would allow:
▪ easy automatic querying from scripts without the need of web interfaces or downloads
▪ Database updates by authorized groups (distributed curation effort)

APIs: Application Programming Interfaces
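To show what "automatic querying from scripts" could look like, here is a minimal sketch of a client for a RESTful typing database. The base URL, the `isolates` resource, the filter names and the response layout are all hypothetical; no real database API is being described. The example builds the request and parses a canned JSON response, so it runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://typingdb.example.org/api/v1"  # hypothetical endpoint


def build_query(resource, **filters):
    """Build a GET request for a resource, with filters as query parameters."""
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")
    return Request(url, headers={"Accept": "application/json"})


def parse_isolates(payload):
    """Extract isolate identifiers from a JSON response body (assumed layout)."""
    return [item["id"] for item in json.loads(payload)["isolates"]]


request = build_query("isolates", species="Streptococcus pneumoniae", st=156)
print(request.full_url)

# Canned response standing in for what such a server might return.
example_response = '{"isolates": [{"id": "demo-001"}, {"id": "demo-002"}]}'
print(parse_isolates(example_response))
```

Sending the request would be a single `urllib.request.urlopen(request)` call against a real service; the point is that no web interface or manual download is involved.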

Page 46

Now we have thousands of targets for thousands of strains annotated with precious epidemiological data

Traditional phylogenetic analysis methods aren’t able to tackle the existing amount of information

Page 47

Freely available / Open source Java software

Calculates:
- goeBURST MST
- Hierarchical clustering
- Neighbour Joining

Can be easily applied to:
- MLST/cgMLST/wgMLST
- MLVA
- SNP data*
- Gene Presence/absence
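The idea behind a minimum spanning tree of allelic profiles can be sketched with Prim's algorithm and Hamming distance. This is a plain MST with simple lexicographic tie-breaking, not the goeBURST rules, and the four profiles are invented for illustration.

```python
def hamming(p, q):
    """Number of loci at which two profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns edges as (distance, from_node, to_node)."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge leaving the tree; ties fall back to node-name order.
        best = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


profiles = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 1, 2, 2),
    "D": (3, 3, 1, 1, 1, 1, 1),
}
print(minimum_spanning_tree(profiles))
# [(1, 'A', 'B'), (1, 'B', 'C'), (2, 'A', 'D')]
```

The same function works unchanged for cgMLST or wgMLST profiles with thousands of loci, although this quadratic search per step would need a faster implementation for thousands of isolates.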

Page 48

https://online.phyloviz.net/

API:
* account creation
* profile + metadata upload
* running goeBURST
* retrieving a link

Private or Public data sharing

Scalable to thousands of nodes

Tree Analysis tools:
- Interactive distance matrix
- NLV graph

Node.js / VivaGraph.js (WebGL)

Screenshot by @happy_khan, with Enterobase data: https://enterobase.warwick.ac.uk/

Page 49

Bruno Gonçalves (He also didn’t bring the glasses…)

Page 50

• High Throughput Sequencing changed our views and ways to analyze bacterial populations and discriminate strains for outbreak investigation/surveillance purposes -> Genomic Epidemiology

• Bioinformatics is the key item for global genomic epidemiology. Open-source and freely available tools provide the ability to build custom-made and verifiable pipelines.

• Real-time global data sharing can speed up outbreak investigations and save lives… however, some ethical/confidentiality issues need to be handled

• It is computationally challenging when we want to analyze and query all data produced. Most methods don’t scale well

• The future: isolation-free methods are needed to speed up the analysis: metagenomics

Page 51

Algorithms

Interfaces

Ontologies

Page 52

UMMI Members: Bruno Gonçalves, Mickael Silva, Catarina Inês Mendes, Miguel Machado, Mário Ramirez, José Melo-Cristino

INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

BACGENTRACK project [FCT / Scientific and Technological Research Council of Turkey (Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TÜBİTAK), TUBITAK/0004/2014]

ONEIDA project, FCT Joint Activities Programme (PAC) - http://www.itqb.unl.pt/oneida

Genome Canada IRIDA project (www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC); Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC); Fiona Brinkman (SFU); William Hsiao (BCCDC)

INTEGRATED RAPID INFECTIOUS DISEASE ANALYSIS