Common languages in genomic epidemiology: from ontologies to algorithms
João André Carriço, Mario Ramirez
Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon
[email protected] twitter: @jacarrico
RAMI-NGS, Hamburg, Germany, 9-11 June 2016


Page 1: Common languages in genomic epidemiology: from ontologies to algorithms

Common languages in genomic epidemiology:

from ontologies to algorithms 

João André Carriço, Mario Ramirez
Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon
[email protected] twitter: @jacarrico

RAMI-NGS, Hamburg, Germany, 9-11 June 2016

Page 2: Common languages in genomic epidemiology: from ontologies to algorithms

Genomic epidemiology goals

Moving from typing to high-throughput sequencing (HTS) genomics:
• Increase in discrimination
• Extra information to be extracted from the genome (resistance profiles, virulence factors, genome organization)

Global outbreak detection / surveillance

Direct application in public health: source attribution -> intervention

Page 3: Common languages in genomic epidemiology: from ontologies to algorithms

Need for common languages

Data Integration

Harmonization

Reproducibility

Image credits:
1) http://www.iissiidiology.net/en/publications/104-ayfaar-interpersonal-and-true-human-relationship-harmonization-mechanisms
2) http://blog.f1000research.com/2014/04/04/reproducibility-tweetchat-recap/

Page 4: Common languages in genomic epidemiology: from ontologies to algorithms

Three tiers

Algorithms

Interfaces

Ontologies

Page 5: Common languages in genomic epidemiology: from ontologies to algorithms

SNP calling

Read mapping algorithms: Bowtie2, BWA, SOAP2, Saruman, mrFAST/mrsFAST …. (and a lot more)

Algorithms

Hatem M et al. BMC Bioinformatics 2013;14:184. DOI: 10.1186/1471-2105-14-184

+ a plethora of parameters for each of them
+ a (proper) choice of reference
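To make the point about parameters concrete, here is a minimal sketch of a threshold-based SNP caller working on a toy pileup. It is not how any of the mappers listed above behave; the function name `call_snps`, the parameters `min_depth` and `min_fraction`, and the data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def call_snps(reference, pileup, min_depth=4, min_fraction=0.8):
    """Return {position: alt_base} where the reads support a variant.

    reference: reference sequence as a string
    pileup: list of strings; pileup[i] holds the read bases covering position i
    min_depth, min_fraction: hypothetical filtering thresholds
    """
    snps = {}
    for pos, (ref_base, bases) in enumerate(zip(reference, pileup)):
        if len(bases) < min_depth:
            continue  # not enough coverage to make a call
        base, count = Counter(bases).most_common(1)[0]
        if base != ref_base and count / len(bases) >= min_fraction:
            snps[pos] = base
    return snps


reference = "ACGT"
pileup = ["AAAAA", "CCTTT", "GGG", "TTTTTT"]

# Same reads, same reference, two parameter sets.
strict = call_snps(reference, pileup, min_depth=4, min_fraction=0.8)
relaxed = call_snps(reference, pileup, min_depth=3, min_fraction=0.5)
print(strict)
print(relaxed)
```

With the stricter thresholds no SNP is reported, while the relaxed ones report a variant at position 1, so two analyses of identical data can disagree purely because of parameter choices.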

Page 6: Common languages in genomic epidemiology: from ontologies to algorithms

Gene-by-gene

Gene-by-gene approach allele call algorithms:
• BIGSdb (Jolley, K. A. & Maiden, M. C. J. BMC Bioinf 11, 595 (2010))
• Enterobase (https://enterobase.warwick.ac.uk/)
• GEP (Genome Profiler) (JCM. 2015 May;53(5):1765-7)
• Ridom SeqSphere
• BioNumerics (Applied Maths)

• Mostly assembly based (yes, it is a lot of work …)
• Assembly algorithms have some parameters (mostly k-mer sizes)
• Lots of heuristics for allele definition

Algorithms
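The simplest possible allele call is an exact-match lookup, sketched below. This is an assumption-laden toy and not the method used by any of the tools above, which add their own heuristics (for example checks on coding sequence validity or length). The names `call_allele`, `allelic_profile` and the schema layout are hypothetical.

```python
def call_allele(sequence, known_alleles):
    """Return (allele_id, is_new) for one locus sequence.

    known_alleles maps allele sequence -> integer allele id and is
    updated in place when a previously unseen sequence is found.
    """
    seq = sequence.upper()
    if seq in known_alleles:
        return known_alleles[seq], False
    new_id = max(known_alleles.values(), default=0) + 1
    known_alleles[seq] = new_id
    return new_id, True


def allelic_profile(genome_loci, schema):
    """Build an allelic profile for one isolate.

    genome_loci: {locus name: sequence found in the assembly}
    schema: {locus name: {allele sequence: allele id}}
    Loci that were not found in the genome are reported as None.
    """
    profile = {}
    for locus, alleles in schema.items():
        seq = genome_loci.get(locus)
        profile[locus] = call_allele(seq, alleles)[0] if seq else None
    return profile


schema = {"locusA": {"ATGAAATAA": 1}, "locusB": {"ATGCCCTAA": 1}}
isolate = {"locusA": "atgaaataa", "locusB": "ATGCCGTAA"}
profile = allelic_profile(isolate, schema)
print(profile)
```

Here `locusA` matches a known allele and `locusB` receives a new identifier. Every decision hidden in this sketch (case handling, what counts as "found", how new identifiers are assigned) is a place where real tools can differ.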

Page 7: Common languages in genomic epidemiology: from ontologies to algorithms

Definitions needed!

Gene-by-gene approaches:
• What is a locus?
• What is an allele?

It depends on the algorithm(s) used!

Algorithms

However the results are largely congruent!
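One way to put a number on "largely congruent" is to compare how two methods group the same isolates. The sketch below uses the plain Rand index (fraction of isolate pairs on which both partitions agree); this choice of measure, the function name `rand_index` and the example labels are assumptions for illustration, not something stated on the slide.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of isolate pairs that both partitions treat the same way,
    i.e. both put the pair together or both keep it apart."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same isolates")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical type assignments for five isolates from two methods.
method_1 = ["ST1", "ST1", "ST2", "ST2", "ST3"]
method_2 = ["a", "a", "b", "c", "d"]
print(rand_index(method_1, method_2))
```

The label names themselves do not need to match between methods; only the grouping of isolates is compared.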

Page 8: Common languages in genomic epidemiology: from ontologies to algorithms

Ontology definition

Ontologies

Image from http://www.emiliosanfilippo.it/?page_id=1172

Page 9: Common languages in genomic epidemiology: from ontologies to algorithms

Ontology definition

“Formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts” – Wikipedia

• Domain modeling: represents all the concepts involved in microbial typing by sequence-based methods
• Provides a shared vocabulary, where the concepts should be unambiguous
• Enables a machine-readable format that software and algorithms can use to interact automatically with multiple databases

Ontologies
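The definition above (concepts plus relationships, in a machine-readable form) can be sketched as a set of subject-predicate-object triples. The concept and relation names below are hypothetical and are not the actual TypOn vocabulary; the point is only that software can traverse such a structure without human interpretation.

```python
# Hypothetical concepts and relations as (subject, predicate, object) triples.
triples = {
    ("Isolate", "is_a", "Sample"),
    ("Isolate", "has_profile", "AllelicProfile"),
    ("AllelicProfile", "has_allele", "Allele"),
    ("Allele", "belongs_to", "Locus"),
    ("Locus", "part_of", "Schema"),
}


def objects(subject, predicate):
    """Concepts linked to `subject` through `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def reachable(start):
    """All concepts reachable from `start` by following any relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


print(objects("Isolate", "has_profile"))
print(sorted(reachable("Isolate")))
```

In practice an ontology would be expressed in a standard format such as OWL/RDF so that different databases can share the same identifiers; the plain Python set here only stands in for that idea.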

Page 10: Common languages in genomic epidemiology: from ontologies to algorithms

TypOn

Ontologies

Page 11: Common languages in genomic epidemiology: from ontologies to algorithms

GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields.

GenEpiO (Genomic Epidemiology Application Ontology)

Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR

Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status

Reporting: Case/Investigation Status

See draft version at https://github.com/Public-Health-Bioinformatics/IRIDA_ontology

Original slide from Emma Griffiths

Ontologies

Page 12: Common languages in genomic epidemiology: from ontologies to algorithms

Public Health Surveillance (figure: workflow stages annotated with the ontologies that cover them)

Infectious Disease Epidemiology (from case to intervention): Evidence Collection & Outbreak Investigation; Case Cluster Analysis; Result Reporting

Lab Surveillance (from sample to strain typing results): Sample Collection & Processing; Sequence Data Generation & Processing; Bioinformatics Analysis; Result Reporting

Ontologies shown in the figure: Whole Genome Sequencing (SO, ERO, OBI etc), Quality Control (OBI, ERO), Anatomy (FMA), Environment (Envo), Food (FoodOn), Clinical Sampling (OBI), Custom LIMS, AMR (ARO), LOINC, Virulence (PATO), Phylogenetic Clustering (EDAM), Mobile Elements (MobiO), Surveillance (SurvO), Demographics (SIO), Patient History (SIO), Symptoms (SYMP), Exposures (ExO), Source Attribution (IDO), Travel (IDO), Transmission (TRANS), Geography (OMRSE), Outbreak Protocols, Infectious Disease (IDO), Typing (TypON), Nomenclature & Taxonomy (NCBItaxon), (pipeline) NGSOnto

http://foodontology.github.io/foodon/

Original slide from Emma Griffiths / IRIDA

Page 13: Common languages in genomic epidemiology: from ontologies to algorithms

Application programming interfaces (API)

Provide a machine-readable, web-based interface, i.e., algorithms (not humans) can:
• retrieve, submit and update data / analysis results
• launch analyses / algorithms

Interfaces

http://www.clker.com/cliparts/q/P/V/D/5/R/cog-allgrey-hi.png

Page 14: Common languages in genomic epidemiology: from ontologies to algorithms

Databases: BIGSdb, Enterobase

Offer a RESTful API for data retrieval, submission and analysis

Interfaces
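As a rough sketch of what "retrieve" and "submit" look like from a script's point of view, the following builds (but does not send) two HTTP requests with the Python standard library. The base URL, resource names and payload fields are hypothetical placeholders and do not correspond to the real BIGSdb or Enterobase endpoints; consult each database's own API documentation for those.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.org/api"  # hypothetical service, not a real database


def retrieve_request(resource, **params):
    """Build a GET request asking for JSON from a REST resource."""
    url = f"{BASE_URL}/{resource}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def submit_request(resource, payload):
    """Build a POST request that submits a JSON document to a REST resource."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{BASE_URL}/{resource}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_req = retrieve_request("isolates", species="Salmonella enterica")
post_req = submit_request("profiles", {"isolate": "X1", "locusA": 1})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Passing either object to `urllib.request.urlopen` would actually send it; that step is left out here because the service is fictional.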

Page 15: Common languages in genomic epidemiology: from ontologies to algorithms

Tools: microreact.org

Interfaces

Page 16: Common languages in genomic epidemiology: from ontologies to algorithms

Tools: PHYLOViZ Online

Interfaces

https://online.phyloviz.net/

API:
• account creation
• profile + metadata upload
• running goeBURST
• retrieving a link

Private or public data sharing

Scalable to thousands of nodes

Tree analysis tools:
• Interactive distance matrix
• NLV graph
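An NLV (N-locus variant) graph links profiles that differ at no more than N loci. The sketch below is a minimal version of that idea under stated assumptions: it is not PHYLOViZ Online's implementation, and the names `hamming`, `nlv_graph` and the example profiles are hypothetical. Profiles are assumed to have the same loci in the same order.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def nlv_graph(profiles, n):
    """Edges (id_a, id_b, distance) for profile pairs differing at <= n loci.

    profiles: {profile id: tuple of allele ids, same locus order for all}
    """
    edges = []
    for (id_a, pa), (id_b, pb) in combinations(sorted(profiles.items()), 2):
        distance = hamming(pa, pb)
        if distance <= n:
            edges.append((id_a, id_b, distance))
    return edges


profiles = {
    "ST1": (1, 1, 1, 1),
    "ST2": (1, 1, 1, 2),
    "ST3": (1, 1, 2, 2),
    "ST4": (3, 3, 3, 3),
}
print(nlv_graph(profiles, 1))
print(nlv_graph(profiles, 2))
```

Raising N adds the ST1-ST3 link, while ST4 stays unconnected at both thresholds, which is the kind of exploration an interactive NLV view supports.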

Page 17: Common languages in genomic epidemiology: from ontologies to algorithms

Conclusions / Future Work

Transparency of analytical methods

Better definition of concepts (Clinical/Lab/Analysis)

Better tool/database interoperability

• Reproducibility of results
• Added value of analysis
• Custom interfaces for non-bioinformatics specialists

Page 18: Common languages in genomic epidemiology: from ontologies to algorithms

Conclusions / Future Work

Page 19: Common languages in genomic epidemiology: from ontologies to algorithms

Acknowledgments

UMMI Members: Bruno Gonçalves, Mickael Silva, Miguel Machado, Mário Ramirez, José Melo-Cristino

INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)

Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC), Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC), Fiona Brinkman (SFU), William Hsiao (BCCDC)