Common languages in genomic epidemiology: from ontologies to algorithms

João André Carriço, Mário Ramirez

Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon

jcarrico@fm.ul.pt, Twitter: @jacarrico

RAMI-NGS, Hamburg, Germany, 9-11 June 2016

Genomic epidemiology goals

Moving from typing to High Throughput Sequencing (HTS) genomics:

• Increased discrimination
• Extra information can be extracted from the genome (resistance profiles, virulence factors, genome organization)

Global Outbreak detection / Surveillance

Direct application in public health

Source attribution -> intervention

Need for common languages

Image credits:

1) http://www.iissiidiology.net/en/publications/104-ayfaar-interpersonal-and-true-human-relationship-harmonization-mechanisms

2) http://blog.f1000research.com/2014/04/04/reproducibility-tweetchat-recap/

Data Integration

Harmonization

Reproducibility

Three tiers

Algorithms

Interfaces

Ontologies

SNP calling

Read mapping algorithms:

• Bowtie2
• BWA
• SOAP2
• Saruman
• mr/mrsFAST
• … (and a lot more)

Algorithms

Hatem et al., BMC Bioinformatics 2013, 14:184. DOI: 10.1186/1471-2105-14-184

+ a plethora of parameters for each of them

+ a (proper) choice of reference
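The effect of the reference choice can be illustrated with a toy sketch. It assumes the sample is already aligned to each reference at equal length, which real read mappers do not require, and the sequences and the `call_snps` helper are invented for illustration rather than taken from any of the tools listed above.

```python
def call_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Both sequences are assumed to be pre-aligned and of equal length.
    Gaps ('-') and ambiguous bases ('N') are skipped rather than called.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and ref_base not in skip and alt_base not in skip
    ]


# Hypothetical sequences: the same sample compared against two references.
sample = "ACGTTGCA"
ref_a = "ACGTAGCA"
ref_b = "ACCTTGCT"

print(call_snps(ref_a, sample))  # one difference against reference A
print(call_snps(ref_b, sample))  # two differences against reference B
```

The same sample yields a different SNP list for each reference, so the reference (and every mapping parameter) has to be reported alongside the results.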

Gene-by-gene

Gene-by-gene approach allele call algorithms:

• BIGSdb (Jolley, K. A. & Maiden, M. C. J. BMC Bioinf 11, 595 (2010))
• Enterobase (https://enterobase.warwick.ac.uk/)
• GEP (Genome Profiler) (JCM. 2015 May;53(5):1765-7)
• Ridom SeqSphere
• BioNumerics (Applied Maths)

Mostly assembly based (yes, it is a lot of work…)

Assembly algorithms have some parameters (mostly k-mer sizes)

Lots of heuristics for allele definition

Algorithms

Definitions needed!

Gene-by-gene approaches:

• What is a locus?
• What is an allele?

It depends on the algorithm(s) used!

Algorithms

However the results are largely congruent!
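A minimal sketch of why the allele definition matters, assuming alleles are numbered by exact sequence match. The two definitions compared here (full coding sequence versus coding sequence without the final stop codon) and the sequences are hypothetical and do not reproduce the rules of any tool listed above.

```python
from typing import Callable


def make_caller(normalise: Callable[[str], str]) -> Callable[[str], int]:
    """Return a function that numbers alleles by exact match after `normalise`."""
    known: dict[str, int] = {}  # normalised sequence -> allele number

    def call(sequence: str) -> int:
        key = normalise(sequence.upper())
        if key not in known:
            known[key] = len(known) + 1  # first unseen sequence gets the next number
        return known[key]

    return call


# Two hypothetical allele definitions for the same locus.
full_cds = make_caller(lambda s: s)           # every base counts
without_stop = make_caller(lambda s: s[:-3])  # the final stop codon is ignored

# Sequences 1 and 2 differ only in the stop codon; sequence 3 differs internally.
sequences = ["ATGGCTAAATAA", "ATGGCTAAATAG", "ATGGCTCAATAA"]

print([full_cds(s) for s in sequences])
print([without_stop(s) for s in sequences])
```

Both definitions separate the third sequence from the first, which matches the observation that results are largely congruent, but they disagree on whether the first two are the same allele.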

Ontologies

Image from http://www.emiliosanfilippo.it/?page_id=1172

Ontology definition: “Formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts” (Wikipedia)

• Domain modeling: represents all the concepts involved in microbial typing by sequence-based methods
• Provides a shared vocabulary, where the concepts should be unambiguous
• Enables a machine-readable format that software and algorithms can use to interact automatically with multiple databases
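As a rough illustration of what "machine-readable" buys, concepts and relationships can be stored as subject-predicate-object triples and queried by software. The terms and relation names below are invented for this sketch and are not taken from TypOn, GenEpiO or any published ontology. Real ontologies are usually expressed in OWL/RDF rather than Python tuples.

```python
Triple = tuple[str, str, str]

# Hypothetical mini-vocabulary for sequence-based typing (illustrative only).
TRIPLES: set[Triple] = {
    ("Isolate", "has_part", "Genome"),
    ("Genome", "has_part", "Locus"),
    ("Allele", "is_variant_of", "Locus"),
    ("SequenceType", "is_defined_by", "Allele"),
    ("MLST", "is_a", "TypingMethod"),
    ("MLST", "produces", "SequenceType"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """All concepts X such that (subject, predicate, X) is asserted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate: str, obj: str) -> list[str]:
    """All concepts X such that (X, predicate, obj) is asserted."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


print(objects("MLST", "produces"))       # what does MLST produce?
print(subjects("is_variant_of", "Locus"))  # what is a variant of a locus?
```

Because every tool reads the same unambiguous terms, two databases that both say "Allele is_variant_of Locus" can be queried the same way without human interpretation.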

Ontologies

TypOn

Ontologies

GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields

GenEpiO (Genomic Epidemiology Application Ontology)

• Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR
• Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status
• Reporting: Case/Investigation Status

See draft version at https://github.com/Public-Health-Bioinformatics/IRIDA_ontology

Original slide from Emma Griffiths

Ontologies

Public Health Surveillance (diagram: ontologies mapped onto two workflows)

Infectious Disease Epidemiology (from case to intervention):

• Evidence Collection & Outbreak Investigation
• Case Cluster Analysis
• Result Reporting

Lab Surveillance (from sample to strain typing results):

• Sample Collection & Processing
• Sequence Data Generation & Processing
• Bioinformatics Analysis
• Result Reporting

Ontologies and resources shown in the diagram:

• Whole Genome Sequencing (SO, ERO, OBI etc)
• Quality Control (OBI, ERO)
• Anatomy (FMA)
• Environment (Envo)
• Food (FoodOn)
• Clinical Sampling (OBI)
• Custom LIMS
• AMR (ARO)
• LOINC
• Virulence (PATO)
• Phylogenetic Clustering (EDAM)
• Mobile Elements (MobiO)
• Surveillance (SurvO)
• Demographics (SIO)
• Patient History (SIO)
• Symptoms (SYMP)
• Exposures (ExO)
• Source Attribution (IDO)
• Travel (IDO)
• Transmission (TRANS)
• Geography (OMRSE)
• Outbreak Protocols
• Infectious Disease (IDO)
• Typing (TypON)
• Nomenclature & Taxonomy (NCBItaxon)

Original slide from Emma Griffiths / IRIDA

http://foodontology.github.io/foodon/

(pipeline) NGSOnto

Application programming interfaces (API)

Provide a machine-readable, web-based interface, i.e., algorithms (not humans) can:

• retrieve, submit and update data and analysis results
• launch analyses/algorithms
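The operations above map naturally onto HTTP verbs in a REST-style interface. The sketch below only builds the requests and does not send them. The server URL, the paths and the JSON fields are hypothetical and do not describe the actual API of BIGSdb, Enterobase or PHYLOViZ Online.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://typing-db.example.org/api"  # hypothetical server


def _json_request(path: str, method: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request carrying an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


def retrieve_isolate(isolate_id: str) -> Request:
    return _json_request(f"/isolates/{isolate_id}", "GET")


def submit_isolate(record: dict) -> Request:
    return _json_request("/isolates", "POST", record)


def update_isolate(isolate_id: str, fields: dict) -> Request:
    return _json_request(f"/isolates/{isolate_id}", "PUT", fields)


def launch_analysis(name: str, isolate_ids: list[str]) -> Request:
    return _json_request("/analyses", "POST", {"analysis": name, "isolates": isolate_ids})


for req in (
    retrieve_isolate("ISO1"),
    submit_isolate({"id": "ISO2", "country": "PT"}),
    update_isolate("ISO2", {"country": "DE"}),
    launch_analysis("goeBURST", ["ISO1", "ISO2"]),
):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a real server. A script can therefore chain retrieval, submission and analysis without a person clicking through a web page.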

Interfaces

http://www.clker.com/cliparts/q/P/V/D/5/R/cog-allgrey-hi.png

Databases: BIGSdb, Enterobase

Offer a RESTful API for data retrieval, submission and analysis

Interfaces

Tools: microreact.org

Interfaces

Tools: PHYLOViZ Online

Interfaces

https://online.phyloviz.net/

API:

• account creation
• profile + metadata upload
• running goeBURST
• retrieving a link

Private or Public data sharing

Scalable to thousands of nodes

Tree analysis tools:

• Interactive distance matrix
• NLV graph
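The two tree analysis tools rest on a simple computation, sketched here under the assumption that each isolate is an allelic profile with no missing loci. The distance is the number of loci with different alleles, and an NLV (n locus variant) graph links every pair of profiles that differ at n loci or fewer. The profiles and function names are made up for illustration, and this is not PHYLOViZ code.

```python
from itertools import combinations

Profile = tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles carry different alleles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(profiles: dict[str, Profile]) -> dict[tuple[str, str], int]:
    """Pairwise distances, keyed by (name, name) with names in sorted order."""
    return {
        (p, q): hamming(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


def nlv_graph(profiles: dict[str, Profile], n: int) -> list[tuple[str, str]]:
    """Edges between all profiles that differ at n loci or fewer."""
    return [pair for pair, d in distance_matrix(profiles).items() if d <= n]


# Hypothetical 7-locus profiles.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (4, 4, 4, 4, 1, 3, 2),
}

print(distance_matrix(profiles))
print(nlv_graph(profiles, 1))  # single locus variants only
print(nlv_graph(profiles, 2))  # single and double locus variants
```

Raising n adds edges, so the graph shows progressively looser relationships that a single tree would hide.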

Conclusions / Future Work

Transparency of analytical methods

Better definition of concepts (Clinical/Lab/Analysis)

Better tool/database interoperability

• Reproducibility of results
• Added value of analysis
• Custom interfaces for non-bioinformatics specialists


Acknowledgments

UMMI Members: Bruno Gonçalves, Mickael Silva, Miguel Machado, Mário Ramirez, José Melo-Cristino

INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)

Genome Canada IRIDA project (Integrated Rapid Infectious Disease Analysis, www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC), Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC), Fiona Brinkman (SFU), William Hsiao (BCCDC)
