IRIDA: Canada’s federated platform for genomic epidemiology. William Hsiao, Ph.D. (@wlhsiao), BC Centre for Disease Control Public Health Laboratory and University of British Columbia

IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform


Page 1: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

IRIDA: Canada’s federated platform for genomic epidemiology

William Hsiao, Ph.D.

@wlhsiao

BC Centre for Disease Control Public Health Laboratory and University of British Columbia

Page 2: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

IRIDA Platform Overview

• IRIDA = Integrated Rapid Infectious Disease Analysis

• A free, open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time disease outbreak investigations

Core Functions:
• Management of strain and genomic sequence data
• Rapid processing and analysis of genomic data
• Informative display of genomic results
• Sample, case, and aggregate data (“metadata”) management

Target audience:
• Public health agencies who need a platform to manage and process genomic data
• Public health agencies who need a platform to use genomics for outbreak investigations

[Diagram: IRIDA components: Sequencing Instruments; Web Application; Data management; Built-in Analytical Tools; External Galaxy; Command-line Tools]

Page 3: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

10 simple rules (wish list) to build a better public health microbiology genomic epidemiology analysis system

Download the latest version at https://github.com/phac-nml/irida

Page 4: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

1: Engage the Users Through the Entire Software Development Cycle

[Diagram: National Public Health Agency; Provincial Public Health Agency; Academic/Public]

- Project Team has direct access to state-of-the-art research in academia

- Project Team is directly embedded in user organization

Page 5: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

2: Have A Simple User Interface

• Line List View (under testing)

• Timeline View (conceptualization)

[Screenshot labels: Selectable fields; Travel; Symptoms and Onset; Exposure Types; Hospitalization; Launch a pipeline; Be Like]

Page 6: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

3: Build a Robust, Extensible Platform

• IRIDA uses Galaxy to manage workflows

• Adding additional pipelines is relatively easy

• Using a standard API to allow 3rd party tools to obtain data from IRIDA (e.g. IslandViewer and GenGIS)

[Diagram: IRIDA architecture: Servlet Container (Web Interface, REST API, Application Logic); Central File Storage; Compute Cluster; Galaxy; command line]

http://www.pathogenomics.sfu.ca/islandviewer/
http://kiwi.cs.dal.ca/GenGIS/Main_Page
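To illustrate how a third-party tool might consume data exposed through a standard API, here is a minimal Python sketch that turns a JSON response body into typed records. The payload shape, the field names (`samples`, `id`, `organism`, `files`) and the `SampleRecord` type are assumptions made for illustration; they are not IRIDA's actual schema, and no network call is made.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """One sample as a downstream tool might want to see it (hypothetical shape)."""
    sample_id: str
    organism: str
    sequence_files: tuple


def parse_samples(payload: str) -> list:
    """Parse a JSON response body into SampleRecord objects.

    Optional fields fall back to defaults so that a sparse record
    does not break the consuming tool.
    """
    doc = json.loads(payload)
    return [
        SampleRecord(
            sample_id=item["id"],
            organism=item.get("organism", "unknown"),
            sequence_files=tuple(item.get("files", [])),
        )
        for item in doc["samples"]
    ]


# Hypothetical response body; the field names are illustrative only.
EXAMPLE_RESPONSE = json.dumps({
    "samples": [
        {"id": "S1", "organism": "Salmonella enterica",
         "files": ["S1_R1.fastq", "S1_R2.fastq"]},
        {"id": "S2"},
    ]
})

records = parse_samples(EXAMPLE_RESPONSE)
```

Keeping the parsing step separate from the HTTP request means the same function can be reused by any tool, whatever client library it uses to fetch the data.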

Page 7: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

4: Have Extensive Documentation

• Documentation should be available for
  • Users – step-by-step tutorial with screenshots / FAQ
  • System Administrators – installation instructions / issue trackers
  • Developers – open source, collaborative development / IRC channel

• Easily accessible at https://irida.corefacility.ca/documentation/

Page 8: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

5: Implement QC Throughout the Whole Application

• Genomics is sensitive and sequence data are inherently noisy

• Genomics is a rapidly advancing technology
  • Standardizing pipelines is difficult and can stifle innovation
  • Better to standardize the performance and reporting metrics and ensure any validated pipelines meet the testing criteria

• Developing a general QC testing module (RCQC) that uses ontology to standardize QC metrics (https://github.com/Public-Health-Bioinformatics/rcqc)

• Data Provenance and Version Control (data + pipelines) are musts for Diagnostic Labs
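The idea of standardizing the reporting metrics rather than the pipelines can be sketched as follows: each QC rule is keyed by a shared term identifier, so any pipeline that reports under those identifiers can be checked against the same criteria. This is a minimal sketch and not how RCQC works internally; the `example.org` identifiers and the thresholds are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCRule:
    term_id: str    # hypothetical ontology identifier for the metric
    label: str      # human-readable name, free to differ between sites
    minimum: float  # illustrative threshold, not a validated cut-off


RULES = [
    QCRule("http://example.org/qc#mean_coverage", "mean coverage", 30.0),
    QCRule("http://example.org/qc#n50", "assembly N50", 50000.0),
]


def evaluate(metrics: dict, rules=RULES) -> dict:
    """Return pass/fail per term ID; a metric that was not reported fails."""
    return {
        rule.term_id: metrics.get(rule.term_id, float("-inf")) >= rule.minimum
        for rule in rules
    }


# Two different pipelines report under the same identifiers.
pipeline_a = {
    "http://example.org/qc#mean_coverage": 45.0,
    "http://example.org/qc#n50": 120000.0,
}
pipeline_b = {"http://example.org/qc#mean_coverage": 12.0}

result_a = evaluate(pipeline_a)
result_b = evaluate(pipeline_b)
```

Because the criteria refer to identifiers rather than to a pipeline's own column names, a new pipeline only has to report under the shared terms to be tested in the same way.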

Page 9: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

6: Build to Enable Collaboration

• Be able to compare pipelines
  • Pipelines implemented using Galaxy – transparent and shareable

• Define QC criteria using ontology to compare the different pipelines of the same purpose

• Be able to share data in standard formats to minimize data re-entry from one platform to another

• Federation of platforms using standard API to share data and analysis results

Page 10: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

7: Use Compatible Data Standards

• Sequence data are more compatible/shareable but metadata are currently in silos and incompatible

• Collaboration and sharing are difficult when data are incompatible

• Compatibility != Sameness

• Use ontology to allow customization of term lists, but all terms with the same meaning (semantics) should have the same universal ID (e.g. a URL) to facilitate mapping of terms
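"Compatibility != Sameness" can be made concrete with a small sketch: two sites keep their own term lists, but each local label is bound to a shared identifier, which is what makes translation between them possible. The site names, labels and `example.org` identifiers below are hypothetical.

```python
# Each site keeps its own vocabulary but binds every label to a shared ID.
SITE_A_TERMS = {
    "chicken": "http://example.org/food#0001",
    "ground beef": "http://example.org/food#0002",
}
SITE_B_TERMS = {
    "poultry, chicken": "http://example.org/food#0001",
    "minced beef": "http://example.org/food#0002",
}


def translate(label, source_terms, target_terms):
    """Map a local label to the other site's label via the shared ID.

    Returns None when the label is unknown at the source or has no
    counterpart at the target.
    """
    term_id = source_terms.get(label)
    if term_id is None:
        return None
    by_id = {tid: local_label for local_label, tid in target_terms.items()}
    return by_id.get(term_id)


translated = translate("ground beef", SITE_A_TERMS, SITE_B_TERMS)
unmapped = translate("tofu", SITE_A_TERMS, SITE_B_TERMS)
```

Neither site has to rename anything; only the identifier behind each label needs to agree.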

Page 11: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

8: Implement Fine Grained Access Control

[Screenshots: Detailed View; Restricted View]

E.g. user role permissions control visibility and editing of content

Authorization

• Industry-standard authentication and authorization mechanisms

• Local authorization per instance

• Method-level authorization

• Object-level authorization
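The difference between method-level and object-level authorization can be sketched in a few lines of Python. This is an illustrative sketch only, not IRIDA's implementation; the `User` and `Project` types, the `analyst` role and the `requires_role` decorator are all hypothetical names.

```python
from dataclasses import dataclass, field
from functools import wraps


class AccessDenied(Exception):
    """Raised when a caller fails either authorization check."""


@dataclass
class User:
    name: str
    roles: set


@dataclass
class Project:
    name: str
    members: dict = field(default_factory=dict)  # user name -> "owner" or "viewer"


def requires_role(role):
    """Method-level check: the caller must hold `role` to call the function at all."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise AccessDenied(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_role("analyst")
def rename_project(user, project, new_name):
    # Object-level check: the caller must also own this particular project.
    if project.members.get(user.name) != "owner":
        raise AccessDenied(f"{user.name} does not own {project.name}")
    project.name = new_name
    return project
```

A user with the right role but no ownership of the specific project is stopped by the second check, which is what makes the control fine grained.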

Page 12: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

9: Use Technology to Safeguard Patient Privacy

It’s easy to lose control of the Excel line list: someone can make a copy of the content and pass it around without your knowledge; typos are common and cumulative!

Technology can control who sees what and when

Separate out sensitive patient data from pathogen sequence data, but be able to bring them together when necessary without resorting to emailing of line lists!
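Keeping patient data and sequence data apart, and joining them only for viewers who are entitled to see both, can be sketched as follows. The two in-memory stores, the `epidemiologist` role and every record value are made up for illustration; a real deployment would use separate, access-controlled databases.

```python
# Sensitive patient data and pathogen data live in separate stores,
# linked only by a sample identifier. All values are fictitious.
PATIENT_RECORDS = {
    "S1": {"age": 34, "city": "Vancouver"},
}
SEQUENCE_RECORDS = {
    "S1": {"organism": "Salmonella enterica", "cluster": "A"},
    "S2": {"organism": "Salmonella enterica", "cluster": "B"},
}


def line_list(viewer_roles, patients=PATIENT_RECORDS, sequences=SEQUENCE_RECORDS):
    """Build a line list on demand.

    Patient fields are attached only when the viewer holds the
    (hypothetical) "epidemiologist" role; everyone else sees
    pathogen data alone.
    """
    rows = []
    for sample_id, seq in sorted(sequences.items()):
        row = {"sample_id": sample_id, **seq}
        if "epidemiologist" in viewer_roles:
            row.update(patients.get(sample_id, {}))
        rows.append(row)
    return rows


lab_view = line_list({"lab"})
epi_view = line_list({"epidemiologist"})
```

The line list is generated per request instead of being stored as a file, so there is no copy to pass around.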

Page 13: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

10: Have Multiple, Flexible Access Options

• No one-size-fits-all solution; having many platforms to choose from is a good thing (but data should be portable across platforms!)

• IRIDA is available in several different flavours:

Local Install
  Advantages: Full control of the system; your data never leave your centre
  Disadvantages: Computing infrastructure and IT support needed to maintain the resource

Virtual Machine
  Advantages: Full control of the system; easy to set up
  Disadvantages: Not really scalable if run on your own desktop; some performance loss

Cloud Instance
  Advantages: Full control of the system; does not require local computing infrastructure
  Disadvantages: Data go into a cloud environment; uploading to a cloud environment can be slow

Public Version
  Advantages: No setup required; upload your data and have it processed using Compute Canada resources
  Disadvantages: Data go into a public instance (data remain private to your account); upload can be slow

Page 14: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform

Acknowledgements

Project Leaders: Fiona Brinkman – SFU; Will Hsiao – PHMRL; Gary Van Domselaar – NML

University of Lisbon: João Carriço

National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer

Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall

Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo

BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella

University of Maryland: Lynn Schriml

Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert

Dalhousie University: Rob Beiko, Alex Keddy

McMaster University: Andrew McArthur, Daim Sardar

European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid

European Food Safety Agency: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina

Page 15: IMMEM XI: Ten Simple Rules to Build a Better Public Health Genomic Epidemiology Analysis Platform


IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015