39
Tool Dependencies and Containers

Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 2: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

WhataretheadvantagesofrunningmyGalaxytoolinsideofacontainer?

HowdoesGalaxyfindacontainertorunmytoolin?

WhatareBioContainersandhowaretheyrelatedtoGalaxy?

Questions

Page 3: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ExplorethedifferencesbetweencontainerizingGalaxyandtoolexecution.

Discusstheadvantagesofcontainerizingtools.

Learntobuildbestpracticetoolsreadytobecontainerized.

Objectives

Page 5: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

GenericContainersareGoodSlide

IsolationandSecurity*ReproducibilityFlexibility*

*theindustryisgettingthere

Page 7: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ContainerizingGalaxyvsToolsWearegoingtodiscusscontainerizingtoolexecutioninstead-executingjust

jobsincontainers.

Containersfortheparticularjob'stool.

HoweveryoudeployGalaxy,includinginacontainer,toolexecutioncanstillbecontainerized.

Page 8: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Isolatedtoolexecution.Isolatefilesystemaccess.Addedlayerofsecurity.Increasedre-computability.Newdeploymentoptions-Kubernetes,MesosChronos,AWSBatch,etc.

ContainerizingToolsisStillImportant

Page 9: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Decomposestobasicproblems:

InstructGalaxywheretofindacontainerforthetool.InstructGalaxytorunthetoolinacontainer.

ContainerizingToolExecution

Page 10: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Justconfigurethedestination.Forinstance,transformtheclusterdestination:

<destinationid="short_fast"runner="slurm"><paramid="nativeSpecification">--time=00:05:00--nodes=1</destination>

asfollows:

<destinationid="short_fast"runner="slurm"><paramid="nativeSpecification">--time=00:05:00--nodes=1<paramid="docker_enabled">true<paramid="docker_sudo">false</destination>

Thatisit!

Fordevelopment,thePlanemoflag--dockerdoesthisfortest,serve,andrelatedcommands.

ConfiguringGalaxytoUseContainers

Page 11: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Decomposestobasicproblems

InstructGalaxywheretofindacontainerforthetool.InstructGalaxytorunthetoolinacontainer.

ContainerizingToolExecution

Page 12: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ExplicitContainerDependenciesReturningtotheseqtkexample,letschangetherequirementsfrom:

<requirements><requirementtype="package"version="1.2">seqtk</requirement></requirements>

into

<requirements><containertype="docker">quay.io/biocontainers/seqtk:1.2--1</container><requirementtype="package"version="1.2">seqtk</requirement></requirements>

NowrunPlanemotestandservewiththe--dockerflagandasatooldeveloperyouaredone!

Page 13: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Decomposestobasicproblems

InstructGalaxywheretofindacontainerforthetool.InstructGalaxytorunthetoolinacontainer.

We'redone...right?

ContainerizingToolExecution

Page 14: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

TheProblemswithMakingDockerExplicitSettingupaDockerfileandpublishingaDockerimagemoreprocessforthetooleventhoughthedependencieshavealreadybeencompletelydefined.AnarbitraryDockerimageisablackboxandthereisnoguaranteeGalaxywillexecutethesamebinariesastheCondarequirements.

Page 15: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ToPutitAnotherWayThis:

<requirements><requirementtype="package"version="1.2">seqtk</requirement></requirements>

shouldhavebeensufficient!

Andthegoodnewsisthatnowitis(mostly)!

Page 16: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Galaxycannowautomaticallyfindorbuildcontainersforbestpracticetools.

Planemowillcheckifsuchacontainerhasbeenpublishedwiththe--biocontainersflagtoplanemolint.

$planemolint--biocontainersseqtk_seq.xml...Applyinglinterbiocontainer_registered...CHECK..INFO:BioContainerbest-practicecontainerfound[quay.io/biocontainers/seqtk:1.2--1].

BioContainers-TheMagic

Page 17: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

TheMysteryquay.io/biocontainers/seqtkContainer

Page 18: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

RunPlanemotestorservewith--biocontainerstotrymysterycontainer.

$planemotest--biocontainersseqtk_seq.xml...[galaxy.tools.actions]SetupforjobJob[unflushed,tool_id=seqtk_seq]complete,readytoflush(20.380ms)[galaxy.tools.actions]FlushedtransactionforjobJob[id=2,tool_id=seqtk_seq](15.191ms)...[galaxy.tools.deps.containers]Checkingwithcontainerresolver[ExplicitDockerContainerResolver[]]founddescription[None][galaxy.tools.deps.containers]Checkingwithcontainerresolver[CachedMulledDockerContainerResolver[namespace=None]]founddescription[None][galaxy.tools.deps.containers]Checkingwithcontainerresolver[MulledDockerContainerResolver[namespace=biocontainers]]founddescription[ContainerDescription[identifier=quay.io/biocontainers/seqtk:1.2--1,type=docker]][galaxy.jobs.command_factory]Builtscript[/tmp/tmpw8_UQm/job_working_directory/000/2/tool_script.sh]fortoolcommand[seqtkseq-a'/tmp/tmpw8_UQm/files/000/dataset_1.dat'>'/tmp/tmpw8_UQm/files/000/dataset_2.dat']...ok

----------------------------------------------------------------------XML:/private/tmp/tmpw8_UQm/xunit.xml----------------------------------------------------------------------Ran1testin11.926sOK

Theimportantlinehereis-Checkingwithcontainerresolver[MulledDockerContainerResolver[namespace=biocontainers]]founddescription[ContainerDescription[identifier=quay.io/biocontainers/seqtk:1.2--1,type=docker]].

BioContainers-UsingtheContainer

Page 20: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

AllBiocondapackagesarebuiltintominimalcontainers.

Thissetupallowsthesamebinariestobeusedwithinthecontainerorontraditional/HPCresources.Withoutanyextraworkbytoolauthors,Galaxy

canautomaticallyfindorbuild“thecorrect”containerforabest-practicetool.

Over4,000containersalreadypublished.

BioContainers-BiocondaforContainers

Page 22: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Galaxycanbeconfiguredtoattempttobuildcontainersondemandforcontainersthathaven'tbeenpublishedtotheBioContainersnamespaceon

quay.io.

Thesecontainerswillhavesamenamesaswouldbepublishedtoquay.io(e.g.quay.io/biocontainers/seqtk:1.2--0).

BioContainers-OnDemandCreation

Page 23: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Decomposestobasicproblems

InstructGalaxywheretofindacontainerforthetool.InstructGalaxytorunthetoolinacontainer.

We'redonethistimenowright?

ContainerizingToolExecution

Page 24: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

RevisitingGalaxyRequirements

Manytoolshavemultiplerequirements,theseneedcontainersalso!

Page 25: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

AnExampleTheBWAtoolisanexampleofonesuchtoolthathasmultiplerequirements-

becausesamtoolsisusedtosortBAMsaftermapping.ThePlanemoconda_testingprojectisdistributedwithasimpletooltosimulatethis

containingbothbwaandsamtoolsrequirements.

$planemoproject_init--template=conda_testingconda_testing$cdconda_testing/$grep-rrequirebwa_and_samtools.xmlbwa_and_samtools.xml:<requirements>bwa_and_samtools.xml:<requirementtype="package"version="0.7.15">bwa</requirement>bwa_and_samtools.xml:<requirementtype="package"version="1.3.1">samtools</requirement>bwa_and_samtools.xml:</requirements>

Page 26: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ContainerHashingGalaxyfindscontainersbasedonthenamesandversionsofrequirements,so

farwehaveseenthehashforsinglerequirementsisjustthenameandtheversion.

Ifmultiplerequirementsarepresent,[email protected]@1.3.1,Galaxywilllookforacontainercalled:

quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0

Page 27: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

GalaxyTerminology-MulledTomullistocreateanenvironment(eitherintheCondasenseorgloballyinsideacontainer)foroneormoreCondapackages.Theresultofthisisa

mulledenvironment.

Fixednamingschemesfortheresultingenvironmentsensurethatdifferenttoolswiththesamesetofrequirementscanreuseapreviouslycreated

environmentorcontainer.

Ifmorethanonepackageisincludedintheresultingenvironment,acomplicatedhashisusedasthisname.

Page 28: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

MulledHashingBacktotheexampleofbwaatversion0.7.15andsamtoolsatversion1.3.1,we

saidtheresultingcontainerwillbe

quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0

Thiscanbebrokenintotheseparts:

quay.io/<namespace>/mulled-v2-<package_hash>:<version_hash>-<build>

Page 29: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ExploringMulledHashes-byEvgenyAnatskiy

Page 30: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

ThePlanemomullcommand(1/2)WhileGalaxycanbeconfiguredtoauto-buildBioContainersastheyareneeded,thePlanemomullcommandcanbeusedtomanuallybuildthemforyourlocalDockerhost.

Page 31: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Themullcommand(2/2)$planemomullbwa_and_samtools.xml/Users/john/.planemo/involucro-v=3-f/Users/john/workspace/planemo/.venv/lib/python2.7/site-packages/galaxy_lib-17.9.0-py2.7.egg/galaxy/tools/deps/mulled/invfile.lua-setCHANNELS='iuc,bioconda,r,defaults,conda-forge'-setTEST='true'-setTARGETS='samtools=1.3.1,bwa=0.7.15'-setREPO='quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820'-setBINDS='build/dist:/usr/local/'-setPREINSTALL='condainstall--quiet--yesconda=4.3'build/Users/john/.planemo/involucro-v=3-f/Users/john/workspace/planemo/.venv/lib/python2.7/site-packages/galaxy_lib-17.9.0-py2.7.egg/galaxy/tools/deps/mulled/invfile.lua-setCHANNELS='iuc,bioconda,r,defaults,conda-forge'-setTEST='true'-setTARGETS='samtools=1.3.1,bwa=0.7.15'-setREPO='quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820'-setBINDS='build/dist:/usr/local/'-setPREINSTALL='condainstall--quiet--yesconda=4.3'buildDEBURunfile[/Users/john/workspace/planemo/.venv/lib/python2.7/site-packages/galaxy_lib-17.9.0-py2.7.egg/galaxy/tools/deps/mulled/invfile.lua]STEPRunimage[continuumio/miniconda:latest]withcommand[[rm-rf/data/dist]]DEBUCreatingcontainer[step-730a02d79e]DEBUCreatedcontainer[5e4b5f83c455step-730a02d79e],startingitDEBUContainer[5e4b5f83c455step-730a02d79e]started,waitingforcompletionDEBUContainer[5e4b5f83c455step-730a02d79e]completedwithexitcode[0]asexpectedDEBUContainer[5e4b5f83c455step-730a02d79e]removedSTEPRunimage[continuumio/miniconda:latest]withcommand[[/bin/sh-ccondainstall--quiet--yesconda=4.3&&condainstall-ciuc-cbioconda-cr-cdefaults-cconda-forgesamtools=1.3.1bwa=0.7.15-p/usr/local--copy--yes--quiet]]DEBUCreatingcontainer[step-e95bf001c8]DEBUCreatedcontainer[72b9ca0e56f8step-e95bf001c8],startingitDEBUContainer[72b9ca0e56f8step-e95bf001c8]started,waitingforcompletionSOUTFetchingpackagemetadata.........SOUTSolvingpackagespecifications:.SOUTSOUTPackageplanforinstallationinenvironment/opt/conda:SOUTSOUTThefollowingpackageswillbeUPDATED:SOUTSOUTconda:4.3.11-py27_0-->4.3.22-py27_0SOUTSOUTFetchingpackagemetadata.................SOUTSolvingpackagespecifications:....SOUTDEBUContainer[72b9ca0e56f8step-e95bf001c8]completedwithexitcode[0]asexpectedDEBUContainer[72b9ca0e56f8step-e95bf001c8]removedSTEPWrap[build/dist]as[quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0]DEBUCreatingcontainer[step-6f1c176372]DEBUPackingsucceeded

Thisbuiltquay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0!

Page 32: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Testinglocallymulledcontainers$planemotest--galaxy_branchdev--biocontainersbwa_and_samtools.xml...[galaxy.tools.actions]Handledoutputnamedoutput_2fortoolbwa_and_samtools(17.443ms)[galaxy.tools.actions]Addedoutputdatasetstohistory(12.935ms)[galaxy.tools.actions]VerifiedaccesstodatasetsforJob[unflushed,tool_id=bwa_and_samtools](0.021ms)[galaxy.tools.actions]SetupforjobJob[unflushed,tool_id=bwa_and_samtools]complete,readytoflush(5.755ms)[galaxy.tools.actions]FlushedtransactionforjobJob[id=1,tool_id=bwa_and_samtools](19.582ms)[galaxy.jobs.handler](1)Jobdispatched[galaxy.tools.deps]Usingdependencybwaversion0.7.15oftypeconda[galaxy.tools.deps]Usingdependencysamtoolsversion1.3.1oftypeconda[galaxy.tools.deps]Usingdependencybwaversion0.7.15oftypeconda[galaxy.tools.deps]Usingdependencysamtoolsversion1.3.1oftypeconda[galaxy.tools.deps.containers]Checkingwithcontainerresolver[ExplicitContainerResolver[]]founddescription[None][galaxy.tools.deps.containers]Checkingwithcontainerresolver[CachedMulledContainerResolver[namespace=None]]founddescription[ContainerDescription[identifier=quay.io/biocontainers/mulled-v1-01afc412d1f216348d85970ce5f88c984aa443f3:latest,type=docker]][galaxy.jobs.command_factory]Builtscript[/tmp/tmpQs0gyp/job_working_directory/000/1/tool_script.sh]fortoolcommand[bwa>/tmp/tmpQs0gyp/files/000/dataset_1.dat2>&1;samtools>/tmp/tmpQs0gyp/files/000/dataset_2.dat2>&1][galaxy.tools.deps]UsingdependencysamtoolsversionNoneoftypeconda[galaxy.tools.deps]UsingdependencysamtoolsversionNoneoftypecondaok

----------------------------------------------------------------------XML:/private/tmp/tmpQs0gyp/xunit.xml----------------------------------------------------------------------Ran1testin7.553s

OK[test_driver]Shuttingdown[test_driver]Shuttingdownembeddedgalaxywebserver[test_driver]Embeddedwebservergalaxystopped[test_driver]Stoppingapplicationgalaxy...[galaxy.jobs.handler]jobhandlerstopqueuestoppedTestingcomplete.HTMLreportisin"/home/planemo/workspace/planemo/tool_test_output.html".All1test(s)executedpassed.bwa_and_samtools[0]:passed

Page 34: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

TheGoal

Use--biocontainerstobuildacontaineron-demandforatesttool.

Hands-on

Page 35: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Steps

Runthroughthebwa_and_samtools.xmltesttoolandverifycontainercreation.

$planemoproject_init--project_template=conda_testingconda_testing$cdconda_testing/$planemomullbwa_and_samtools.xml$dockerimages#verifythecontainerwasbuilt$#usedockerruntoverifythecontainerhassamtoolsandbwa.$planemotest--biocontainersbwa_and_samtools.xml

Hands-on

Page 36: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

PublishingMulti-PackageContainers

Page 37: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

PublishingYourMulti-PackageContainers

Page 38: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Itisbecomingeasier,moreadvantageous,andmorecommonforGalaxyadminstorunalltoolswithintheirowncontainer.

Youcanexplicitlydefineacontainerforyourtool-butitiseasierandmorereproducibletoletGalaxyfindorbuildoneusingyourtool'sbestpracticerequirements.

TheGalaxycommunitywillinfrastructuretoautomaticallybuildand/orpublishcontainersforyourtoolaslongasitdefinesbestpracticeCondadependencies.

Planemomakesiteasytotestyourtoolinsideofcontainers.

Keypoints

Page 39: Tool Dependencies and Containers · Setting up a Dockerfile and publishing a Docker image more process for the tool even though the dependencies have already been completely defined

Thankyou!Thismaterialistheresultofacollaborativework.ThankstheGalaxyTraining

Networkandallthecontributors(JohnChilton,BjörnGrüning)!

Foundatypo?Somethingiswronginthistutorial?EdititonGitHub