ASaiM A Galaxy framework to analyze gut microbiota data Bérénice Batut — EA CIDAM, Clermont-Ferrand November 19th, 2015

Galaxy Day Fr 2015 - ASaiM: A Galaxy framework to analyze gut microbiota data



Context

Gut microbiota

Community of microorganism species that live in the digestive tract

Importance of gut microbiota

"Forgotten" organ

Metagenomic / Metatranscriptomic

Comparative meta-omic

Study of the microbiota as a whole

Figure 4C - Li et al., Nat Biotech, 2014

Comparative meta-omic of different projects

For a global view?

Public data repository

Source of information

European Nucleotide Archive, "human gut metagenome" search:

236 studies

1,545 runs, ...

Dispersed and not comparable information

Must

Collect datasets

Analyze them given a standard workflow

Raw sequences

Quality control

Sequence sorting

Taxonomic analysis, functional analysis

Formatted taxonomic assignments, formatted functional assignments


Existing tools

QIIME, MG-RAST, EBI metagenomics, MetAMOS, ...

But none of them meets all the requirements:

- Analyze datasets with the standard workflow

- Use gut microbiota specific databases

- Combine a user-friendly interface and the command line

ASaiM

Auvergne Sequence analysis of intestinal Microbiota

An environment to analyze metagenomic and metatranscriptomic sequences from gut microbiota

Components

Expert database

Web interface

Framework

Input: R1 sequences, R2 sequences

Paired-end assembly (FastQ Joiner): paired-end assembled sequences

Quality control (PRINSEQ): quality controlled sequences

Sequence sorting (Reago, SortMeRNA): rRNA sequences, long rRNA sequences, non rRNA sequences

Functional assignment (Diamond, HUMAnN, COG database)

- Similarity search report

- COG family abundance, COG family coverage

- KEGG module abundance, KEGG module coverage

- KEGG pathway abundance, KEGG pathway coverage

Taxonomic assignment (MetaPhlAn, QIIME)

- MetaPhlAn: taxonomic assignment report of non rRNA sequences

- QIIME, de novo OTU picking: OTU of long rRNA sequences

- QIIME, taxonomic assignment of OTU: OTU table of long rRNA sequences

- QIIME, community summary by taxonomic composition: taxonomy table of long rRNA sequences

- QIIME, alpha diversity and alpha rarefaction computation: alpha diversity of long rRNA sequences, alpha rarefaction of long rRNA sequences
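The workflow on this slide can be read as a dependency graph between steps. A minimal Python sketch of that idea follows. The step names and the wiring between them are assumptions inferred from the diagram, not taken from the actual Galaxy workflow, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# Assumed wiring, inferred from the slide; the real workflow may differ.
WORKFLOW = {
    "paired_end_assembly": set(),
    "quality_control": {"paired_end_assembly"},
    "sequence_sorting": {"quality_control"},
    "functional_assignment": {"sequence_sorting"},
    "taxonomic_assignment": {"sequence_sorting"},
}


def execution_order(workflow):
    """Return one valid order in which the steps can be run."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for step in execution_order(WORKFLOW):
        print(step)
```

Functional and taxonomic assignment only depend on sequence sorting in this sketch, so either may come first in the returned order.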

ASaiM framework

Bioinformatics framework to generate workflows for analyses of gut microbiota data

Main requirements

- Generation of workflows with numerous tools

- Ease of use

- Flexibility and modularity

- Incorporation of wanted/needed tools and databases


Things I tried

Simple Python scripts

Workflow managers such as Luigi, Airflow, ...

Homemade approach

Configuration file

Workflow description

Web interface for generation

Python scripts to execute workflow
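To illustrate what a homemade approach of this kind can look like, here is a minimal Python sketch: a workflow description is parsed and turned into commands that a script runs in order. The JSON layout, step names, `echo` commands, and the helpers `build_commands` and `run_workflow` are all hypothetical placeholders. They are not the format or code used in ASaiM.

```python
import json
import shlex
import subprocess

# Hypothetical workflow description; the real configuration format is not
# shown in the slides. The commands are harmless placeholders.
WORKFLOW_JSON = """
{
  "steps": [
    {"name": "quality_control", "command": "echo QC on {input}"},
    {"name": "sequence_sorting", "command": "echo sorting {input}"}
  ]
}
"""


def build_commands(description, input_path):
    """Turn a workflow description into a list of (step name, argv) pairs."""
    workflow = json.loads(description)
    commands = []
    for step in workflow["steps"]:
        # Split first, then substitute, so paths with spaces stay one argument.
        argv = [token.format(input=input_path)
                for token in shlex.split(step["command"])]
        commands.append((step["name"], argv))
    return commands


def run_workflow(description, input_path):
    """Run each step in order and stop at the first failing command."""
    for name, argv in build_commands(description, input_path):
        print(f"Running {name}: {' '.join(argv)}")
        subprocess.run(argv, check=True)
```

Calling `run_workflow(WORKFLOW_JSON, "reads.fastq")` would execute the two placeholder steps one after the other. Dependency handling, a web interface, and tool integration would all have to be written by hand on top of this.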

Galaxy

Fits the main requirements

- Generation of workflows with numerous tools

- Ease of use

- Flexibility and modularity

- Incorporation of wanted/needed tools and databases

ASaiM Galaxy instance

To launch the instance

Get the source code from GitHub

Install the required dependencies

Launch the instance

Browse it on http://127.0.0.1:8080/

$ git clone git@github.com:ASaiM/framework.git

$ cd framework

$ ./src/launch_galaxy.sh
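A first launch can take a while before the instance answers in the browser. A minimal Python sketch that polls the address from the slide until it responds is given below. The function name `wait_for_galaxy` and its parameters `attempts`, `delay`, and `opener` are hypothetical, and the default values are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_for_galaxy(url="http://127.0.0.1:8080/", attempts=30, delay=10.0,
                    opener=urllib.request.urlopen):
    """Return True once `url` answers an HTTP request, False after `attempts` tries."""
    for _ in range(attempts):
        try:
            # Any successful response is treated as "the instance is up".
            with opener(url, timeout=5):
                return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet (or an HTTP error): wait and try again.
            time.sleep(delay)
    return False
```

The `opener` argument only exists so the function can be exercised without a running server; in normal use it can be left at its default.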

Behind the magic

Shell scripts to configure the instance

1. Get the latest revision of Galaxy from GitHub

2. Prepare databases and local tools

3. Configure with

Custom configuration files

Wanted tools

Wanted databases

4. Launch Galaxy

Tools

From standard Galaxy instance

From ToolShed

Developed wrappers

Planemo

Integration in the test ToolShed

https://github.com/ASaiM/galaxytools

Workflows

Databases

SortMeRNA ribosomal databases

COG

RefSeq

Catalog of reference genes in the human gut microbiome from Li et al. (2014)

Greengenes

Documentation

http://asaim.readthedocs.org/

To do

- Automate the configuration and deployment of the instance with Ansible

- Add tools in development, databases, workflows

- Validate workflows on datasets (local, mock, ...)

- Integrate tools and workflows into the ToolShed

- Automate tool integration from the ToolShed with Ansible

- Complete the documentation

Thank you. Questions?

http://asaim.github.io

bebatut.fr

github.com/bebatut

twitter.com/bebatut