BPIPE: BIOINFORMATICS PIPELINE FRAMEWORK

Speaker: Mohamed Nadhir Djekidel (那弟尔), 2015/11/06

WHY WE NEED PIPELINES

➤ Bioinformatics analysis is generally a set of steps.

➤ Some analyses need a combination of tools (bowtie, samtools, etc.).

➤ Some tasks are repetitive (especially when there are many files).

➤ The script has to be edited if the program crashes in the middle.

➤ Sometimes we have hard-coded scripts that are not portable.

➤ …

MOTIVATIONS BEHIND BPIPE

➤ Dedicated programming language for defining and executing bioinformatics pipelines

➤ Little programming skill is needed

➤ Simple definition of tasks

➤ Easy restart of the job from the point of failure

➤ Easy parallelism and job sequence management

➤ Integration with cluster resource managers (SGE, PBS, LSF)

➤ Modular development of reusable pipeline stages

➤ Automatic logging

BPIPE'S ARCHITECTURE

➤ Bpipe language: based on Groovy, but knowing shell scripting is generally enough.

➤ The Bpipe job management tool: BASH shell + Java

➤ Log management: creates a .bpipe folder

➤ Communication with resource managers: sending jobs to the queue, etc.

BASIC BPIPE STRUCTURES

stage_one

stage_two
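
A minimal sketch of this structure, assuming two placeholder stages that only echo a message (the stage bodies are illustrative and not taken from the slide). Each stage is a named Groovy closure, and `run` joins stages with `+` so that they execute in sequence.

```groovy
// Each pipeline stage is a named closure; exec runs a shell command.
stage_one = {
    exec "echo 'running stage one'"
}

stage_two = {
    exec "echo 'running stage two'"
}

// Stages joined with + run one after another.
run { stage_one + stage_two }
```

Saved as pipeline.groovy (a hypothetical file name), this would typically be started with `bpipe run pipeline.groovy`.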

CONVERT A SHELL SCRIPT TO BPIPE

Original BASH script

BPIPE Script
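
As an illustration of such a conversion (the commands and file names are assumptions, not the ones shown on the slide), a two-line shell script can be wrapped one stage per command, keeping the hard-coded file names at first:

```groovy
// Original shell script (hypothetical):
//   grep -v '^#' data.csv > data.nocomments.csv
//   sort data.nocomments.csv > data.sorted.csv

// Each shell command becomes the body of one stage.
remove_comments = {
    exec "grep -v '^#' data.csv > data.nocomments.csv"
}

sort_rows = {
    exec "sort data.nocomments.csv > data.sorted.csv"
}

// The order of the original script is expressed with +.
run { remove_comments + sort_rows }
```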

DYNAMIC INPUT AND OUTPUT

Use the variables $input and $output instead of hard-coded file names
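
A sketch with two hypothetical stages: Bpipe fills in `$input` and `$output` for each stage, so the output of one stage becomes the input of the next and no file name appears in the script.

```groovy
// $input is the file handed to the stage; $output is a name chosen by Bpipe.
remove_comments = {
    exec "grep -v '^#' $input > $output"
}

// The output of remove_comments arrives here as $input.
sort_rows = {
    exec "sort $input > $output"
}

run { remove_comments + sort_rows }
```

The first input is given on the command line, for example `bpipe run pipeline.groovy data.csv` (both file names are placeholders).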

PARALLEL TASKS

Use square brackets [] to specify parallel tasks

step1

step2 step3

step1

step2 step4

step3 step5

PARALLEL TASKS - CONT.

step1

step2 step4

step3 step5

step6 (Step6 will wait until both branches are finished)
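
The diagrams on the parallel-task slides can be sketched as follows. In Bpipe scripts, a parallel section is written as a comma-separated list inside square brackets. The branch layout (step2 then step3 in one branch, step4 then step5 in the other) is one reading of the diagram, and the stage bodies are placeholders.

```groovy
// Placeholder stages; real stages would call actual tools.
step1 = { exec "echo step1" }
step2 = { exec "echo step2" }
step3 = { exec "echo step3" }
step4 = { exec "echo step4" }
step5 = { exec "echo step5" }
step6 = { exec "echo step6" }

// Simplest case: step2 and step3 run at the same time after step1.
//   run { step1 + [ step2, step3 ] }

// Two branches run in parallel, each branch being a sequence of its own.
// step6 starts only after both branches have finished.
run { step1 + [ step2 + step3, step4 + step5 ] + step6 }
```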

RUN ON A CLUSTER

➤ Create a bpipe.config file in your working directory

➤ Select the SGE system and specify its configuration (optional)
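
A minimal sketch of such a configuration file, assuming an SGE cluster. The queue name is hypothetical and the set of useful options depends on the cluster and the Bpipe version.

```groovy
// bpipe.config (Groovy syntax), placed next to the pipeline script

// Send each command to Sun Grid Engine instead of running it locally.
executor = "sge"

// Hypothetical queue name; replace with a queue defined on your cluster.
queue = "all.q"
```

With this file in place the pipeline script itself does not change; the same `bpipe run` command submits the stages as cluster jobs.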

PIPELINE REPORT

A file index.html will be generated in the doc folder

INPUT SPLIT

➤ Inputs can be grouped using regular expressions

➤ * is used as a general wildcard, and it affects the ordering of the files

➤ % is used for splitting the inputs into groups

Example
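
A sketch of a split pattern, assuming four hypothetical input files named sample_A_1.fastq, sample_A_2.fastq, sample_B_1.fastq and sample_B_2.fastq. The pattern is placed before a parallel section with the `*` operator, and the stage then runs once per group.

```groovy
// $inputs expands to all files of the current group.
count_lines = {
    exec "cat $inputs | wc -l > $output"
}

// % marks the part of the file name that identifies the group (A or B here).
// * matches the remaining part and determines the order within each group.
run { "sample_%_*.fastq" * [ count_lines ] }
```

Under these assumptions, `bpipe run pipeline.groovy sample_*.fastq` would start two parallel branches, one for sample A and one for sample B, each receiving its two files.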

INPUT SPLIT - EXAMPLES

Input

The script

Default parameters

INPUT SPLIT - EXAMPLES

Pass individual files

Order alphabetically

Group files

CONTROLLING OUTPUT NAMING

Filter: keeps the same extension and inserts the filter name before it

file.csv → file.nocomments.csv

Transform: changes the extension

file.csv → file.xml

file.fast.gz → file_fast.zip
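
A sketch of the first two naming rules. The stage names are illustrative, and `my_csv_to_xml` is a hypothetical command standing in for whatever tool performs the conversion.

```groovy
// filter: file.csv becomes file.nocomments.csv (same extension, name added).
remove_comments = {
    filter("nocomments") {
        exec "grep -v '^#' $input > $output"
    }
}

// transform: file.nocomments.csv becomes file.nocomments.xml (new extension).
to_xml = {
    transform("xml") {
        // my_csv_to_xml is a placeholder, not a real tool
        exec "my_csv_to_xml $input > $output"
    }
}

run { remove_comments + to_xml }
```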

CONTROLLING OUTPUT NAMING

Produce: produces an output file with the specified name
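
A short sketch, assuming a hypothetical stage that should always write to summary.txt regardless of the input name:

```groovy
// Inside the produce block, $output refers to the fixed name summary.txt.
summarize = {
    produce("summary.txt") {
        exec "wc -l $input > $output"
    }
}

run { summarize }
```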

RUNNING R CODE
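
A sketch of an embedded R block, assuming R is installed where the stage runs and that the input is a CSV file with a numeric first column (both are assumptions, as is the stage name). Bpipe substitutes `$input` and `$output` before the code is passed to R, so a literal `$` in the R code would need to be escaped.

```groovy
// The R code sits in a triple-quoted string inside R { ... }.
plot_histogram = {
    transform("png") {
        R {"""
            values <- read.csv("$input")
            png("$output")
            hist(values[[1]])
            dev.off()
        """}
    }
}

run { plot_histogram }
```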

SOME COMMANDS

ADDING INFORMATION TO THE SCRIPT
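
A sketch of how descriptive metadata can be attached, using Bpipe's `about` statement for the whole pipeline and `doc` for a single stage. The titles, description and author address are placeholders.

```groovy
// Pipeline-level information.
about title: "Example comment-stripping pipeline"

remove_comments = {
    // Stage-level information.
    doc title: "Remove comment lines",
        desc: "Drops every line that starts with #",
        author: "someone@example.com"

    exec "grep -v '^#' $input > $output"
}

run { remove_comments }
```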

USEFUL TUTORIALS

➤ Download bpipe: https://github.com/ssadedin/bpipe

➤ Documentation: http://docs.bpipe.org/

➤ A complete workshop: https://github.com/tucano/bpipe_workshop

➤ Paper: http://bioinformatics.oxfordjournals.org/content/28/11/1525.full

THANKS