BPIPE: BIOINFORMATICS PIPELINE FRAMEWORK
Speaker: Mohamed Nadhir Djekidel (那弟尔), 2015/11/06
WHY WE NEED PIPELINES
➤ A bioinformatics analysis is generally a set of steps.
➤ Some analyses need a combination of tools (bowtie, samtools, etc.).
➤ Some tasks are repetitive (especially when there are many files).
➤ The script has to be edited if the program crashes midway.
➤ Sometimes we have hard-coded scripts that are not portable.
➤ …
MOTIVATIONS BEHIND BPIPE
➤ Dedicated programming language for defining and executing bioinformatics pipelines
➤ Little programming skill is needed
➤ Simple definition of tasks
➤ Easy restart of the job from the point of failure
➤ Easy parallelism and job sequence management
➤ Integration with cluster resource managers (SGE, PBS, LSF)
➤ Modular development of reusable pipeline stages
➤ Automatic logging
BPIPE'S ARCHITECTURE
➤ Bpipe language: based on Groovy, but shell scripting is generally enough.
➤ The Bpipe job management tool: BASH shell + Java
➤ Log management: creates a .bpipe folder
➤ Communication with resource managers: sending jobs to the queue, etc.
BASIC BPIPE STRUCTURES
stage_one
stage_two
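A minimal sketch of this structure, using the two stage names from the slide. The echo commands and the file name pipeline.groovy are placeholders chosen for illustration only.

```groovy
// Each stage is a Groovy closure assigned to a variable.
stage_one = {
    exec "echo 'running stage one'"
}

stage_two = {
    exec "echo 'running stage two'"
}

// "+" chains stages sequentially: stage_two starts after stage_one ends.
// Started from the shell with: bpipe run pipeline.groovy
run { stage_one + stage_two }
```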
CONVERT A SHELL SCRIPT TO BPIPE
Original BASH script
BPIPE Script
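The conversion might look like the sketch below. The three-command shell script, the index name genome and the file names are hypothetical; each shell command simply becomes the exec line of one stage.

```groovy
// Hypothetical original shell script:
//   bowtie2 -x genome -U reads.fastq -S reads.sam
//   samtools sort -o reads.bam reads.sam
//   samtools index reads.bam

align = {
    exec "bowtie2 -x genome -U reads.fastq -S reads.sam"
}

sort_bam = {
    exec "samtools sort -o reads.bam reads.sam"
}

index_bam = {
    exec "samtools index reads.bam"
}

// Same order as in the shell script.
run { align + sort_bam + index_bam }
```

File names are still hard-coded here, so the pipeline only works for reads.fastq.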
DYNAMIC INPUT AND OUTPUT
Use the variables $input and $output instead of hard-coded file names.
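A small sketch of a stage written with these variables. The stage name remove_comments and the grep command are assumptions made for illustration.

```groovy
// $input is the file handed to the stage; $output is a name Bpipe derives
// from the input name and the stage name.
remove_comments = {
    exec "grep -v '^#' $input > $output"
}

// The input file is given on the command line, for example:
//   bpipe run pipeline.groovy file.csv
run { remove_comments }
```

Because no file name appears in the script, the same pipeline can be reused on any input file.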
PARALLEL TASKS
Use square brackets [] to specify parallel tasks.
step1, then step2 and step3 in parallel
step1, then two parallel branches: step2 → step3 and step4 → step5
PARALLEL TASKS - CONT.
step1, then two parallel branches (step2 → step3 and step4 → step5), then step6
(Step6 will wait until both branches are finished)
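A sketch of such a pipeline, assuming the diagram reads as two branches (step2 then step3, and step4 then step5) that join before step6. Bpipe writes parallel branches as a comma-separated list in square brackets; the echo commands are placeholders.

```groovy
step1 = { exec "echo step1" }
step2 = { exec "echo step2" }
step3 = { exec "echo step3" }
step4 = { exec "echo step4" }
step5 = { exec "echo step5" }
step6 = { exec "echo step6" }

// [ a, b ] runs a and b in parallel; "+" inside a branch keeps it sequential.
// step6 starts only after both branches have finished.
run { step1 + [ step2 + step3, step4 + step5 ] + step6 }
```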
RUN ON A CLUSTER
➤ Create a bpipe.config file in your working directory.
➤ Select the SGE system and specify its configuration (optional).
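A minimal sketch of such a configuration file (Bpipe reads it as bpipe.config, written in Groovy syntax). The queue name is hypothetical and depends on the cluster.

```groovy
// bpipe.config, placed next to the pipeline script
executor = "sge"     // submit commands through Sun Grid Engine
queue = "all.q"      // hypothetical queue name; use one defined on your cluster
```

With this file in place, the pipeline is started as usual and each exec command is submitted to the queue instead of running locally.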
PIPELINE REPORT
A file index.html will be generated in the doc folder
INPUT SPLIT
➤ Inputs can be grouped using patterns (regular expressions).
➤ * is used as a general selector (wildcard), and it affects the ordering.
➤ % marks the part of the file name used for splitting.
Example
INPUT SPLIT - EXAMPLES
Input
The script
Default parameters
INPUT SPLIT - EXAMPLES
Pass individual files
Order alphabetically
Group files
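A sketch of input splitting under assumed file names (sampleA_R1.fastq, sampleA_R2.fastq, sampleB_R1.fastq, sampleB_R2.fastq). The stage name count_lines and its command are also assumptions.

```groovy
// $inputs expands to all files of the current group, separated by spaces.
count_lines = {
    exec "cat $inputs | wc -l > $output"
}

// "%" captures the sample name, so one parallel branch is created per sample
// (sampleA, sampleB). "*" matches the remaining part (1 or 2) and is used to
// order the files inside each group.
run { "%_R*.fastq" * [ count_lines ] }
```

It would be started with all files on the command line, for example: bpipe run pipeline.groovy *.fastq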
CONTROLLING OUTPUT NAMING
Filter: keeps the same extension and adds the filter name
file.csv → file.nocomments.csv
Transform: changes the extension
file.csv → file.xml
file.fast.gz → file_fast.zip
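A sketch of the first two cases. The stage names are assumptions, and csv2xml is a hypothetical converter used only to have a command to run.

```groovy
// filter: same extension, filter name inserted before it
//   file.csv -> file.nocomments.csv
strip_comments = {
    filter("nocomments") {
        exec "grep -v '^#' $input > $output"
    }
}

// transform: the extension is replaced
//   file.nocomments.csv -> file.nocomments.xml
to_xml = {
    transform("xml") {
        exec "csv2xml $input > $output"   // csv2xml is hypothetical
    }
}

run { strip_comments + to_xml }
```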
CONTROLLING OUTPUT NAMING
Produce: produces an output file with the specified name
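A short sketch of produce. The stage name summarize and the output name summary.txt are chosen for illustration.

```groovy
// Inside produce(...), $output is exactly the name given, whatever the inputs are.
summarize = {
    produce("summary.txt") {
        exec "wc -l $inputs > $output"
    }
}

run { summarize }
```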
RUNNING R CODE
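A sketch of a stage that embeds R code through Bpipe's R block. The stage name, the output name hist.png and the plotted data are assumptions; $output is substituted by Bpipe before the code is passed to R.

```groovy
plot_hist = {
    produce("hist.png") {
        R {"""
            png('$output')
            hist(rnorm(100))
            dev.off()
        """}
    }
}

run { plot_hist }
```

R code that itself uses the $ operator (for example on data frame columns) would need the dollar sign escaped, since Groovy interprets it first.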
SOME COMMANDS
ADDING INFORMATION TO THE SCRIPT
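A sketch of how such information can be attached, using the about and doc statements. The titles, description and author address are placeholders.

```groovy
// Pipeline-level information
about title: "Example alignment pipeline"

align = {
    // Stage-level information
    doc title: "Align reads",
        desc: "Aligns single-end reads with bowtie2",
        author: "someone@example.com"

    transform("sam") {
        exec "bowtie2 -x genome -U $input -S $output"
    }
}

run { align }
```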
USEFUL TUTORIALS
➤ Download bpipe: https://github.com/ssadedin/bpipe
➤ Documentation: http://docs.bpipe.org/
➤ A complete workshop: https://github.com/tucano/bpipe_workshop
➤ Paper: http://bioinformatics.oxfordjournals.org/content/28/11/1525.full
THANKS