Reproducibility is a fundamental goal of good experimental science. Despite the increasing availability and deployment of analytic frameworks such as Galaxy, readily reproducible bioinformatic analysis remains difficult to achieve. Mature, complex workflows often need small tweaks to accommodate the idiosyncrasies of new datasets, but integrating the required new capabilities into the framework is prohibitively complex and expensive. As a result, when problems are encountered in an existing pipeline, data may be temporarily diverted for manual processing outside the framework. These manual steps typically involve relatively trivial, transient, undocumented and poorly curated programs or scripts. This "dark script matter" rarely reaches the local version control or archiving systems where production code is maintained, threatening the goal of reproducible analysis. The Galaxy Toolfactory is a Galaxy tool that allows scripts (R, Perl, Python, Bash...) to be run directly and repeatably through the normal Galaxy interface. The Toolfactory can optionally generate all the boilerplate code needed for a new Galaxy tool that permanently wraps the script for reuse. Newly generated tools can be uploaded to a local or remote Galaxy Toolshed. They can then be installed into a running Galaxy server from any Toolshed through the administrative interface, for subsequent use in workflows and analyses. The conversion of a trivial script into a working, shareable Galaxy tool will be demonstrated during the presentation.
1
Bioinformatic Alchemy 101
Transmuting dark script
matter into reusable tools
Ross Lazarus
BakerIDI
2
Context: bioinformatic analyses
Big data; complex analyses
Repeatable, automated pipelines
Reproducibility real goal
Reproducibility is hard
3
Frameworks
Eg VGL
Local SOPs for biologists
Tools, canned workflows
Minimise opportunities for error
Maximise reproducibility
4
In real life
90/10 rule
Need to tweak SOPs
Trivial 'disposable' scripts
Not documented or curated
Not reliably available to re-run
“Dark script matter”
5
Dark Script Matter
Outside usual VCS/pipelines
Manual =/= reproducible
Necessary evil?
Platform extensions complex
Eg Galaxy – hours of work
6
Plan
Context: Reproducible analyses
Frameworks vs Dark Scripts
Alchemy: script to Galaxy tool
Demonstration
Summary
Conclusions
7
Galaxy Tool Factory
An installable Galaxy tool
Runs scripts: Python, R, Perl, sh
Generates new Galaxy tools
Tool code wraps the script
Minutes – not hours
8
Galaxy Tool Shed
Separate server
Stores/serves Galaxy tools
Admin can install to Galaxy
Mercurial VCS archives
Explicit tool versioning
Sharing and reproducibility
Demo 1: Install the Tool Factory
Demo 2: Create a new tool
11
Prepare script
Python; R; Perl; Sh
Parse command-line params: arg 1 = input, arg 2 = output
Typically workflow transformations
Arbitrary complexity
Simple example
Write transpose of a tabular file
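The positional convention above (argument 1 is the input path, argument 2 is the output path) can be checked up front so that a misconfigured call fails with a clear message. The sketch below is one possible approach. The helper name `parse_io_args` and the usage string are hypothetical and are not part of Galaxy or the Tool Factory.

```r
# Sketch: defensive handling of the positional-argument convention
# (argument 1 = input path, argument 2 = output path).
# parse_io_args is a hypothetical helper, not a Galaxy or Tool Factory API.
parse_io_args <- function(args) {
  if (length(args) < 2) {
    stop("usage: Rscript <script> <input.tabular> <output.tabular>", call. = FALSE)
  }
  list(input = args[1], output = args[2])
}

# In a real script the vector would come from the command line:
#   io <- parse_io_args(commandArgs(trailingOnly = TRUE))
io <- parse_io_args(c("in.tabular", "out.tabular"))
cat(io$input, "->", io$output, "\n")
```

Any arguments beyond the first two are ignored. This keeps the script compatible if parameters are added later.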
12
Prepare/upload test data
SMALL sample input
Becomes functional test case
h1 h2 h3 h4
r11 r12 r13 r14
r21 r22 r23 r24
r31 r32 r33 r34
13
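# R: transpose a tabular input file and write it as a tabular output file
# Usage: Rscript <script> <input> <output>
ourargs = commandArgs(trailingOnly = TRUE)
inf = ourargs[1]   # argument 1: input file path
outf = ourargs[2]  # argument 2: output file path
# header = FALSE keeps the first line as data, so it becomes the first column
inp = read.table(inf, header = FALSE, row.names = NULL, sep = '\t')
outp = t(inp)      # transpose: rows become columns
write.table(outp, outf, quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)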
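Before pasting a script into the Tool Factory, it can help to run the same logic locally on the small sample input from the previous slide. In this sketch, temporary files stand in for the paths Galaxy would pass on the command line. The sample lines are exactly the test data shown above.

```r
# Sketch: run the transpose logic on the four-row sample input,
# using temporary files in place of the paths Galaxy would supply.
sample_lines <- c("h1\th2\th3\th4",
                  "r11\tr12\tr13\tr14",
                  "r21\tr22\tr23\tr24",
                  "r31\tr32\tr33\tr34")
inf <- tempfile(fileext = ".tabular")
outf <- tempfile(fileext = ".tabular")
writeLines(sample_lines, inf)

# Same steps as the script above
inp <- read.table(inf, header = FALSE, sep = "\t", stringsAsFactors = FALSE)
outp <- t(inp)
write.table(outp, outf, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

result <- readLines(outf)
print(result)
```

The header row becomes the first column, so the first output line is `h1`, `r11`, `r21`, `r31` separated by tabs. The resulting output file is also what a functional test would later compare against.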
14
Demo part 1
As an admin, test run the code
Can't make a new tool until it works!
Admin-only: real-time scripting in Galaxy.
Overrides ALL other security.
Generated tools run with normal security.
15
Use Redo button; Generate
When working right
Use Redo to save retyping
Select Generate option
Provide tool ID, help text
Execute
Expect a toolfactory.gz in history
Copy link (floppy disk icon)
16
What's in the toolshed.gz?
A gzipped Mercurial tool repository (!)
Auto generated tool XML file
Auto generated tool python wrapper
Functional test case - the sample data
Familiar Galaxy tool for all users
Executes your script over their data
Interoperably inside Galaxy
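To give a feel for what an auto-generated tool XML file contains, here is a minimal, hand-written sketch that assembles one for the transpose script. It is not the Tool Factory's actual output. The slides note that the Tool Factory also generates a Python wrapper, so its real files will differ. To my knowledge, the elements used (`tool`, `command`, `inputs`/`param`, `outputs`/`data`, `tests`, `help`) and the `$__tool_directory__` variable are standard in current Galaxy tool XML. The tool id, file names and `Rscript` command line are illustrative assumptions.

```r
# Sketch: assemble a minimal Galaxy tool XML wrapper for an R script that
# follows the "argument 1 = input, argument 2 = output" convention.
# NOT the Tool Factory's real output; ids, file names and the Rscript
# command line are illustrative assumptions.
make_tool_xml <- function(tool_id, tool_name, script_file, help_text) {
  # help_text is inserted verbatim, so it must not contain unescaped XML
  sprintf(
'<tool id="%s" name="%s" version="0.01">
  <command>Rscript "$__tool_directory__/%s" "$input1" "$output1"</command>
  <inputs>
    <param name="input1" type="data" format="tabular" label="Tabular input"/>
  </inputs>
  <outputs>
    <data name="output1" format="tabular"/>
  </outputs>
  <tests>
    <test>
      <param name="input1" value="%s_test1_input.tabular"/>
      <output name="output1" file="%s_test1_output.tabular"/>
    </test>
  </tests>
  <help>%s</help>
</tool>', tool_id, tool_name, script_file, tool_id, tool_id, help_text)
}

xml <- make_tool_xml("transpose", "Transpose", "transpose.R",
                     "Writes the transpose of a tabular file.")
cat(xml, "\n")
```

The `<tests>` section is where the small sample input and its expected output become the tool's functional test.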
17
Upload TS gzip to new repository
Upload to any tool shed
Create new repo; sensible name!
Choose Upload files to new repo
Paste URL (floppy disk save icon)
New tool ready to install
18
Install and Test New Tool
Back to Galaxy admin interface
Browse local tool shed
Choose new tool
Install to local Galaxy
Try it out
Run functional test
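Conceptually, the functional test runs the new tool on the stored sample input and compares the result with the stored expected output. The sketch below is a simplified analogue of that comparison, not Galaxy's implementation. Galaxy's test framework is more elaborate and offers tolerances such as a `lines_diff` attribute on the test `<output>` element. Here each mismatched line simply counts once, and the function name `outputs_match` is hypothetical.

```r
# Simplified analogue of a functional-test comparison (NOT Galaxy's code):
# pass if at most `lines_diff` lines differ between the two files.
outputs_match <- function(actual_file, expected_file, lines_diff = 0) {
  actual <- readLines(actual_file)
  expected <- readLines(expected_file)
  n <- max(length(actual), length(expected))
  # Pad the shorter vector with NA so extra or missing lines count as differences
  length(actual) <- n
  length(expected) <- n
  n_diff <- sum(is.na(actual) | is.na(expected) | actual != expected)
  n_diff <= lines_diff
}

# Usage: one of two lines differs
exp_f <- tempfile()
act_f <- tempfile()
writeLines(c("h1\tr11", "h2\tr12"), exp_f)
writeLines(c("h1\tr11", "h2\tr99"), act_f)
outputs_match(act_f, exp_f)                  # FALSE
outputs_match(act_f, exp_f, lines_diff = 1)  # TRUE
```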
19
Summary
Galaxy Tool Factory (GTF) = script to tool in minutes
Integrated with Galaxy and TS
Simple workflow components
If needed, generate simple tool
Then add parameters manually
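As an illustration of "add parameters manually", the sketch below extends the transpose script with an optional third argument choosing the output separator. The parameter and its values (`tab`, `comma`) are hypothetical. A matching `<param>` would also have to be added by hand to the generated tool XML and passed on its command line. Because the new argument is optional, existing two-argument calls keep working.

```r
# Sketch only: the transpose script after a parameter has been added by hand.
# The optional third argument ("tab" or "comma") is a hypothetical example.
transpose_file <- function(inf, outf, out_sep = "tab") {
  sep_char <- switch(out_sep, tab = "\t", comma = ",",
                     stop("unknown separator: ", out_sep))
  inp <- read.table(inf, header = FALSE, sep = "\t", stringsAsFactors = FALSE)
  write.table(t(inp), outf, quote = FALSE, sep = sep_char,
              row.names = FALSE, col.names = FALSE)
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 2) {
  # Third argument is optional; default to the original tab-separated output
  transpose_file(args[1], args[2], if (length(args) >= 3) args[3] else "tab")
}
```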
20
Tool Factory Operation Guide
Script (Python, R, Perl, sh)
Galaxy Tool Factory: tool form; paste script; upload/paste sample input for functional test; test run; check outputs; rerun/fix; generate TS gzip; copy download link for pasting
Tool Shed: create new repository; upload files (paste TS gzip link and upload)
Galaxy admin page: install new tool from tool shed; test; functional test
21
GALAXY
http://usegalaxy.org
22
Generate a new Galaxy tool
Galaxy Tool Factory
From a Python, R, Perl or bash script
Using a Galaxy tool
Via a Tool Shed