Cloud BioLinux: open source, fully-customizable
bioinformatics computing on the cloud for the
genomics community and beyond
BOSC 2011 - Vienna, Austria
Ntino Krampis, PhD, Asst. Professor
J. Craig Venter Institute (JCVI)
agbiotec@gmail.com
Expensive sequencing and large organizations
● large sequencing center, multi-million, broad-impact sequencing projects
● dedicated bioinformatics department, large Sun Grid Engine cluster
Commodity sequencing and small labs
● small form-factor, bench-top sequencer available: GS Junior by 454
● sequencing as a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
Will small labs become the long tail of sequencing?
[Figure: long-tail curve; axes: amount of sequencing, number of labs. Credit: Wikimedia Commons]
“Bioinformatics nation is a land of city-states” (Lincoln Stein)
● small labs building small-scale bioinformatics infrastructures
● duplication of effort in compiling and installing software tools
● some labs have no hardware, expertise, or time to install and run software
● NEBC BioLinux (tinyurl.com/BioLinux-NEBC): 100+ pre-configured tools
● examples: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
How about large-scale sequence datasets?
Cloud BioLinux: pre-configured and on-demand bioinformatics computing on the cloud
cloudbiolinux.org
● JCVI cloud computing research
● NEBC bioinformatics software repository
● community effort – Hackathon / BOSC 2010 - 11
● pre-configured Virtual Machine (VM, image)
● large-scale computing independently of institutional or geographic boundaries
● only need a desktop computer with internet access
Cloud BioLinux: simple for end-users
● tutorial: http://tinyurl.com/cloud-biolinux-tutorial
● sign up at aws.amazon.com, then go to aws.amazon.com/console
● Amazon EC2 → Linux desktop via remote desktop client
What if I want to share my alignments with a collaborator?
● save your data as a new VM
● storage costs $0.10 / GB / month; at 15 GB, that is $1.50 / month
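The storage arithmetic above can be checked with a minimal Python sketch. It assumes the flat per-GB monthly rate quoted on the slide; the function name `monthly_storage_cost` is hypothetical and introduced only for illustration, and actual cloud pricing may differ.

```python
# Flat storage rate quoted on the slide (USD per GB per month).
PRICE_PER_GB_MONTH = 0.10


def monthly_storage_cost(size_gb, price_per_gb=PRICE_PER_GB_MONTH):
    """Return the monthly cost in USD of storing a saved VM of `size_gb` gigabytes."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    # Round to cents to avoid floating-point noise in the result.
    return round(size_gb * price_per_gb, 2)


if __name__ == "__main__":
    # The 15 GB example from the slide.
    print(monthly_storage_cost(15))  # prints 1.5
```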
“whole system snapshot exchange” (Dudley and Butte 2010)
capture the state of the computing system and data
software execution parameters and “massaged” input datasets
● customize Cloud BioLinux based on community requirements
● mix and match software from NEBC or other sources (Debian Med, Scientific Linux, etc.)
● share customized VMs with collaborators, avoiding effort duplication
● deploy Cloud BioLinux on private and local clouds
Cloud BioLinux developer's framework: create cloud VM / images with standardized software configurations
● based on python-fabric auto-deployment tool
● software components listed in plain text files
● collaborators use files to share descriptions of cloud VM / images
● start with a bare-bones VM / image
● fabric downloads and installs specified software
tinyurl.com/python-fabric open.eucalyptus.com
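The workflow on this slide (a plain-text software list, a bare-bones image, automated installation) can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual scripts: the file format (one package per line, `#` for comments), the function names, and the `apt-get` installer string are all hypothetical. The sketch does not import Fabric; `run_command` stands in for whatever remote-execution call the deployment tool provides.

```python
# Hypothetical package-list format (an assumption, not the project's actual
# file layout): one package name per line, '#' starts a comment.
EXAMPLE_LIST = """\
# alignment and phylogeny tools
clustalw
hmmer
phylip   # phylogeny inference
"""


def parse_package_list(text):
    """Return package names from a plain-text list, skipping blanks and comments."""
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in packages:  # keep order, drop duplicates
            packages.append(name)
    return packages


def build_install_commands(packages, installer="apt-get install -y"):
    """Turn package names into one shell command per package."""
    return [f"{installer} {name}" for name in packages]


def deploy(text, run_command):
    """Run the install command for each listed package.

    `run_command` is any callable that executes a shell command on the
    target machine; it is injected so the sketch stays self-contained.
    """
    commands = build_install_commands(parse_package_list(text))
    for command in commands:
        run_command(command)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    deploy(EXAMPLE_LIST, print)
```

Because the list is plain text, collaborators could exchange the file itself as the description of an image and rebuild it from a bare-bones VM.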
Cloud BioLinux developer's framework
software domains in bioinformatics: nextgen sequencing, de novo assembly, annotation, phylogeny,
molecular structures, gene expression analysis
github.com/chapmanb/cloudbiolinux
Cloud BioLinux: the future
● expand community, receive feedback, add more software to the VM
● groups.google.com/cloudbiolinux and cloudbiolinux.org
● add data analysis pipelines that are used by sequencing centers
● actively seeking funding to put major effort in development
● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
Acknowledgments & Credits
Brad Chapman - development of the fabric scripts and community organizer
Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari – BioLinux 6.0
J. Craig Venter Inst. - time allowed to work on an open-source project
D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
Members of the Cloud BioLinux community – precious development time
Thank you!