Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond. BOSC 2011 - Vienna, Austria. Ntino Krampis, PhD, Asst. Professor, J. Craig Venter Institute (JCVI)

F02-Cloud-Cloud BioLinux


Page 1

Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond

BOSC 2011 - Vienna, Austria

Ntino Krampis, PhD, Asst. Professor

J. Craig Venter Institute (JCVI)

Page 2

Expensive sequencing and large organizations

● large sequencing centers running multi-million-dollar, broad-impact sequencing projects

● dedicated bioinformatics departments with large Sun Grid Engine clusters

Commodity sequencing and small labs

● small-form-factor, bench-top sequencers now available: GS Junior by 454

● sequencing becomes a standard technique in basic biology and genetics research

● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome

Page 3

Will small labs become the long tail of sequencing?

[Figure: long-tail curve, amount of sequencing vs. number of labs. Credit: Wikimedia Commons]

Page 4

“Bioinformatics nation is a land of city-states” – Lincoln Stein

● small labs building small-scale bioinformatics infrastructures

● duplication of effort in compiling and installing software tools

● some labs have no hardware, expertise, or time to install and run software

● NEBC BioLinux (tinyurl.com/BioLinux-NEBC): 100+ pre-configured tools

● examples: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS

How about large-scale sequence datasets?

Page 5

Cloud BioLinux: pre-configured and on-demand bioinformatics computing on the cloud

cloudbiolinux.org


● JCVI cloud computing research

● NEBC bioinformatics software repository

● community effort – Hackathon / BOSC 2010–11

● pre-configured virtual machine (VM) image

● large-scale computing independent of institutional or geographic boundaries

● only need a desktop computer with internet access

Page 6

Cloud BioLinux is simple for end-users:

● sign up at aws.amazon.com

● then open aws.amazon.com/console and launch Cloud BioLinux

● step-by-step tutorial: http://tinyurl.com/cloud-biolinux-tutorial
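For readers who prefer scripting to the web console, the same launch can be sketched with the AWS SDK for Python (boto3). This is an illustrative sketch, not part of Cloud BioLinux: it assumes boto3 is installed and AWS credentials are configured, and the AMI ID, key-pair name, and instance type are hypothetical placeholders (look up the current Cloud BioLinux image yourself).

```python
"""Sketch: launch a Cloud BioLinux instance on EC2 with boto3.

Assumptions (not from the slides): boto3 is installed, AWS credentials are
configured, and the constants below are hypothetical placeholders.
"""

AMI_ID = "ami-xxxxxxxx"     # hypothetical: the Cloud BioLinux image to launch
KEY_NAME = "my-keypair"     # hypothetical: an EC2 key pair you created
INSTANCE_TYPE = "m5.large"  # assumption: pick a size that fits your data


def launch_request(ami_id, key_name, instance_type):
    """Build the keyword arguments for EC2 RunInstances (exactly one instance)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ec2_client, request):
    """Start the instance and return its instance ID."""
    response = ec2_client.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2")
    instance_id = launch(ec2, launch_request(AMI_ID, KEY_NAME, INSTANCE_TYPE))
    print("launched", instance_id)
```

The helpers take the client as an argument, so the request can be inspected or tested without contacting AWS.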

Page 7

Amazon EC2: a Linux desktop via a remote desktop client

Page 8

What if I want to share my alignments with a collaborator?

save your data as a new VM

$0.10 / GB / month

at 15 GB, it costs $1.50 / month
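The storage estimate above is a single multiplication. A small sketch, using the slide's 2011 rate as the default (current cloud prices differ, so the rate is a parameter) and `Decimal` to avoid floating-point rounding on money:

```python
from decimal import Decimal


def monthly_storage_cost(size_gb, rate_per_gb_month=Decimal("0.10")):
    """Return the monthly cost (USD) of keeping a saved VM image of size_gb.

    The default rate is the figure quoted on the slide; pass the current
    price for your provider and region instead.
    """
    return Decimal(size_gb) * rate_per_gb_month


print(monthly_storage_cost(15))  # 1.50
```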

Page 9

“whole system snapshot exchange” (Dudley and Butte 2010)

capture the state of the computing system and data

software execution parameters and “massaged” input datasets
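On EC2, one way to perform this kind of whole-system snapshot exchange is to save the running, customized instance as a new image and grant a collaborator's AWS account permission to launch it. A minimal sketch, assuming boto3 is installed and configured; the instance ID, image name, and collaborator account ID are hypothetical placeholders:

```python
"""Sketch: snapshot a customized instance as a new image and share it.

Assumptions: boto3 is installed and configured; INSTANCE_ID, IMAGE_NAME and
COLLABORATOR_ACCOUNT are hypothetical placeholders.
"""

INSTANCE_ID = "i-xxxxxxxx"              # hypothetical: your running VM
IMAGE_NAME = "biolinux-alignments-v1"   # hypothetical image name
COLLABORATOR_ACCOUNT = "123456789012"   # hypothetical 12-digit AWS account ID


def snapshot_instance(ec2_client, instance_id, name, description=""):
    """Create an image of the instance (root and attached EBS volumes); return its ID."""
    response = ec2_client.create_image(
        InstanceId=instance_id, Name=name, Description=description
    )
    return response["ImageId"]


def share_image(ec2_client, image_id, account_id):
    """Grant another AWS account permission to launch the image."""
    ec2_client.modify_image_attribute(
        ImageId=image_id,
        LaunchPermission={"Add": [{"UserId": account_id}]},
    )


if __name__ == "__main__":
    import boto3

    ec2 = boto3.client("ec2")
    image_id = snapshot_instance(
        ec2, INSTANCE_ID, IMAGE_NAME, "alignments + tool configuration"
    )
    # The image must reach the 'available' state before it can be shared and launched.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    share_image(ec2, image_id, COLLABORATOR_ACCOUNT)
    print("shared", image_id, "with", COLLABORATOR_ACCOUNT)
```

By default `create_image` reboots the instance so the file systems are consistent; the collaborator then launches the shared image from their own account with the same tools, parameters, and data.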

Page 10

● customize Cloud BioLinux based on community requirements

● mix and match software from NEBC or other sources (DebianMed, Scientific Linux, etc.)

● share customized VMs with collaborators, avoiding effort duplication

● deploy Cloud BioLinux on private and local clouds

Cloud BioLinux developer's framework: create cloud VMs/images with standardized software configurations

Page 11

● based on the Python Fabric auto-deployment tool

● software components listed in plain text files

● collaborators share these files as descriptions of cloud VMs/images

● start with a bare-bones VM / image

● fabric downloads and installs specified software

Fabric: tinyurl.com/python-fabric · Eucalyptus: open.eucalyptus.com

Cloud BioLinux developer's framework
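The workflow on this slide (a plain-text list of software applied to a bare-bones image) can be sketched as follows. This is a hypothetical illustration, not the actual Cloud BioLinux fabfile: the one-package-per-line format, the `apt-get` install step, and the host name are assumptions, and the remote step uses the Fabric 2 `Connection.sudo` call rather than the Fabric 1.x API that existed in 2011.

```python
"""Sketch: install packages listed in a plain-text file onto a bare-bones VM.

Assumptions (hypothetical, not the Cloud BioLinux code): one Debian/Ubuntu
package name per line, '#' starts a comment, and the target uses apt-get.
"""


def parse_package_list(text):
    """Return package names from a plain-text list, skipping blanks and comments."""
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop trailing comments
        if name:
            packages.append(name)
    return packages


def install_commands(packages, batch_size=20):
    """Build shell commands that refresh the index, then install in batches."""
    commands = ["apt-get update"]
    for start in range(0, len(packages), batch_size):
        batch = " ".join(packages[start:start + batch_size])
        commands.append("apt-get install -y " + batch)
    return commands


# Hypothetical example of a shared plain-text description of a VM.
EXAMPLE_LIST = """
# sequence analysis
emboss
hmmer    # profile HMM search
clustalw
"""

if __name__ == "__main__":
    from fabric import Connection  # Fabric 2.x; assumes key-based SSH login

    conn = Connection("ubuntu@my-bare-vm.example.org")  # hypothetical host
    for cmd in install_commands(parse_package_list(EXAMPLE_LIST)):
        conn.sudo(cmd)  # run each install step as root on the remote VM
```

Keeping parsing and command building separate from the SSH step means a collaborator can check exactly what a shared list will install before running it.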

Page 12

Software domains in bioinformatics: next-gen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis

github.com/chapmanb/cloudbiolinux

Page 13

Cloud BioLinux: the future

● expand community, receive feedback, add more software to the VM

● groups.google.com/cloudbiolinux and cloudbiolinux.org

● add data analysis pipelines that are used by sequencing centers

● actively seeking funding to put a major effort into development

● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/

Page 14

Acknowledgments & Credits

Brad Chapman - development of the fabric scripts and community organizer

Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari – BioLinux 6.0

J. Craig Venter Inst. - time allowed to work on an open-source project

D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation

Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop

Members of the Cloud BioLinux community – precious development time

Thank you!