Cloud BioLinux: pre-configured and on-demand computing for genomics, independently of institutional, geographic or economic boundaries

Ntino Krampis, PhD

JCVI-NIAID workshop 2011, S. Africa


Expensive sequencing and large organizations

● large sequencing centers with multi-million, broad-impact sequencing projects

● dedicated bioinformatics department, coordination with other centers

Commodity sequencing and small labs

● small form-factor, bench-top sequencers are available: GS Junior by 454

● sequencing as a standard technique in basic biology and genetics research

● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome

Acquiring the sequence data is only the first step

● downstream bioinformatics analysis for scientific discovery

● many commonly-used bioinformatics tools are difficult to install

● usually available only as source code - needs technical expertise

● large-scale sequence data analysis requires high-performance, expensive computing hardware

Alternative: computational capacity on the cloud

● Cloud Computing: large-scale, high-performance computers accessible through the Internet

● Example: when using Gmail, Google Docs, Yahoo! Mail, Facebook etc., you store and access data on a remote computer

● Cloud Computing services such as Amazon EC2 (http://aws.amazon.com/ec2) rent out high computational and data storage capacity on remote computers

How does Cloud Computing work?

● the operating system, bioinformatics software and data are installed in a Virtual Machine (VM)

● a VM is uploaded and executed on a cloud computing service

● run a practically unlimited number of VMs for large-scale sequence data analysis

● access the VM from a desktop computer through the Internet

[Diagram: local desktop computers connect through the Internet to VMs running on the remote Amazon EC2 Cloud Computing service]

Cloud BioLinux

● Cloud BioLinux leverages VM technology and the cloud, offering pre-configured bioinformatics computing

● allows setting up a high-performance data analysis environment without any technical expertise

● researchers can perform large-scale data analysis by simply using a desktop computer with Internet access

● accessible without any institutional, economic or national boundaries

Launching Cloud BioLinux

1. sign up for an Amazon EC2 cloud account:

http://aws.amazon.com/ec2

You can also connect an existing account from the main Amazon.com website for the cloud usage charges.

We have an account ready for you:

Username: [email protected]

Password: Nhg4|CL0ud!

2. using the account credentials, sign in to the EC2 cloud console (select EC2 in the dropdown menu below the sign-in button):

http://aws.amazon.com/console

3. launch Cloud BioLinux through the cloud console wizard

Launching Cloud BioLinux

http://aws.amazon.com/console

Click the button: [button image not preserved]

Launch instance wizard: steps 1 & 2

1. specify the Cloud BioLinux identifier under the “Community AMIs” tab

2. choose the computational capacity: memory, processor, CPU cores

Launch instance wizard: step 3

3. specify a password for logging in to the Cloud BioLinux desktop, in the “User Data” box

4. remaining steps: leave everything as default and keep clicking the “Continue” button until the wizard finishes and you are back at the console
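The wizard choices (AMI identifier, computational capacity, User Data password) could also be scripted. The following is a minimal sketch, assuming the boto3 Python library, which the slides do not mention. The identifier `ami-00000000`, the instance type `m1.large`, the region `us-east-1`, and passing the password as plain User Data are all placeholders or assumptions; use the identifier listed for Cloud BioLinux under “Community AMIs”.

```python
def build_launch_request(ami_id, instance_type, desktop_password):
    """Collect the wizard's choices as keyword arguments for EC2 run_instances."""
    return {
        "ImageId": ami_id,              # step 1: Cloud BioLinux identifier (Community AMIs)
        "InstanceType": instance_type,  # step 2: computational capacity
        "UserData": desktop_password,   # step 3: desktop login password (format assumed)
        "MinCount": 1,                  # launch exactly one instance
        "MaxCount": 1,
    }


def launch(request, region="us-east-1"):
    """Send the request to EC2; needs boto3 and configured AWS credentials."""
    import boto3  # imported here so the sketch loads without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


# Placeholder values, not the real Cloud BioLinux identifier.
request = build_launch_request("ami-00000000", "m1.large", "workshop")
print(request)
```

Calling `launch(request)` would start a billable instance, so the sketch only builds and prints the request.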

Launching Cloud BioLinux

back to the console after we completed the wizard

Pick a running instance, select it with your mouse and copy its “Public DNS” address (the Cloud BioLinux server address on the cloud)
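The “Public DNS” address can also be read programmatically rather than copied from the console. A minimal sketch, assuming the response layout of boto3's `ec2.describe_instances()`; the sample response, instance IDs and host name below are made up for illustration.

```python
def running_public_dns(describe_response):
    """Return the Public DNS name of every running instance in the response."""
    names = []
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Only running instances have a usable public address.
            if instance.get("State", {}).get("Name") == "running":
                dns = instance.get("PublicDnsName")
                if dns:
                    names.append(dns)
    return names


# Hand-written sample in the shape returned by ec2.describe_instances().
sample = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-00000001",
                    "State": {"Name": "running"},
                    "PublicDnsName": "ec2-203-0-113-10.compute-1.amazonaws.com",
                },
                {
                    "InstanceId": "i-00000002",
                    "State": {"Name": "pending"},
                    "PublicDnsName": "",
                },
            ]
        }
    ]
}

print(running_public_dns(sample))
```

With real credentials, the same function could be applied to `boto3.client("ec2").describe_instances()`.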


While waiting for Cloud BioLinux to boot up...

● examples of NCBI public datasets on EC2

● bringing the data to the compute


Final step: connecting remotely to Cloud BioLinux

click the NX client icon on your computer's desktop

A. paste the DNS in the “Host” box

B. select “Unix”, “Gnome”, and the remote desktop size

C. Login: “ubuntu” is the default user; Password: “workshop” is the password we set


What if I want to share my alignments with a collaborator?

save your data as a new VM

storage costs $0.10 / GB / month

at 15 GB, it costs $1.50 / month
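The cost figure is a simple product of size and price. A small worked example, using the 0.10 per GB per month price and the 15 GB size from the slide; the function name is hypothetical, and `Decimal` is used only to keep currency arithmetic exact.

```python
from decimal import Decimal


def monthly_storage_cost(size_gb, price_per_gb_month=Decimal("0.10")):
    """Monthly cost in dollars of storing a VM image of size_gb gigabytes."""
    cost = Decimal(size_gb) * price_per_gb_month
    return cost.quantize(Decimal("0.01"))  # round to whole cents


print(monthly_storage_cost(15))  # 1.50
```

Pass the size as an integer or a string so the conversion to `Decimal` stays exact.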

Cloud BioLinux

whole system snapshot exchange

share your analysis results: publicly or only with your collaborators

authorized users can access the cloud VM/image with all the software, data, analysis results
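Sharing a snapshot publicly or only with collaborators maps onto the launch permissions of an EC2 image. A minimal sketch, assuming the boto3 Python library (not named in the slides); the account ID, image name and region are placeholders, and the helper names are hypothetical.

```python
def launch_permission(account_ids=None, public=False):
    """Build the LaunchPermission argument for EC2 modify_image_attribute."""
    if public:
        # Anyone with an AWS account may start the image.
        return {"Add": [{"Group": "all"}]}
    # Only the listed collaborator accounts may start the image.
    return {"Add": [{"UserId": account_id} for account_id in (account_ids or [])]}


def snapshot_and_share(instance_id, image_name, permission, region="us-east-1"):
    """Save a running instance as a new image and grant access to it.

    Needs boto3 and configured AWS credentials.
    """
    import boto3  # imported here so the sketch loads without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]
    # Wait until the image is ready before changing who may launch it.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    ec2.modify_image_attribute(ImageId=image_id, LaunchPermission=permission)
    return image_id


print(launch_permission(public=True))
print(launch_permission(account_ids=["123456789012"]))
```

Keeping the permission payload in its own function makes the public-versus-collaborators choice explicit and checkable without touching a real account.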

Cloud BioLinux and Genomic Standards

whole system snapshot exchange

[Diagram: researcher A and researcher B each start a VM / image, perform analysis, take a snapshot, and share it with the other]


Acknowledgments & Credits

Brad Chapman - development of the fabric scripts and community organizer

Tim Booth, Bela Tiwari, Dawn Field – BioLinux 6.0 development and EC2 documentation

Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop

Justin Johnson – community and sponsorship of cloudbiolinux.com

J. Craig Venter Inst. - time allowed to work on an open-source project

D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation

Members of the Cloud BioLinux community:

Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls

Thank you!