Page 1: What the Cloud can do for Computational Life Sciences: Biocep-R's Unified Perspective Karim Chine karim.chine@m4x.org

What the Cloud can do for Computational Life Sciences:

Biocep-R's Unified Perspective

Karim Chine

karim.chine@m4x.org

Page 2

www.biocep.net

Page 3

Definitions

♦ What is the Cloud?

Cloud computing is a paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Wikipedia

Cloud Computing represents a new way to deploy computing technology to give users the ability to access, work on, share and store information using the internet. The cloud itself is a network of data centers, each composed of many thousands of computers working together, that can perform the functions of software on a personal or business computer by providing users access to powerful applications, platforms and services delivered over the internet.

Jeffrey F. Rayport & Andrew Heyward (Marketplace LLC)

♦ What is R?

Open-source (GPL) software environment for statistical computing and graphics. Lingua franca of data analysis. Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemometrics, etc. are growing at an exponential rate.

♦ What is Scilab?

Open-source (CeCILL) software package for numerical computations. Clone of Matlab. Widely used for engineering and scientific applications.

♦ What is an SCE?

Scientific Computing Environment: enables users to solve a wide variety of problems through flexible user interfaces that can model in a natural way the mathematical aspects of many different problem domains. Examples: Matlab, Mathematica, Scilab, R.

Page 4

e-Science perspective / Biocep-R use cases

♦ Lower the barriers for accessing cyber infrastructures.

♦ Help deal with the data deluge (take the computation to the data)

♦ Enable collaboration within computing environments

♦ Simplify the creation and delivery of science gateways

♦ Bridge the gap between existing SCEs and grids/clouds

♦ Lower the barriers for using distributed computing, leverage the elastic cloud

Page 5

♦ Bridge the gap between mainstream SCEs

♦ Bridge the gap between mainstream SCEs and workflow workbenches

♦ Provide a universal computing toolkit for scientific applications

♦ Provide frameworks for computational back-ends scalability

♦ Provide the building blocks of a platform for computational education

♦ Provide the building blocks of a traceable and reproducible computational research platform

♦ Provide the building blocks of an international portal for on-demand scientific computing, collaboration, and sharing of computational artifacts and resources

Page 6

Computational Ecosystem, "The" Open Platform

♦ Computational Components: R packages (CRAN, Bioconductor), wrapped C, C++, Fortran code, Scilab modules, Matlab toolkits, etc. Open source or commercial.

♦ Computational Resources: hardware/OS-agnostic computing engines (R, Scilab, ...) on clusters, grids, cloud servers. Free (academic grids: NGS, EGEE, etc.) or pay-per-use (EC2).

♦ Computational User Interfaces: virtual workbench within the browser; built-in views / plugins / spreadsheets; collaborative views. Open source or commercial.

♦ Computational Scripts: R / Python / Groovy. On client side: interactivity, etc. On server side: data transfer, etc.

♦ Computational Application Programming Interfaces: Java / SOAP / REST, stateless and stateful.

♦ Computational Data Storage: local, NFS, FTP, storage web services (S3). Free or commercial.

♦ Generated Computational Web Services: stateful or stateless, automatic mapping of R data objects and functions.
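The stateless / stateful distinction named for the APIs and generated web services can be illustrated with a minimal Python sketch. It is not Biocep-R's API: the class names `StatelessService` and `StatefulSession` and all their methods are hypothetical, and plain Python functions stand in for R functions. The only point is that a stateless call carries all of its inputs with it, while a stateful session keeps a workspace alive between calls.

```python
class StatelessService:
    """Hypothetical stateless endpoint: every call is self-contained."""

    def call(self, function, *args):
        # Nothing is remembered once the result has been returned.
        return function(*args)


class StatefulSession:
    """Hypothetical stateful endpoint: a workspace survives between calls."""

    def __init__(self):
        self._workspace = {}

    def assign(self, name, value):
        # Store a value in the session workspace under the given name.
        self._workspace[name] = value

    def call(self, function, *names):
        # Arguments are looked up by name in the workspace.
        return function(*(self._workspace[name] for name in names))

    def log_off(self):
        # Ending the session discards the workspace.
        self._workspace.clear()


if __name__ == "__main__":
    service = StatelessService()
    print(service.call(sum, [1, 2, 3]))

    session = StatefulSession()
    session.assign("x", [1, 2, 3])
    print(session.call(sum, "x"))
    print(session.call(max, "x"))
    session.log_off()
```

In the stateful case the data is sent once and reused by later calls, which is the usual argument for keeping a session open when the data is large.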

Page 7

Biocep-R, Technology Environment

Page 8

R Virtualization

[Diagram: client/server architecture of the Virtual R Workbench. Labels recovered from the figure:]

Client Side - Internet Virtual R Workbench: Internet Browser, Java Applet, Virtual R Workbench URL, Docking Framework, R Console, R Workspace, R Script Editor, R Spreadsheet, R Graphic Device + Interactors, R Help Browser, Groovy / Jython Script Editor

Server Side - Personal Machine, Academic Grids, Clusters, Clouds: R Server, RServices API, RServices skeleton, graphic devices skeletons, R packages skeletons, Object Export / Import Layer (mapping), JavaGD, rJava / JRI

Page 9

Computational Engines Pools / cloudbursting

[Diagram: pools of R engines spread over several nodes and used by client applications. Labels recovered from the figure:]

Nodes: Node 1: Windows XP; Node 2: Mac OS; Node 3: 64 bits Server / Linux; Node 4: EC2 virtual machine 1; Node 5: EC2 virtual machine 2

Infrastructure: Front-end host, Remote Objects Registry, Supervisor, Pool A, Pool B, Pool C, R-HTTP, R-SOAP, Cloudbursting via AWS

Client applications:

♦ Perl Scripts: logOn, Use R, logOff

♦ .NET Appli: logOn, Use R, logOff

♦ Parallel Computing Applications: Borrow Rs, Use Rs, Release Rs

♦ Web Application: Borrow R, Generate Graphics/Data, Release R
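The borrow / use / release cycle and the cloudbursting idea named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not Biocep-R's implementation: `FakeEngine`, `EnginePool`, and `max_burst` are hypothetical names, a Python object stands in for a remote R engine, and "bursting" simply creates another local object where a real system would start a cloud instance.

```python
import queue
import threading
from contextlib import contextmanager


class FakeEngine:
    """Stand-in for a remote R engine (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, values):
        # Pretend to run a computation on the engine: here, a plain mean.
        return sum(values) / len(values)


class EnginePool:
    """A fixed set of engines plus a bounded number of on-demand burst engines."""

    def __init__(self, engines, max_burst=0):
        self._idle = queue.Queue()
        for engine in engines:
            self._idle.put(engine)
        self._max_burst = max_burst
        self._lock = threading.Lock()
        self.burst_count = 0

    def borrow(self):
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            with self._lock:
                if self.burst_count < self._max_burst:
                    # Cloudbursting: a real system would launch a cloud VM here.
                    self.burst_count += 1
                    return FakeEngine(f"burst-{self.burst_count}")
            raise RuntimeError("no engine available")

    def release(self, engine):
        # Returned engines (including burst ones) become idle again.
        self._idle.put(engine)

    @contextmanager
    def engine(self):
        # Borrow an engine and guarantee its release, even on error.
        borrowed = self.borrow()
        try:
            yield borrowed
        finally:
            self.release(borrowed)


if __name__ == "__main__":
    pool = EnginePool([FakeEngine("node-1")], max_burst=1)
    with pool.engine() as r:
        print(r.name, r.evaluate([1, 2, 3]))
```

Wrapping borrow and release in a context manager is one way to make sure a web application always hands its engine back, which matters when the pool is shared by many clients.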

Page 10

Elastic distributed computing on Amazon EC2

Page 11

Shell’s Biocep-R-based statistical modelling cloud computing pilot

Extracts from Shell’s cloud computing big rules document :

<The Global Solutions statistics group actively uses the open source “R” statistical modeling tool. An inexpensive platform upon which to run the statistical models was required with the ability to scale up and down depending on calculating demand.

In order to achieve this, the pilot created an analytical application using a pool of stateless and, more importantly, stateful “R” engines across multiple servers in Amazon using Biocep for integration and virtualisation of the “R” engine.

Using Amazon enabled them to have

♦ On-demand access to high-powered computing facilities. Numerically intensive statistical applications can be handled by the cloud rather than slowing down the user's own PC. Could be of great benefit in the Bio-Fuels research area, which will require very computationally intensive statistical techniques.

Page 12

♦ Disaster Recovery: By using virtual machine images on the cloud we can always restore to the initial state. If something goes drastically wrong with the cloud machine image we can simply scrap it and launch another instance. Safer to implement web apps on a virtual machine using AWS rather than in-house server.

♦ The Cloud can be used as a real-time collaborative workspace. Co-workers can work together and share statistical methodologies in a new and novel environment.

♦ The onset of Cloud Computing has greatly increased the availability of software for delivering web-based statistical applications. The benefits of which include:

o No special configuration or changes are needed on users' PCs.

o No need for scripting of applications.

o Compatible with all operating systems.

o Updates can be made quickly and easily in a centralized manner.

o Everybody has a browser. Familiar interface encourages use.

o Statistical web-based applications can either be hosted on the cloud or on an in-house Shell server, which may be more appropriate for most confidential data.>

Contacts within Shell:

Edwin Vansteenis, Shell Global Functions, Senior IT Architect, [email protected]

W. Johnes, Shell Global Services, Statistical Consultant, [email protected]
