This is a talk I gave at NHGRI in March 2010.
An Overview of Bionimbus and the Open Cloud Consortium
Robert Grossman
Open Cloud Consortium
Institute for Genomics & Systems Biology, University of Chicago
Laboratory for Advanced Computing, University of Illinois at Chicago
Part 1. Bionimbus
www.bionimbus.org
Database Services
Analysis Pipelines & Re-analysis
Services
Web Portal & Widgets
Large Data Cloud Services
Data Ingestion Services
Elastic Cloud Services
Scalable data transport
Case Study 1: Cistrack
• Resource for cis-regulatory data.
• Integrates databases and large data clouds.
• Open source.
• Contains raw, intermediate, and analyzed data from approximately 300 experiments on Agilent, Affy and Solexa platforms.
Flynet Provides Web 2.0 Access to Cistrack
Cube is an Elastic Cloud For Re-analysis
Case Study 2
SNP concordance:
Alignment against gene models: 46%
TopHat alignment: 91%
71 rare, deleterious SNP genotypes were validated by Sequenom.
• Ran TopHat in Bionimbus using Cube-based VMs.
• Total time went from 25 days to 1 day.
[Diagram: private cloud (Eucalyptus & Cube). Layers: virtual machines (App/OS), hypervisors, racks of hardware. Storage: working space; simple persistent storage (glusterfs).]
Case Study 3: modENCODE Worm/Fly peak calling reanalysis
[Diagram labels: ftp, ssh; private / community cloud with virtual machines (App/OS) on hypervisors over a hardware cluster; Bionimbus virtual machine images; public cloud (ami-efa24c86).]
Hybrid Clouds
Bionimbus Delivery Mechanisms
• Log in and use the Bionimbus cloud.
• Use Bionimbus Virtual Machine Images in a) your private cloud; b) the Bionimbus cloud; c) public clouds such as Amazon.
• Bionimbus is open source, so you can build your own cloud and interoperate with ours (first release of the integrated system: 3Q 2010).
• Bionimbus data services for genomic data, even for large datasets.
• HPC. Goal: minimize latency and control heat.
• Large Data Clouds. Goal: maximize data (with matching compute) and control cost.
• Elastic Clouds. Goal: minimize cost of virtualized machines & provide on-demand.
A successful cloud will…
• Persist & refresh data over the long term.
• Provide a high speed network to move & share the data.
• Offer a Web 2.0/3.0 user interface.
• Provide compute services at the scale of a data center.
Part 2. Open Cloud Consortium
www.opencloudconsortium.org
• 501(c)(3) not-for-profit corporation.
• Develops standards, interoperability frameworks, and reference implementations.
• Operates clouds.
• Develops benchmarks.
• One area of focus: bridge between private and public clouds.
Operates Clouds
• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.

• Open Cloud Testbed
• Open Science Data Cloud
• Cloud-based Disaster Relief Services
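The capacity figures above imply some simple per-node averages. The following is a minimal sketch of that arithmetic, not a description of the actual hardware layout: it assumes "1.5+ PB" is treated as exactly 1.5 PB, that 1 PB = 1000 TB, and that storage and cores are spread evenly across nodes. All names in the code are illustrative.

```python
# Figures quoted on the slide; "1.5+ PB" is treated as exactly 1.5 PB (a lower bound).
NODES = 500
CORES = 3000
STORAGE_PB = 1.5
REFRESH_FRACTION = 1 / 3  # target share of the hardware refreshed each year

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000


def per_node_averages(nodes, cores, storage_pb):
    """Return (cores per node, TB of storage per node) as plain averages."""
    return cores / nodes, storage_pb * TB_PER_PB / nodes


def nodes_refreshed_per_year(nodes, fraction):
    """Number of nodes replaced per year at the target refresh rate, rounded."""
    return round(nodes * fraction)


cores_per_node, tb_per_node = per_node_averages(NODES, CORES, STORAGE_PB)
refreshed = nodes_refreshed_per_year(NODES, REFRESH_FRACTION)

print(f"{cores_per_node:.1f} cores per node, {tb_per_node:.1f} TB per node")
print(f"about {refreshed} nodes refreshed per year")
```

Under these assumptions the averages come to 6 cores and 3 TB per node, with roughly 167 nodes replaced each year to meet the one-third refresh target.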
OCC Members
• Companies: Yahoo, Cisco, Aerospace Corp., Booz Allen Hamilton, InfoBlox, Open Data Group, Raytheon
• Universities: CalIT2, Johns Hopkins, Northwestern University, University of Chicago, University of Illinois at Chicago
• Government agencies: NASA
Open Cloud Consortium Perspective
• Vendor neutral
• Open, interoperable architecture
• Experiment at scale
• Operate infrastructure at the scale of a small data center
• Long term point of view (think like a library, not a cloud service provider)
• Think public, private & hybrid clouds
Raywulf rack
Condo Clouds
Open Cloud Testbed
Phase 2
• 9 racks
• 250+ nodes
• 1000+ cores
• 10+ Gb/s
Networks: MREN, CENIC, Dragon, C-Wave
Software: Hadoop, Sector/Sphere, Thrift, KVM VMs, Eucalyptus, Nova
Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief
Storage Services
Compute Services
Applications
Virtual Network Manager
Data Services
Network Transport
Virtual Machine Manager
Cloud Metadata Services
Identity Manager
IaaS
PaaS
Apps
Standards
Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)
Platform as a Service
– Cloud Compute Services
– Data/Table Cloud Services
– Cloud Storage Services
Open Virtualization Format (OVF)
Open Cloud Computing Interface (OCCI)
SNIA Cloud Data Management Interface (CDMI)
Large Data Cloud Interoperability Framework
OCC Benchmarks
                      MalStone A   MalStone B
Large Data Cloud 1a   455m 13s     840m 50s
Large Data Cloud 1b   87m 29s      142m 32s
Large Data Cloud 2    33m 40s      43m 44s
There are surprises.
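To make the size of the gaps in the timings above easier to see, the sketch below converts each "minutes and seconds" entry to seconds and computes how many times faster one configuration is than another on the same benchmark. The slides do not say what distinguishes configurations 1a, 1b and 2, so the code compares only the reported numbers; the function and variable names are illustrative.

```python
import re

# Timings copied from the benchmark table, in the form "<minutes>m <seconds>s".
TIMINGS = {
    "Large Data Cloud 1a": {"MalStone A": "455m 13s", "MalStone B": "840m 50s"},
    "Large Data Cloud 1b": {"MalStone A": "87m 29s", "MalStone B": "142m 32s"},
    "Large Data Cloud 2": {"MalStone A": "33m 40s", "MalStone B": "43m 44s"},
}


def to_seconds(text):
    """Convert a string such as '455m 13s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


def speedup(baseline, other, benchmark):
    """Return how many times faster `other` is than `baseline` on `benchmark`."""
    slow = to_seconds(TIMINGS[baseline][benchmark])
    fast = to_seconds(TIMINGS[other][benchmark])
    return slow / fast


for benchmark in ("MalStone A", "MalStone B"):
    for other in ("Large Data Cloud 1b", "Large Data Cloud 2"):
        ratio = speedup("Large Data Cloud 1a", other, benchmark)
        print(f"{benchmark}: {other} is {ratio:.1f}x faster than 1a")
```

On these figures, configuration 2 finishes MalStone A roughly 13.5 times faster than configuration 1a and MalStone B roughly 19 times faster.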
Acknowledgements
Thank You
• For more information:
– www.bionimbus.org
– www.opencloudconsortium.org
– rgrossman.com (for research papers, etc.)