Life science research, data platforms and cloud computing
Platforms for data science
Deepak Singh, Ph.D.
Amazon Web Services
Data transmission for international genomics projects 2010
the new reality
lots and lots and lots and lots and lots of data
lots and lots and lots and lots and lots of
people
lots and lots and lots and lots and lots of
places
constant change
science in a new reality
data science in a new reality
data as a programmable resource
versioning
provenance capture
filter
aggregate
integrate
extend
mashup
automate
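A minimal sketch of what "programmable" could mean for a few of the verbs above (filter, aggregate, provenance capture). The `Dataset` class and the sample rows are hypothetical and exist only for illustration; they are not from the talk.

```ruby
# Hypothetical wrapper: every transformation returns a new dataset and
# appends a note about what was done (a very simple provenance log).
class Dataset
  attr_reader :rows, :provenance

  def initialize(rows, provenance = [])
    @rows = rows.freeze
    @provenance = provenance.freeze
  end

  # filter: keep the rows for which the block is true, recording the label
  def filter(label, &block)
    Dataset.new(rows.select(&block), provenance + ["filter: #{label}"])
  end

  # aggregate: group rows by one key and count each group
  def aggregate(key)
    counts = rows.group_by { |r| r[key] }
                 .map { |value, group| { key => value, :count => group.size } }
    Dataset.new(counts, provenance + ["aggregate: count by #{key}"])
  end
end

# Made-up sample records
samples = Dataset.new([
  { :id => "s1", :tissue => "liver", :quality => 0.9 },
  { :id => "s2", :tissue => "liver", :quality => 0.4 },
  { :id => "s3", :tissue => "brain", :quality => 0.8 }
])

result = samples.filter("quality >= 0.5") { |r| r[:quality] >= 0.5 }
                .aggregate(:tissue)

puts result.rows.inspect
puts result.provenance.inspect
```

The original `samples` object is left untouched, so each intermediate result can be kept as its own version.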
human interfaces
tough problem
really tough problem in the new reality
goal
optimize the most valuable resource
compute, storage, workflows, memory,
transmission, algorithms, cost, …
people
Credit: Pieter Musterd under a CC-BY-NC-ND license
enter the cloud
what is the cloud?
infrastructure
scalable
highly available
dynamic
extensible
secure
a utility
programmable
class Instance
  attr_accessor :aws_hash, :elastic_ip

  def initialize(hash, elastic_ip = nil)
    @aws_hash = hash
    @elastic_ip = elastic_ip
  end

  def public_dns
    @aws_hash[:dns_name] || ""
  end

  def friendly_name
    public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
  end

  def id
    @aws_hash[:aws_instance_id]
  end
end
include_recipe "packages"
include_recipe "ruby"
include_recipe "apache2"

if platform?("centos","redhat")
  if dist_only?
    # just the gem, we'll install the apache module within apache2
    package "rubygem-passenger"
    return
  else
    package "httpd-devel"
  end
else
  %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
    package pkg do
      action :upgrade
    end
  end
end

gem_package "passenger" do
  version node[:passenger][:version]
end

execute "passenger_module" do
  command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
  creates node[:passenger][:module_path]
end
a data science platform
dataspaces
Further reading: Jeff Hammerbacher, Information Platforms and the rise of the data scientist, Beautiful Data
accept all data formats
evolve APIs
beyond the database and the data warehouse
move compute to the data
data is a royal garden
compute is a fungible commodity
“I terminate the instance and relaunch it. That's my error handling”
Source: @jtimberman on Twitter
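The quote can be read as a retry loop in which nothing is ever repaired. A rough sketch, assuming a hypothetical `launch` callable that starts a fresh instance and runs the job on it (it stands in for a real provisioning call and is not an AWS API):

```ruby
# Run a job on a fresh instance; if it fails, abandon that instance and
# launch another one, up to max_launches times.
def run_with_relaunch(max_launches, launch)
  attempts = 0
  begin
    attempts += 1
    launch.call(attempts)
  rescue StandardError
    # no repair step: the failed instance is simply thrown away
    retry if attempts < max_launches
    raise
  end
end

# Simulated launcher (illustration only): instances 1 and 2 fail,
# instance 3 completes the job.
flaky_launch = lambda do |n|
  raise "instance #{n} unhealthy" if n < 3
  "job finished on instance #{n}"
end

outcome = run_with_relaunch(5, flaky_launch)
puts outcome
```

This only works when the job keeps no state on the instance, which is the point of treating compute as fungible and data as the thing to protect.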
the cloud is an architectural and
cultural fit for data science
amazon web services
your data science platform
s3://1000genomes
Credit: Angel Pizarro, U. Penn
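An `s3://` address is just a bucket name plus an object key. A small sketch that splits one apart and builds the virtual-hosted HTTPS form; the helper names and the example key are made up for illustration, and the URL only resolves for objects that are publicly readable.

```ruby
# Split an s3:// URI into its bucket and key parts.
def parse_s3_uri(uri)
  match = uri.match(%r{\As3://([^/]+)/?(.*)\z})
  raise ArgumentError, "not an s3:// URI: #{uri}" unless match
  { :bucket => match[1], :key => match[2] }
end

# Build the virtual-hosted-style HTTPS URL for the same object.
def public_url(uri)
  parts = parse_s3_uri(uri)
  "https://#{parts[:bucket]}.s3.amazonaws.com/#{parts[:key]}"
end

root = parse_s3_uri("s3://1000genomes")
# "some/prefix/file.txt" is a made-up key, not a real object in the bucket
url = public_url("s3://1000genomes/some/prefix/file.txt")

puts root.inspect
puts url
```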
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
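For readers new to the pattern, here is a toy in-process map and reduce that counts k-mers across reads. It is only an illustration of the map/group/reduce shape: it does not use Hadoop, it is not how the tools linked above are implemented, and the reads are made up.

```ruby
# map step: emit a [kmer, 1] pair for every k-length substring of a read
def map_read(read, k)
  (0..read.length - k).map { |i| [read[i, k], 1] }
end

# reduce step: group the emitted pairs by k-mer and sum their counts
def reduce_counts(pairs)
  pairs.group_by { |kmer, _| kmer }
       .map { |kmer, group| [kmer, group.map { |_, n| n }.sum] }
       .to_h
end

reads = ["ACGTA", "CGTAC"] # made-up reads for illustration
pairs = reads.flat_map { |read| map_read(read, 3) }
counts = reduce_counts(pairs)

puts counts.inspect
```

Each read is mapped independently of the others, which is what lets the map step spread across many machines sitting next to the data.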
AWS knows massively scalable infrastructure
you know the needs of the science
we can make this work together
[email protected] Twitter: @mndoci
http://slideshare.net/mndoci
Inspiration and ideas from Matt Wood, James Hamilton
& Larry Lessig
Credit: Oberazzi under a CC-BY-NC-SA license