Platforms for data science
Deepak Singh, Ph.D., Amazon Web Services
Data transmission for international genomics projects 2010



DESCRIPTION

Life science research, data platforms and cloud computing


Page 1: Platforms for data science

Platforms for data science

Deepak Singh, Ph.D., Amazon Web Services

Data transmission for international genomics projects 2010

Page 2: Platforms for data science

the new reality

Page 3: Platforms for data science

lots and lots and lots and lots and lots of data

Page 4: Platforms for data science

lots and lots and lots and lots and lots of people

Page 5: Platforms for data science

lots and lots and lots and lots and lots of places

Page 6: Platforms for data science

constant change

Page 7: Platforms for data science

science in a new reality

Page 8: Platforms for data science

science in a new reality^

Page 9: Platforms for data science

data science in a new reality

Page 10: Platforms for data science

data as a programmable resource

Page 11: Platforms for data science

versioning

Page 12: Platforms for data science

provenance capture

Page 13: Platforms for data science

filter

Page 14: Platforms for data science

aggregate

Page 15: Platforms for data science

integrate

Page 16: Platforms for data science

extend

Page 17: Platforms for data science

mashup

Page 18: Platforms for data science

automate

Page 19: Platforms for data science

human interfaces
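
Several of the operations named on the preceding slides (versioning, provenance capture, filter, aggregate) can be made concrete with a small sketch. The `Dataset` class, its field names, and the sample records are all hypothetical, invented only for illustration; they do not come from the talk or from any AWS API.

```ruby
require "digest"
require "json"

# Hypothetical immutable dataset: each transformation returns a new
# Dataset with a content-derived version and an extended provenance log.
class Dataset
  attr_reader :records, :provenance

  def initialize(records, provenance = ["created"])
    @records = records.freeze
    @provenance = provenance.freeze
  end

  # Versioning: identical content always yields the same identifier.
  def version
    Digest::SHA256.hexdigest(JSON.generate(records))[0, 12]
  end

  # Filter: keep records matching the block and record the step taken.
  def filter(label, &block)
    Dataset.new(records.select(&block), provenance + ["filter: #{label}"])
  end

  # Aggregate: count records per value of the given key.
  def aggregate(key)
    records.group_by { |r| r[key] }.transform_values(&:size)
  end
end

# Made-up sample records for illustration only.
samples = Dataset.new([
  { "id" => "s1", "population" => "YRI", "coverage" => 4 },
  { "id" => "s2", "population" => "CEU", "coverage" => 30 },
  { "id" => "s3", "population" => "CEU", "coverage" => 6 }
])

deep = samples.filter("coverage >= 5") { |r| r["coverage"] >= 5 }
puts deep.provenance.inspect
puts deep.aggregate("population").inspect
puts deep.version
```

Because every step returns a new object, the original dataset stays untouched and the provenance log doubles as a replayable description of how a derived dataset was produced.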

Page 20: Platforms for data science

tough problem

Page 21: Platforms for data science

really tough problem in the new reality

Page 22: Platforms for data science

goal

Page 23: Platforms for data science

optimize the most valuable resource

Page 24: Platforms for data science

compute, storage, workflows, memory, transmission, algorithms, cost, …

Page 26: Platforms for data science

enter the cloud

Page 27: Platforms for data science

what is the cloud?

Page 28: Platforms for data science

infrastructure

Page 29: Platforms for data science

scalable

Page 30: Platforms for data science

highly available

Page 31: Platforms for data science

dynamic

Page 32: Platforms for data science

extensible

Page 33: Platforms for data science

secure

Page 34: Platforms for data science

a utility

Page 35: Platforms for data science

programmable

Page 36: Platforms for data science

class Instance
  attr_accessor :aws_hash, :elastic_ip

  def initialize(hash, elastic_ip = nil)
    @aws_hash = hash
    @elastic_ip = elastic_ip
  end

  def public_dns
    @aws_hash[:dns_name] || ""
  end

  # NOTE: `status` is called here but is not defined in this excerpt.
  def friendly_name
    public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
  end

  def id
    @aws_hash[:aws_instance_id]
  end
end
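
A brief usage sketch may help show what the slide's class does. The slide calls `status` without defining it, so the version below adds a hypothetical `status` reader backed by an `:aws_state` key; that key name, the instance ids, and the DNS name are assumptions made for illustration.

```ruby
# Restatement of the slide's class so the example runs on its own.
class Instance
  attr_accessor :aws_hash, :elastic_ip

  def initialize(hash, elastic_ip = nil)
    @aws_hash = hash
    @elastic_ip = elastic_ip
  end

  def public_dns
    @aws_hash[:dns_name] || ""
  end

  # Assumption: not on the slide; the :aws_state key is hypothetical.
  def status
    @aws_hash[:aws_state] || "unknown"
  end

  def friendly_name
    public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
  end

  def id
    @aws_hash[:aws_instance_id]
  end
end

# Made-up instance descriptions.
running = Instance.new({
  aws_instance_id: "i-00000001",
  dns_name: "ec2-203-0-113-10.compute-1.amazonaws.com",
  aws_state: "running"
})
pending = Instance.new({ aws_instance_id: "i-00000002", aws_state: "pending" })

puts running.friendly_name  # ec2-203-0-113-10
puts pending.friendly_name  # Pending
```

An instance with a public DNS name is labelled by its first DNS component; one without falls back to its capitalized state.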

Page 37: Platforms for data science

include_recipe "packages"
include_recipe "ruby"
include_recipe "apache2"

if platform?("centos","redhat")
  if dist_only?
    # just the gem, we'll install the apache module within apache2
    package "rubygem-passenger"
    return
  else
    package "httpd-devel"
  end
else
  %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
    package pkg do
      action :upgrade
    end
  end
end

gem_package "passenger" do
  version node[:passenger][:version]
end

execute "passenger_module" do
  command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
  creates node[:passenger][:module_path]
end

Page 38: Platforms for data science

a data science platform

Page 39: Platforms for data science

dataspaces

Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", in Beautiful Data

Page 40: Platforms for data science

accept all data formats

Page 41: Platforms for data science

evolve APIs

Page 42: Platforms for data science

beyond the database and the data warehouse

Page 43: Platforms for data science

move compute to the data

Page 44: Platforms for data science

data is a royal garden

Page 45: Platforms for data science

compute is a fungible commodity

Page 46: Platforms for data science

“I terminate the instance and relaunch it. That's my error handling”

Source: @jtimberman on Twitter
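
The quote describes treating instances as disposable: rather than repairing a failed machine, replace it. A minimal sketch of that pattern follows, using a hypothetical in-memory `FakeCloud` in place of a real provider API. None of the class or method names here come from AWS; they are assumptions for illustration.

```ruby
# Hypothetical stand-in for a cloud API; it only tracks instance ids.
class FakeCloud
  attr_reader :launched, :terminated

  def initialize
    @launched = []
    @terminated = []
  end

  def launch
    id = format("i-%04d", @launched.size + 1)
    @launched << id
    id
  end

  def terminate(id)
    @terminated << id
  end
end

# Run the job on a fresh instance. On any failure, discard that instance
# and try again on a new one, up to max_attempts times.
def run_with_replacement(cloud, max_attempts: 3)
  max_attempts.times do
    instance = cloud.launch
    begin
      return yield(instance)
    rescue StandardError
      # No repair attempt: the ensure clause below terminates the instance.
    ensure
      cloud.terminate(instance)
    end
  end
  raise "job failed on #{max_attempts} instances"
end

cloud = FakeCloud.new
attempts = 0
result = run_with_replacement(cloud) do |instance|
  attempts += 1
  raise "simulated fault" if attempts < 3
  "done on #{instance}"
end
puts result
```

The design choice is that no state worth saving lives on the instance, so the cheapest recovery path is a clean replacement rather than diagnosis.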

Page 47: Platforms for data science

the cloud is an architectural and cultural fit for data science

Page 48: Platforms for data science

amazon web services

Page 49: Platforms for data science

your data science platform

Page 50: Platforms for data science

s3://1000genomes
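
An `s3://` address names a bucket and, optionally, an object key. The sketch below splits such an address and builds the corresponding virtual-hosted-style HTTPS URL, assuming the bucket permits anonymous reads and has a DNS-safe name. The helper name `s3_https_url` and the example key are hypothetical.

```ruby
require "uri"

# Split an s3:// URI into bucket and key, then build the
# virtual-hosted-style HTTPS URL for the same object.
def s3_https_url(s3_uri)
  uri = URI.parse(s3_uri)
  raise ArgumentError, "not an s3:// URI" unless uri.scheme == "s3"

  bucket = uri.host
  key = uri.path.sub(%r{\A/}, "")  # drop the leading slash, if any
  "https://#{bucket}.s3.amazonaws.com/#{key}"
end

puts s3_https_url("s3://1000genomes")
# The key below is a made-up path, not a known object in the bucket.
puts s3_https_url("s3://1000genomes/example/README.txt")
```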

Page 51: Platforms for data science

Credit: Angel Pizarro, U. Penn

Page 52: Platforms for data science

http://usegalaxy.org/cloud

Page 54: Platforms for data science
Page 55: Platforms for data science

AWS knows massively scalable infrastructure

Page 56: Platforms for data science

you know the needs of the science

Page 57: Platforms for data science

we can make this work together

Page 58: Platforms for data science

[email protected]
Twitter: @mndoci

http://slideshare.net/mndoci

Inspiration and ideas from Matt Wood, James Hamilton & Larry Lessig

Credit: Oberazzi, under a CC-BY-NC-SA license