Presentation at the Discovery 2015 Workshop on Cloud Computing at Berkeley
There is no magic, there is only awesome
Scientific computing with Amazon Web Services
Deepak Singh
Business Development Manager - Amazon Compute Services
Discovery 2015 Workshop, July 23 2010
Via Reavel under a CC-BY-NC-ND license
life science industry
Credit: Bosco Ho
By ~Prescott under a CC-BY-NC license
<1>
the cloud
has_many :definitions
infrastructure as a service
The “Living and Evolving” Cloud
AWS services and basic terminology
Most Applications Need:
1. Compute
2. Storage
3. Messaging
4. Payment
5. Distribution
6. Scale
7. Analytics
[Diagram: AWS services and terminology]
Amazon EC2 Instances (On-Demand, Reserved, Spot)
Amazon Virtual Private Cloud
Amazon Worldwide Physical Infrastructure (Geographical Regions, Availability Zones, Edge Locations)
Amazon S3 Objects and Buckets
Amazon CloudFront
EBS Volumes
Snapshots
Amazon SQS Queues
Amazon SimpleDB Domains
Amazon SNS Topics
Amazon RDS
Amazon Elastic MapReduce JobFlows
Auto-Scaling
Elastic LB
CloudWatch
Payment: Amazon FPS / DevPay
Your Application
Scalable: Increase or decrease capacity in minutes
Automation
Cost Effective: Low rate, pay-as-you-go
Reliable: Mission Critical Infrastructure
Secure: Multilayer security facilities
compute
elastic compute cloud
elastic
3000 CPUs for one firm’s risk management application
[Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 to Tuesday 4/28/2009; y-axis marks at 3000 and 300; annotation: 300 CPUs on weekends]
programmable
// Run an instance
$EC2 = new AmazonEC2();
$Options = array('KeyName' => "Jeff's Keys", 'InstanceType' => "m1.small");
$Res = $EC2->run_instances("ami-db7b9db2", 1, 1, $Options);
more later
cost effective
3000 CPUs for one firm’s risk management application
[Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 to Tuesday 4/28/2009; y-axis marks at 3000 and 300; annotation: 300 CPUs on weekends]
[Chart: % utilization over time; curves: Ideal Effective Utilization, Real Utilization]
on-demand instances
reserved instances
spot instances
Amazon EC2 On-Demand price for the same instance is $0.50
[Chart: % utilization over time; Ideal Effective Utilization compared with Reserved Utilization, On Demand Utilization, and Spot Utilization]
secure
[Diagram layers: Physical Interface, Firewall, Hypervisor, Customer A, Customer B, …, Customer Z]
• Guest operating system doesn’t have elevated privilege level.
• Instances are completely isolated.
• Intrinsic network firewall.
• No access to raw devices.
• Virtualized disks, logically isolated, wiped clean after use.
{
  "Version": "2008-10-17",
  "Id": "Queue1_Policy_UUID",
  "Statement": {
    "Sid": "Queue1_AnonymousAccess_ReceiveMessage_TimeLimit",
    "Effect": "Allow",
    "Principal": { "AWS": "*" },
    "Action": "SQS:ReceiveMessage",
    "Resource": "/987654321098/queue1",
    "Condition": {
      "DateGreaterThan": { "AWS:CurrentTime": "2009-01-31T12:00Z" },
      "DateLessThan": { "AWS:CurrentTime": "2009-01-31T15:00Z" }
    }
  }
}
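The policy above allows anonymous SQS:ReceiveMessage calls only inside a three-hour window. The sketch below illustrates how such a time condition could be evaluated. It is not the AWS policy engine: `parse_utc` and `within_time_window?` are hypothetical helpers introduced for illustration, and it assumes both bounds are exclusive.

```ruby
require "json"

# The policy from the slide, reproduced as a Ruby string.
POLICY_JSON = <<~JSON
  {
    "Version": "2008-10-17",
    "Id": "Queue1_Policy_UUID",
    "Statement": {
      "Sid": "Queue1_AnonymousAccess_ReceiveMessage_TimeLimit",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "SQS:ReceiveMessage",
      "Resource": "/987654321098/queue1",
      "Condition": {
        "DateGreaterThan": { "AWS:CurrentTime": "2009-01-31T12:00Z" },
        "DateLessThan": { "AWS:CurrentTime": "2009-01-31T15:00Z" }
      }
    }
  }
JSON

# Hypothetical helper: parse the "YYYY-MM-DDTHH:MMZ" form used in the policy.
def parse_utc(stamp)
  match = /\A(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})Z\z/.match(stamp)
  raise ArgumentError, "unexpected timestamp: #{stamp}" unless match
  Time.utc(*match.captures.map(&:to_i))
end

# Hypothetical helper (not an AWS API): true when `now` lies strictly
# between the DateGreaterThan and DateLessThan values.
def within_time_window?(policy, now)
  condition = policy.fetch("Statement").fetch("Condition")
  not_before = parse_utc(condition.fetch("DateGreaterThan").fetch("AWS:CurrentTime"))
  not_after = parse_utc(condition.fetch("DateLessThan").fetch("AWS:CurrentTime"))
  now > not_before && now < not_after
end

policy = JSON.parse(POLICY_JSON)
puts within_time_window?(policy, Time.utc(2009, 1, 31, 13, 0)) # inside the window
puts within_time_window?(policy, Time.utc(2009, 1, 31, 16, 0)) # after the window
```

A request at 13:00 UTC on 2009-01-31 falls inside the window; one at 16:00 UTC does not.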
Customer’s Network
Amazon Web Services Cloud
Secure VPN Connection over the Internet
Subnets
Customer’s isolated AWS resources
Router
VPN Gateway
Amazon Virtual Private Cloud (VPC)
storage
Amazon S3
highly scalable
highly available
highly durable
[Diagram: a Region containing several Datacenters, each holding Node 1 ... Node n]
Note: Conceptual drawing only. Actual number of nodes & datacenters may vary
elastic block store
block device
resizable
boot device
one size does not fit all
• Zero administrative overhead (automatic handling of geo-redundant replication, index creation, database tuning)
• Automatic and elastic scaling of resources to meet request load
• High availability (multiple copies of data for reliability and failover)
• Flexibility (schema-less data store)
• Cost-effective blob or large object storage
• Minimal relationships between objects
Amazon RDS
Amazon SimpleDB
Amazon S3
Amazon EC2 + EBS
• Native access to database engine
• Easy migration path (existing code, tools, application are compatible)
• Key features of a relational database, such as joins or complex transactions
• Managed experience (offload common DBA tasks, lower total cost of ownership)
• Multiple flavors of database engine
• Complete control
an ecosystem prospers
<2>
infrastructure as code
Source: Chris Dagdigian
• Images:
– RegisterImage
– DescribeImages
– DeregisterImage
– ModifyImageAttribute
– DescribeImageAttribute
– ResetImageAttribute
• Instances:
– RunInstances
– DescribeInstances
– TerminateInstances
– StopInstances
– GetConsoleOutput
– RebootInstances
– CreatePlacementGroup
– DescribePlacementGroup
• IP Addresses:
– AllocateAddress
– ReleaseAddress
– AssociateAddress
– DisassociateAddress
– DescribeAddresses
• Keypairs:
– CreateKeyPair
– DescribeKeyPairs
– DeleteKeyPair
• Security Groups:
– CreateSecurityGroup
– DescribeSecurityGroups
– DeleteSecurityGroup
– AuthorizeSecurityGroupIngress
– RevokeSecurityGroupIngress
• Block Storage Volumes:
– CreateVolume
– DeleteVolume
– DescribeVolumes
– AttachVolume
– DetachVolume
– CreateSnapshot
– DescribeSnapshots
– DeleteSnapshot
• VPC:
– CreateCustomerGateway
– DeleteCustomerGateway
– DescribeCustomerGateways
– AssociateDhcpOptions
– CreateDhcpOptions
– DeleteDhcpOptions
– DescribeDhcpOptions
– CreateSubnet
– DeleteSubnet
– DescribeSubnets
– CreateVpc
– DeleteVpc
– DescribeVpcs
– CreateVpnConnection
– DeleteVpnConnection
– DescribeVpnConnections
– AttachVpnGateway
– CreateVpnGateway
– DeleteVpnGateway
– DescribeVpnGateways
– DetachVpnGateway
using libraries
Access credentials

def access_key
  options.services['access-key']
end

def secret_key
  options.services['secret-key']
end
Custom index

require 'right_aws'

class EC2
  attr_accessor :ec2, :instance_index, :image_index, :elastic_ip_index, :volume_index

  def initialize(access_key, secret_key)
    @ec2 = RightAws::Ec2.new(access_key, secret_key)
    @instance_index = {}
    @image_index = {}
    @elastic_ip_index = {}
    @volume_index = {}
  end

  # Lazily build a hash of Instance objects keyed by instance id.
  # get_elastic_ip_for_instance_id is not shown on the slide.
  def instance_index
    if @instance_index.empty?
      @ec2.describe_instances.each do |i|
        # create an Instance object & add it to the index
        @instance_index[i[:aws_instance_id]] = Instance.new(i, get_elastic_ip_for_instance_id(i[:aws_instance_id]))
      end
    end
    @instance_index
  end
end

Helper

class Instance
  attr_accessor :aws_hash, :elastic_ip

  def initialize(hash, elastic_ip = nil)
    @aws_hash = hash
    @elastic_ip = elastic_ip
  end

  def public_dns
    @aws_hash[:dns_name] || ""
  end

  # status is not shown on the slide.
  def friendly_name
    public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
  end

  def id
    @aws_hash[:aws_instance_id]
  end

  def running?
    status == "running"
  end
end
configuration management
cfengine
puppet
chef
chef
dsl
include_recipe "packages"
include_recipe "ruby"
include_recipe "apache2"

if platform?("centos","redhat")
  if dist_only?
    # just the gem, we'll install the apache module within apache2
    package "rubygem-passenger"
    return
  else
    package "httpd-devel"
  end
else
  %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
    package pkg do
      action :upgrade
    end
  end
end

gem_package "passenger" do
  version node[:passenger][:version]
end

execute "passenger_module" do
  command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
  creates node[:passenger][:module_path]
end

Modular
OS aware
Ruby syntax
Package aware
Execute
recipes
template "#{node[:apache][:dir]}/mods-available/passenger.conf" do
  cookbook "passenger_apache2"
  source "passenger.conf.erb"
  owner "root"
  group "root"
  mode 0755
end

Template
Cookbook re-use
<3>
architectural lessons
design for failure
“Everything fails, all the time” -- Werner Vogels
“Things will crash. Deal with it” -- Jeff Dean
2-4% of servers will die annually
Source: Jeff Dean, LADIS 2009
1-5% of disk drives will die every year
Source: Jeff Dean, LADIS 2009
2.3% AFR in population of 13,250
3.3% AFR in population of 22,400
4.2% AFR in population of 246,000
Source: James Hamilton (http://perspectives.mvdirona.com)
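To make the annualized failure rates (AFR) above concrete, the sketch below multiplies each rate by its population to estimate how many units fail per year. It is a back-of-the-envelope calculation that assumes failures are spread evenly across the population and over the year. `expected_failures` is a name introduced here for illustration.

```ruby
# AFR values and population sizes as quoted on the slide.
POPULATIONS = [
  { afr: 0.023, population: 13_250 },
  { afr: 0.033, population: 22_400 },
  { afr: 0.042, population: 246_000 }
].freeze

# Expected failures per year, rounded to the nearest whole unit.
def expected_failures(afr, population)
  (afr * population).round
end

POPULATIONS.each do |row|
  per_year = expected_failures(row[:afr], row[:population])
  puts format("%.1f%% AFR of %d units: about %d failures per year (%.1f per day)",
              row[:afr] * 100, row[:population], per_year, per_year / 365.0)
end
```

Even the lowest rate works out to roughly one failure every day or so at these population sizes, which is why the deck argues for designing around failure.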
human errors
~20% of admin issues have unintended consequences
Source: James Hamilton (http://perspectives.mvdirona.com)
assume sw/hw failure
avoid single points of failure
system as a whole is resilient
loose coupling sets you free
using message queues
[Diagram: Controller A, Controller B, Controller C]
Tight Coupling: controllers connected directly
Loose Coupling using Queues: controllers connected through queues (Q)
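A minimal in-process sketch of loose coupling with queues follows. It uses Ruby's built-in `Queue` as a stand-in for a message service such as Amazon SQS. The controller methods and the `STOP` sentinel are assumptions made for illustration, not code from the talk. Each controller knows only its own queues, never the other controllers.

```ruby
require "thread"

STOP = :stop # sentinel telling a consumer that no more work is coming

# Controller A: produces work and only knows about its outgoing queue.
def controller_a(out_queue, items)
  items.each { |item| out_queue << item }
  out_queue << STOP
end

# Controller B: transforms each message and passes it on.
def controller_b(in_queue, out_queue)
  while (msg = in_queue.pop) != STOP
    out_queue << msg.upcase
  end
  out_queue << STOP
end

# Controller C: collects the final results.
def controller_c(in_queue)
  collected = []
  while (msg = in_queue.pop) != STOP
    collected << msg
  end
  collected
end

a_to_b = Queue.new
b_to_c = Queue.new

threads = [
  Thread.new { controller_a(a_to_b, %w[fetch parse render]) },
  Thread.new { controller_b(a_to_b, b_to_c) }
]
results = controller_c(b_to_c)
threads.each(&:join)
puts results.inspect
```

Because the controllers share only queues, any one of them could be restarted, scaled out, or replaced without the others changing.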
implement elasticity
no assumptions
resilience to reboot
bootstrap
dynamic
multi-layered security
“Web” Security Group:
  TCP 80 0.0.0.0/0
  TCP 443 0.0.0.0/0
  TCP 22 “App”
“App” Security Group:
  TCP 8080 “Web”
  TCP 22 172.154.0.0/16
  TCP 22 “App”
“DB” Security Group:
  TCP 3306 “App”
  TCP 3306 163.128.25.32/32
  TCP 22 “App”
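The layered rules above can be modelled as plain data to see which traffic each tier accepts. This is an illustrative sketch only, not the EC2 API: `RULES` and `allowed?` are hypothetical names, and a rule source is treated either as a CIDR block or as the name of another security group.

```ruby
require "ipaddr"

# Rules transcribed from the slide: [TCP port, source].
RULES = {
  "Web" => [[80, "0.0.0.0/0"], [443, "0.0.0.0/0"], [22, "App"]],
  "App" => [[8080, "Web"], [22, "172.154.0.0/16"], [22, "App"]],
  "DB"  => [[3306, "App"], [3306, "163.128.25.32/32"], [22, "App"]]
}.freeze

# Hypothetical check: may traffic reach `target_group` on `port` from a
# sender identified by IP address and/or by its own security group?
def allowed?(target_group, port, source_ip: nil, source_group: nil)
  RULES.fetch(target_group).any? do |rule_port, source|
    next false unless rule_port == port
    if RULES.key?(source)
      # Rule refers to another security group by name.
      source == source_group
    else
      # Rule refers to a CIDR block.
      !source_ip.nil? && IPAddr.new(source).include?(IPAddr.new(source_ip))
    end
  end
end

puts allowed?("Web", 80, source_ip: "203.0.113.7")  # public web traffic
puts allowed?("DB", 3306, source_ip: "203.0.113.7") # public traffic to the database
puts allowed?("DB", 3306, source_group: "App")      # app tier to the database
puts allowed?("App", 8080, source_group: "Web")     # web tier to the app tier
```

Under this model only the web tier is reachable from arbitrary addresses, and the database accepts port 3306 traffic only from the app tier or the single listed address.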
embrace constraints
distributed memory
sharded DBs
hardware failed?
simply throw it away and switch to new hardware with no additional cost
cache
think parallel
different architectures
multi-threaded, concurrent requests
mapreduce
elastic load-balancing
decompose jobs into simplest form
leverage many storage options
<4>
computing in the cloud
3 modalities
batch processing
“grids”
queues
[Diagram: pipeline of queues and workers, with pages and images stored in S3]
URL Queue -> Fetch & Store Page -> Parse Queue -> Parse Page -> Image Queue -> Fetch Images -> Render Queue -> Render Images & Pages
Source: Jeff Barr
http://wiki.github.com/documentcloud/cloud-crowd
sudo gem install cloud-crowd
data-intensive computing
Amazon Elastic MapReduce
[Diagram elements: Deploy Application (Web Console, Command line tools); Input Data; Input S3 bucket and Output S3 bucket on Amazon S3; Input dataset; Elastic MapReduce running Hadoop on Amazon EC2 Instances; output results; Notify; Get Results; End]
Use Case: Increase speed of running job flows
Speed up job flow execution in response to changing requirements
Dynamically balance cost versus performance without restarting a job
PREANNOUNCE – EXPAND/SHRINK CLUSTERS
Allocate 4 instances: Job Flow, time remaining 14 hours
Expand to 9 instances: Job Flow, time remaining 7 hours
Expand to 25 instances: Job Flow, time remaining 3 hours
Use Case: Agile Data Warehouse Cluster
Customize cluster size to support varying resource needs
Leverage flexibility to reduce costs and increase cluster utilization
Allocate 9 instances: Data Warehouse (Steady State)
Expand to 25 instances: Data Warehouse (Batch Processing)
Shrink to 9 instances: Data Warehouse (Steady State)
PREANNOUNCE – Integration with Spot Instances
Allocate 4 instances: Job Flow, time remaining 14 hours
Expand to 9 instances: Job Flow, time remaining 7 hours
Cost without Spot: 4 instances * 14 hrs * $0.50 = $28
Cost with Spot: 4 instances * 7 hrs * $0.50 = $14 + 5 instances * 7 hrs * $0.25 = $8.75
Total = $22.75
Savings: ~19%
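The cost comparison can be recomputed directly from the instance counts, hours, and hourly prices given on the slide ($0.50 on-demand, $0.25 spot). The sketch below does that arithmetic. The `cost` helper is a name introduced for illustration, and a real spot price fluctuates rather than staying fixed.

```ruby
ON_DEMAND_RATE = 0.50 # dollars per instance-hour, from the slide
SPOT_RATE = 0.25      # dollars per instance-hour, from the slide

# Cost of running `instances` machines for `hours` at `rate` per hour.
def cost(instances, hours, rate)
  instances * hours * rate
end

# 4 on-demand instances running the whole 14-hour job flow.
without_spot = cost(4, 14, ON_DEMAND_RATE)

# 4 on-demand plus 5 spot instances finishing in 7 hours.
with_spot = cost(4, 7, ON_DEMAND_RATE) + cost(5, 7, SPOT_RATE)

savings = (without_spot - with_spot) / without_spot

puts format("Without Spot: $%.2f", without_spot)
puts format("With Spot:    $%.2f", with_spot)
puts format("Savings:      %.2f%%", savings * 100)
```

The job also finishes in half the time, so the spot instances buy both a lower bill and a faster result.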
high performance computing
Low latency
high bandwidth
cluster compute instances
full bisection bandwidth
10gbps
2 * Xeon 5570 (Intel “Nehalem”)
23 GB RAM
10 gbps Ethernet
1690 GB local disk
HVM-based virtualization
$1.60 / hr
managing compute cycles
http://web.mit.edu/stardev/cluster/
SQS
<5>
AWS + science = win
3.7 million classifications in just over three days
~15 million in less than a month
>2.6 million clicks in 100 hours
Biomarker Warehousepre-clinical, clinical, 3rd party data and publications
Estimated cost: 10 TB warehouse over 3 years
Simple Python scripts automate the management of 1000s of simultaneous experiments using the EC2 API
http://faculty.washington.edu/danielt/
Source: Ed Lazowska
Protein interactions @ U. Washington
http://bioteam.net/aws
200 instances
60000 structures
4 hours
HEAVY-ION COLLISIONS
Problem: Quark matter physics conference imminent but no compute resources handy
Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done
Image: Wikipedia
lots and lots and lots and lots and lots and lots of data and lots and lots of lots of data
Image via image editor under a CC-BY License
Image: NOAA
scale
availability
utilization
sharing
collaboration
we are data geeks not data center geeks
Map 100 million, 100 base paired end reads
Quad core with 5 GB of RAM would take 16 days
30 high-memory instances; 32 hours; $195
Source: Angel Pizarro/John Hogenesch
BLAT @ U. Penn
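The BLAT figures can be turned into a rough speedup and unit cost. The sketch below uses only the numbers quoted on the slide (30 instances, 32 hours, $195, and 16 days on a single quad-core machine). It assumes all 30 instances ran for the full 32 hours, which the slide does not state.

```ruby
instances = 30
cloud_hours = 32
total_cost = 195.0          # dollars
single_machine_days = 16

# Total instance-hours consumed, assuming every instance ran the full time.
instance_hours = instances * cloud_hours

# Average price paid per instance-hour.
cost_per_instance_hour = total_cost / instance_hours

# Wall-clock speedup relative to the single quad-core machine.
speedup = (single_machine_days * 24.0) / cloud_hours

puts "Instance-hours: #{instance_hours}"
puts format("Cost per instance-hour: $%.3f", cost_per_instance_hour)
puts format("Wall-clock speedup: %.0fx", speedup)
```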
BELLE MONTE CARLO
Credit: Tom Fifield
MapReduce for Genomics
Ben Langmead
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
platform for science
Elastic-R Collaborative Research Environment
http://www.elasticr.net
http://aws.amazon.com/publicdatasets/
s3://1000genomes
[email protected]
Twitter: @mndoci
slides at http://slideshare.net/mndoci
Inspiration and material from Matt Wood, James Hamilton & Larry Lessig
By Oberazzi under a CC-BY-NC-SA license