Matt Wood of AWS, "Cloud Research", Europe, April 2011 @ the Eagle Genomics Symposium

Cloud Research

Matt Wood, Technology Evangelist

Hello.

Thank you.

The Cloud by Example

Infrastructure services

?

On demand

Pay as you go

Pay for what you use

Elastic capacity

Figure (four charts of capacity over time): estimated demand; estimated demand with up-front investment; fixed infrastructure against real demand; fixed infrastructure against real demand, with elastic capacity.
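The capacity charts contrast infrastructure sized once from an estimate with capacity that follows real demand. A minimal sketch of that comparison is below. Every number and name in it is an illustrative assumption, not data from the talk, and the elastic case is idealised as tracking demand exactly.

```python
# Hypothetical real demand per time period, in arbitrary capacity units.
real_demand = [20, 35, 50, 90, 40, 30, 120, 60]

# Fixed infrastructure is sized once, up front, from an estimate (assumed value).
estimated_peak = 100


def fixed_capacity_report(demand, capacity):
    """Return (wasted units, unmet units) when capacity never changes."""
    wasted = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return wasted, unmet


def elastic_capacity_report(demand):
    """Idealised elastic capacity: provisioned capacity equals demand."""
    provisioned = list(demand)
    wasted = sum(p - d for p, d in zip(provisioned, demand))
    unmet = sum(max(d - p, 0) for p, d in zip(provisioned, demand))
    return wasted, unmet


fixed_wasted, fixed_unmet = fixed_capacity_report(real_demand, estimated_peak)
elastic_wasted, elastic_unmet = elastic_capacity_report(real_demand)

print("fixed:   wasted", fixed_wasted, "unmet", fixed_unmet)
print("elastic: wasted", elastic_wasted, "unmet", elastic_unmet)
```

With these made-up figures the fixed estimate is both too large most of the time and too small at the peak, which is the gap the charts illustrate.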

Agility

Faster to prototype

Faster to production

Undifferentiated heavy lifting

Tools for accelerating research

Figure: chart with a vertical axis from 0 to 300 and a horizontal axis from Q4 2006 to Q4 2010.

The Cloud by Example

Data management

Biomarker Warehouse: pre-clinical, clinical, 3rd party data and publications

Figure: cost comparison of In house, Managed Service and Cloud (AWS), broken down into Servers, Storage and Overhead.

Estimated cost: 10 TB warehouse over 3 years

Data processing

http://cyclecomputing.com

http://www.rightscale.com

Input S3 bucket

Output S3 bucket

Amazon S3

Hadoop

Amazon EC2 Instances

Input dataset

Output results

Deploy Application

Web Console, Command line tools

End

Notify

Get Results

Input Data

Amazon Elastic MapReduce

Elastic MapReduce

Preprocessed reads

Map: Bowtie

Sort: Bin and partition

Reduce: SoapSNP

Crossbow: Rapid whole genome SNP analysis

Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 2009, 10(11): R134.
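The three stages on the Crossbow slide follow the usual MapReduce shape: align each read, group alignments by genomic region, then call variants per region. A toy, in-memory sketch of that shape is below. `toy_align` and `toy_call_snps` are hypothetical stand-ins for Bowtie and SoapSNP, not their real interfaces, and the reference, reads and bin size are invented.

```python
from collections import defaultdict

REFERENCE = "ACGTACGTTAGC"  # invented reference sequence
BIN_SIZE = 4                # assumed bin width used for partitioning


def toy_align(read):
    """Stand-in for the map step: first position with the fewest mismatches."""
    best_pos, best_mismatches = None, len(read) + 1
    for pos in range(len(REFERENCE) - len(read) + 1):
        window = REFERENCE[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def map_stage(reads):
    """Emit (bin, (position, read)) pairs keyed by genomic bin."""
    for read in reads:
        pos = toy_align(read)
        yield pos // BIN_SIZE, (pos, read)


def sort_stage(pairs):
    """Group alignments by bin and sort each bin by position."""
    bins = defaultdict(list)
    for key, value in pairs:
        bins[key].append(value)
    return {key: sorted(values) for key, values in sorted(bins.items())}


def toy_call_snps(alignments):
    """Stand-in for the reduce step: positions where a read differs."""
    snps = set()
    for pos, read in alignments:
        for offset, base in enumerate(read):
            if REFERENCE[pos + offset] != base:
                snps.add((pos + offset, REFERENCE[pos + offset], base))
    return sorted(snps)


def reduce_stage(bins):
    """Call variants independently for each bin."""
    return {key: toy_call_snps(alignments) for key, alignments in bins.items()}


reads = ["ACGTAC", "GTTAGC", "ACGAAC", "TACGTT"]
snps_by_bin = reduce_stage(sort_stage(map_stage(reads)))
print(snps_by_bin)
```

Because each bin is reduced independently, the reduce work can be spread across many nodes, which is what makes the pattern a fit for Hadoop.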

CloudBurst

Catalog k-mers Collect seeds End-to-end alignment

http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
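The three CloudBurst steps describe a seed-and-extend alignment. A small single-machine sketch of that idea follows. It is not CloudBurst's code: the sequences, the seed length `K` and the mismatch limit are assumptions chosen for illustration.

```python
from collections import defaultdict

K = 3               # assumed seed length
MAX_MISMATCHES = 1  # assumed mismatch budget


def catalog_kmers(reference, k=K):
    """Step 1: index every k-mer of the reference by its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def collect_seeds(read, index, k=K):
    """Step 2: candidate read start positions implied by shared k-mers."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            candidates.add(ref_pos - offset)
    return candidates


def end_to_end_align(read, reference, candidates, max_mismatches=MAX_MISMATCHES):
    """Step 3: compare the whole read at each candidate and keep good hits."""
    hits = []
    for start in sorted(candidates):
        if start < 0 or start + len(read) > len(reference):
            continue  # the read would run off the reference
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "GATTACAGATTTCA"  # invented sequence
read = "GATTTC"               # invented read
index = catalog_kmers(reference)
hits = end_to_end_align(read, reference, collect_seeds(read, index))
print(hits)
```

Seeding keeps the expensive full comparison to the few positions that share a k-mer with the read, rather than every position in the reference.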

ASSEMBLING GENOMES

140 million 454 reads

Image: Matt Wood

Map 100 million 100-base paired-end reads

Quad core with 5 GB of RAM would take 16 days

30 high-memory instances; 32 hours; $195
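The figures on this slide can be turned into a quick comparison. The sketch below uses only the numbers quoted above. The derived values (instance-hours, speedup, cost per instance-hour) are simple arithmetic rather than figures from the talk, and they assume the 16-day estimate means continuous running.

```python
# Figures quoted on the slide.
single_machine_days = 16
cluster_instances = 30
cluster_hours = 32
cluster_cost_usd = 195

# Derived values (arithmetic only).
single_machine_hours = single_machine_days * 24
instance_hours = cluster_instances * cluster_hours
speedup = single_machine_hours / cluster_hours
cost_per_instance_hour = cluster_cost_usd / instance_hours

print(f"single machine: {single_machine_hours} hours")
print(f"cluster: {instance_hours} instance-hours, {speedup:.0f}x faster wall clock")
print(f"about ${cost_per_instance_hour:.2f} per instance-hour")
```

The wall-clock time drops from 384 hours to 32, a factor of 12, for 960 instance-hours of compute.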

BLAT @ U. PENN

HEAVY-ION COLLISIONS @ RHIC

Problem: Quark physics conference imminent but no compute resources handy

Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done

Collaboration

http://www.cloudbiolinux.com/

http://usegalaxy.org/cloud

Applications and platforms

http://heroku.com

http://chempedia.com/

Security

Shared responsibility

Requirement based access

Certification

ISO 27001+

SAS 70 Type II

PCI DSS Level 1

Security organisation

Employee lifecycle

Logical security

Secure data handling

Physical security

Environmental safeguards

Change management

Incident handling

Data integrity

Availability and redundancy

Control objectives

Data access control

Identity and access

Independent buildings

Separate flood zones

Geographically separated

Redundant power

Redundant connectivity

Highly monitored

Default deny firewall

Security groups
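A default deny firewall passes traffic only when an explicit allow rule matches it. The sketch below is a toy model of that behaviour, not the AWS security group API. The rule list, the `is_allowed` helper and the addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
import ipaddress

# Hypothetical allow rules: (protocol, port, allowed source CIDR).
ALLOW_RULES = [
    ("tcp", 22, "203.0.113.0/24"),  # SSH from one assumed office range
    ("tcp", 443, "0.0.0.0/0"),      # HTTPS from anywhere
]


def is_allowed(protocol, port, source_ip, rules=ALLOW_RULES):
    """Default deny: return True only if some allow rule matches."""
    source = ipaddress.ip_address(source_ip)
    for rule_protocol, rule_port, rule_cidr in rules:
        if (protocol == rule_protocol
                and port == rule_port
                and source in ipaddress.ip_network(rule_cidr)):
            return True
    return False  # nothing matched, so the traffic is dropped


print(is_allowed("tcp", 443, "198.51.100.7"))  # HTTPS is open to all sources
print(is_allowed("tcp", 22, "198.51.100.7"))   # SSH from outside the range
```

There is no deny rule anywhere in the list: anything not explicitly allowed falls through to the final `return False`.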

DDoS

Man in the Middle

IP spoofing

Resource isolation

Virtual Private Cloud

Amazon Web Services infrastructure

Secure VPN connection over the internet

VPN Gateway Router

Customer’s isolated AWS resources

Subnet 1

Subnet 2

Subnet 3

Subnet 4

Customer’s network
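The diagram shows isolated AWS resources split into four subnets and reached from the customer's network over a VPN. A minimal sketch of planning such an address layout with Python's standard `ipaddress` module is below. Both address blocks are assumptions for illustration. The diagram gives no addresses.

```python
import ipaddress

# Assumed address blocks, not taken from the slide.
vpc_block = ipaddress.ip_network("10.0.0.0/16")
customer_network = ipaddress.ip_network("192.168.0.0/16")

# Split the private cloud block into four equal subnets
# (two extra prefix bits give 2**2 = 4 subnets).
subnets = list(vpc_block.subnets(prefixlen_diff=2))

for number, subnet in enumerate(subnets, start=1):
    print(f"Subnet {number}: {subnet} ({subnet.num_addresses} addresses)")

# The two sides of a VPN should not share address space, or routing
# between them becomes ambiguous.
print("overlap with customer network:", vpc_block.overlaps(customer_network))
```

Checking for overlap before connecting the networks is the step that is easiest to forget, which is why it is included here.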

Dedicated instances

Virtual Private Cloud

aws.amazon.com/security

Data stays local

aws.amazon.com

Thank you!

matthew@amazon.com

Questions + comments

@mza on Twitter