Cumulus: An Open Source Storage Cloud for Science
John Bresnahan, Kate Keahey, David LaBissoniere, Tim Freeman
Argonne National Laboratory
Computation Institute, University of Chicago
ScienceCloud 2011: 2nd Workshop on Scientific Cloud Computing, San Jose, California, June 8th, 2011
Cumulus
• An open source storage cloud service
• Amazon’s S3 REST implementation
• Ideal for experimentation
– See "Going Back and Forth: Efficient Hypervisor-Independent Multi-Deployment and Multi-Snapshotting on Clouds," Friday at 1:30
• Extensible remote layer for storage systems
– Can be backed by any file/storage system
Storage Clouds
• Outsourcing
• Storage cloud usage patterns
– Often long-term storage
– Highly available/reliable
– Typically not used for interactive data
• Why does science need a storage cloud?
– Applications produce huge volumes of data
– Archives
Successful Commercial Example: S3 Storage Cloud
• Amazon’s Storage Cloud
– Closed source/hosted
– Charged by bytes transferred in/out + monthly storage
• De-facto standard
– REST interface
– Many client-side tools and APIs (an example client is sketched after this list)
• Highly available/reliable
– 99.999999999% durability
– 99.99% availability
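Because Cumulus implements the same S3 REST protocol, a standard S3 client library can be pointed at a Cumulus deployment instead of Amazon. The following is a minimal sketch; the endpoint, port, credentials, bucket, and file names are placeholders, and the library shown (boto3) is simply one modern example of the many S3 clients that work:

```python
import boto3

# Point a stock S3 client at a hypothetical Cumulus endpoint; host, port,
# and credentials below are illustrative only.
s3 = boto3.client(
    "s3",
    endpoint_url="https://cumulus.example.org:8888",
    aws_access_key_id="CUMULUS_ACCESS_KEY",
    aws_secret_access_key="CUMULUS_SECRET_KEY",
)

s3.create_bucket(Bucket="star-run-042")                          # create a bucket
s3.upload_file("events.dat", "star-run-042", "events.dat")       # put an object
s3.download_file("star-run-042", "events.dat", "events.copy")    # get it back
```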
Storage Cloud for Science
• Can S3 help science?
• Scientific applications typically produce large volumes of data
– 1 TB: $385.40 / month
• Writing it once
• Reading it once
– STAR produces several TB per day, ~$4k / month
• More if they ever read it!
• The protocol?
– Research and experimentation needed
– A different economy
• Quotas, not dollars (a hypothetical quota check is sketched below)
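In a quota-based economy the service refuses a write that would exceed a user's allocation rather than billing for it. The sketch below is purely illustrative; none of these names come from the Cumulus code base, and a real service would read the usage figures from its accounting database:

```python
class QuotaExceededError(Exception):
    """Raised when an upload would push a user over their allocation."""


def check_quota(used_bytes: int, incoming_bytes: int, quota_bytes: int) -> None:
    """Reject an upload that would exceed the user's byte quota."""
    if used_bytes + incoming_bytes > quota_bytes:
        raise QuotaExceededError(
            f"upload of {incoming_bytes} bytes exceeds quota "
            f"({used_bytes}/{quota_bytes} bytes already used)"
        )


# Example: a user with a 1 TB quota who has stored 900 GB tries to add 200 GB.
try:
    check_quota(used_bytes=900 * 10**9, incoming_bytes=200 * 10**9, quota_bytes=10**12)
except QuotaExceededError as err:
    print(err)  # the request is refused instead of billed
```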
Cumulus Goals
• Enable administrators to extend, experiment, and customize
• Use the hardware that you have
• High-quality, extensible, customizable, open source implementation of standard interfaces
• Easy to install
[Diagram: Cumulus layers: interfaces (S3, …) over a redirection layer and pluggable storage backends (MongoDB, GPFS, HDFS, …)]
Architecture
• Layered approach
– Protocol interpretation
– Functionality marshaling
– Replication redirection
– Storage
• Storage details abstracted out
– Reliability/availability vs. complexity/cost
Redirection Module
• Provides scalability
• Pluggable architecture
– The enabled module decides when and where to redirect
– Ships with round-robin and random policies (a round-robin sketch follows this list)
– Enables experimentation
• Redirects use Amazon's S3 error codes
– Works with third-party clients like s3cmd
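To illustrate the kind of policy a redirection plug-in could implement, here is a hedged sketch of a round-robin selector. The class and method names are invented for this example and are not the actual Cumulus plug-in API:

```python
import itertools


class RoundRobinRedirect:
    """Illustrative redirection policy: cycle through the known servers.

    A plug-in of this shape decides, per request, whether to serve the
    request locally (return None) or to redirect the client to a peer.
    """

    def __init__(self, servers, local_host):
        self._local_host = local_host
        self._cycle = itertools.cycle(servers)

    def pick_host(self, bucket, key):
        """Return a host to redirect to, or None to handle the request here."""
        target = next(self._cycle)
        return None if target == self._local_host else target


policy = RoundRobinRedirect(
    servers=["c1.example.org", "c2.example.org", "c3.example.org"],
    local_host="c1.example.org",
)
print(policy.pick_host("star-run-042", "events.dat"))  # e.g. c2.example.org
```

A server that gets a non-None host back from the policy would answer the client with the standard S3 redirect error code, which is why off-the-shelf clients such as s3cmd keep working.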
Load Balancing
[Diagram: a client contacting one of several Cumulus servers; each server runs a redirection module and consults a shared server list]
• A client can contact any server
• The redirection module may answer with an error directing the client to another server (a client-side sketch follows this list)
• Redirection has overhead
– HTTPS connection setup
– Authorization lookups
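From the client's point of view the redirect is just an HTTP response naming another server. The sketch below follows such redirects by hand with the requests library to make the extra round trip visible; the URL is hypothetical, authentication is omitted, and real S3 clients handle this automatically:

```python
import requests


def fetch_with_redirect(url, max_hops=3):
    """Follow temporary redirects manually to show the extra round trip."""
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False)
        if resp.status_code in (301, 302, 307):
            url = resp.headers["Location"]  # try the server we were sent to
            continue
        return resp
    raise RuntimeError("too many redirects")


# Hypothetical object URL on one Cumulus server; the redirection module may
# bounce the request to a less-loaded peer before the object is served.
resp = fetch_with_redirect("https://c1.example.org:8888/star-run-042/events.dat")
print(resp.status_code, len(resp.content))
```

Each redirect costs a fresh HTTPS connection and another authorization lookup on the new server, which is the overhead noted above.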
Storage Module
• Modular approach to storage
– Pluggable architecture
– Custom-made plug-ins for any type of storage system
– Simple abstraction, easy to program (an illustrative interface is sketched after the diagram note below)
• Admins can leverage whatever storage system they already have on their cluster
– GPFS, PVFS, HDFS, MongoDB, BlobSeer, etc.
[Diagram: the Cumulus server's storage interface with pluggable backends: POSIX, HDFS, MongoDB]
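As an illustration of how small such a storage abstraction can be, here is a hedged sketch of a backend interface with a POSIX implementation. The method names are invented for this example and do not reproduce the actual Cumulus storage API:

```python
import os
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical minimal contract a Cumulus-style storage plug-in could meet."""

    @abstractmethod
    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, bucket: str, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, bucket: str, key: str) -> None: ...


class PosixBackend(StorageBackend):
    """Store each object as a plain file under a root directory (local disk, GPFS, NFS, ...)."""

    def __init__(self, root: str):
        self.root = root

    def _path(self, bucket: str, key: str) -> str:
        return os.path.join(self.root, bucket, key)

    def put(self, bucket: str, key: str, data: bytes) -> None:
        path = self._path(bucket, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, bucket: str, key: str) -> bytes:
        with open(self._path(bucket, key), "rb") as f:
            return f.read()

    def delete(self, bucket: str, key: str) -> None:
        os.remove(self._path(bucket, key))
```

An HDFS or MongoDB plug-in would implement the same three calls against its own client library.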
Replication
[Diagram: multiple Cumulus servers, each with a redirection module and a storage module, sharing a server list and a common storage system]
The storage system must have a shared namespace.
Performance
[Figures: download and upload throughput (Mbps) vs. file size (2 MB to 2048 MB) for Cumulus, scp, GridFTP, and raw disk (Bonnie)]
• Compared to common transfer protocols
– Exceeds scp
– On par with GridFTP
• Bonnie is used to show the disk bottleneck
Fair Sharing
• 32 clients transfer 512 MB simultaneously
• Compare collective throughput to a single client
• Largest deviation from the mean:
– 0.40 Mbps (download)
– 1.18 Mbps (upload)
• Caching effects were introduced by the study
[Figures: per-client put/get throughput (Mbps) for 32 simultaneous 512 MB transfers, and collective vs. single-client throughput for put and get]
Scalability: Increased Replication
[Figure: average client throughput (Mbps) vs. server count (1 to 8) for GPFS and local disk backends, with single-server and linear-scaling references]
• 80 simultaneous clients on 8 machines (10 on each)
• Each downloads a 512 MB file
• Increasing replication factor
• Round-robin redirection
• Linear scaling until 4 servers
– Redirects take time once the host's NIC becomes congested
– DNS round-robin (DNSRR)
Scalability
[Figure: per-client throughput (Mbps) across 8 replicated servers for local disk and GPFS backends, with the median marked]
• Achieved throughput of each client
• 80% of the clients are within ±27 Mbps of the median
– Median: 96 Mbps
• Outlying clients show high performance
– Lucky clients that avoid a redirect
• They benefit from a briefly unused network while others are redirected
– None suffer very poor performance
• All individual trials show a similar footprint
Scalability: Increased Client Load
• 8 replicated servers
• Vary client load from 8 to 80
– 10 client machines
– Each trial adds a new client to each machine
• Consistently outperforms a single server by a significant factor
• File system effects on redirection
– Similar profile except in ideal circumstances
[Figures: average bandwidth per client (Mb/s) vs. simultaneous clients (8 to 80) for 8 replicated servers vs. a single server, using round-robin and random redirection, on GPFS and on local disk]
Scalability: Increased Client Load (cont.)
[Figures: the same per-client bandwidth plots as above, plus the speedup over a single server ("times faster") vs. simultaneous clients for round-robin and random redirection, on local disk and on GPFS]
Related Work
• S3
– Hosted, closed source, expensive I/O
• OpenStack's Swift
– Extensibility/experimentation
• OpenNebula
– A very simplistic image repository
• Walrus (Eucalyptus)
– Open “core”
– No options for scalability
• GridFTP
– A transfer service, not a storage cloud service
– A good transfer performance benchmark
Future Work
• Proxy server to S3
• Storage vs. image propagation
– Study the benefits of separating the user interface to images from the VMMs
– "Going Back and Forth: Efficient Hypervisor-Independent Multi-Deployment and Multi-Snapshotting on Clouds," Bogdan Nicolae; Friday 1:30
• DNSRR performance vs. Cumulus redirection
• Tiered service levels
– Providing different availability/reliability levels
• Cloud-agnostic storage systems
– Coming soon: cloudinit.d
Summary
• Open source S3 protocol interpreter
– Ideal for science cloud experimentation
• Lightweight, easy installation
• Customizable backends
– Use your existing file/storage system
• Quotas
• Performance on par with GridFTP
• Reasonable fair sharing under load
• Horizontal scalability
The Nimbus Team
• Project lead: Kate Keahey, ANL & UC
• Committers:
– Tim Freeman - University of Chicago
– Ian Gable - University of Victoria
– David LaBissoniere - University of Chicago
– John Bresnahan - Argonne National Laboratory
– Patrick Armstrong - University of Victoria
– Pierre Riteau - University of Rennes 1, IRISA
• GitHub contributors:
– Tim Freeman, David LaBissoniere, John Bresnahan, Pierre Riteau, Alex Clemesha, Paulo Gomez, Patrick Armstrong, Matt Vliet, Ian Gable, Paul Marshall, Adam Bishop
• And many others
– See http://www.nimbusproject.org/about/people/
www.nimbusproject.org
Let’s make cloud computing for science happen.