Cluster Filesystems
Marian Marinov - [email protected] Architect - Siteground.com
Kosova Software Freedom Conference 2009
Clusters with GlusterFS
Prishtina 29-30.Aug.2009
Agenda
Cluster Filesystems
Some facts
Gluster Design
kernel
gluster engine
protocols
translators
storage
performance
others
schedulers
Some benchmarks
Cluster Filesystems
GFarm Design
Facts
The GlusterFS project started in August 2006
It is not an actual filesystem
The server runs on any POSIX-compliant system, but is mainly tested on Linux
The client runs on Linux, FreeBSD & Mac OS X, as it requires FUSE
Very scalable
Very easy to install and maintain
GlusterFS Design
Gluster Filesystem Design
In the kernel
Requires FUSE
FUSE as module
GlusterFUSE
The engine
Server & Client
Transport Modules
Translators
Scheduler Modules
GlusterFS Design
The picture explained:
ClientX:
volume serverX - defines a name for a remote server
subvolumes brick0 - defines which of the volumes exported by the remote server we are interested in
some performance translators
volume unify - defines that we will use the unify cluster translator
subvolumes serverX serverY - defines which already connected storage volumes will be used
Transport Modules:
For TCP/IP transport: transport-type tcp/server
For InfiniBand SDP transport: transport-type ib-sdp/server
For InfiniBand Verbs transport: transport-type ib-verbs/server
The idea comes from GNU/Hurd
Translators
Performance
Clustering
Features
Storage
Others
Performance translators
Read Ahead
Write Behind
Threaded I/O
IO-Cache
Stat Pre-fetch (still not ported to the new versions)
Booster
Clustering translators
Distributed Hash Table (DHT)
Stripe
Replicate (formerly AFR)
Unify (new HA)
NUFA
Distributed Hash Table (DHT)
lookup-unhashed
min-free-disk
Replicate
read-subvolume
favorite-child
data-self-heal, metadata-self-heal & entry-self-heal
data-change-log, metadata-change-log & entry-change-log
data-lock-server-count, metadata-lock-server-count & entry-lock-server-count
Stripe & Unify
Scheduling systems
Adaptive Least Usage (ALU)
Non-uniform filesystem architecture (NUFA)
Random
Round-Robin
Switch
Adaptive Least Usage (ALU)
disk-usage
read-usage
write-usage
open-files-usage
disk-speed-usage
Non-uniform filesystem architecture (NUFA)
local-volume-name
limits.min-free-disk
Random
limits.min-free-disk
Round-Robin
limits.min-free-disk
read-only-subvolumes
refresh-interval
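To make the Round-Robin options above more concrete, here is a minimal Python sketch of the idea. It is not GlusterFS code: the class name, the method names and the way free space is passed in are hypothetical, and the semantics are assumed from the option names (limits.min-free-disk as a threshold below which a brick is skipped for new files, read-only-subvolumes as bricks that never receive new files).

```python
class RoundRobinScheduler:
    """Illustrative round-robin brick picker (hypothetical, not the GlusterFS implementation)."""

    def __init__(self, bricks, min_free_pct, read_only=()):
        # Assumption: read-only subvolumes are never chosen for new files.
        self.bricks = [b for b in bricks if b not in read_only]
        # Assumption: min-free-disk is expressed as a percentage of free space.
        self.min_free_pct = min_free_pct
        self._next = 0

    def pick(self, free_pct):
        """Return the next brick with enough free space, or None if there is none.

        free_pct maps a brick name to its current free space in percent.
        """
        n = len(self.bricks)
        for offset in range(n):
            brick = self.bricks[(self._next + offset) % n]
            if free_pct[brick] >= self.min_free_pct:
                # Continue after the chosen brick on the next call.
                self._next = (self._next + offset + 1) % n
                return brick
        return None


scheduler = RoundRobinScheduler(["brick1", "brick2", "brick3"],
                                min_free_pct=10, read_only=["brick3"])
free = {"brick1": 50, "brick2": 5, "brick3": 90}
print(scheduler.pick(free))  # brick2 is below the threshold and brick3 is read-only
```

In a real deployment the free-space figures would presumably be refreshed every refresh-interval; the sketch simply takes them as an argument.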
Switch
switch.case *jpg:brick1,brick2;*mp3:brick3;*:brick4,brick5
switch.read-only-subvolumes brick7
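The switch.case value above maps filename patterns to bricks. A minimal Python sketch of how such a rule string could be interpreted follows. The function names are hypothetical, and first-match-wins ordering is an assumption; the slide does not say how the scheduler chooses among several bricks listed for one pattern, so the sketch only returns the candidates.

```python
from fnmatch import fnmatchcase


def parse_switch_case(spec):
    """Split 'pattern:brick,brick;pattern:brick' into (pattern, [bricks]) rules."""
    rules = []
    for rule in spec.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        pattern, _, bricks = rule.partition(":")
        rules.append((pattern.strip(),
                      [b.strip() for b in bricks.split(",") if b.strip()]))
    return rules


def bricks_for(filename, rules):
    """Return the candidate bricks of the first rule whose pattern matches."""
    for pattern, bricks in rules:
        if fnmatchcase(filename, pattern):
            return bricks
    return []


rules = parse_switch_case("*jpg:brick1,brick2;*mp3:brick3;*:brick4,brick5")
for name in ("photo.jpg", "song.mp3", "notes.txt"):
    print(name, "->", bricks_for(name, rules))
```

With the rule string from the slide, image files go to brick1 or brick2, mp3 files to brick3, and everything else falls through to the catch-all pattern.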
Other translators
client
server
posix
posix-locks
bdb
filter
rot-13
trace
filter
root-squashing
read-only
fixed-uid & fixed-gid
translate-uid & translate-gid
filter-uid & filter-gid
In the future
Live addition/removal of nodes
Automatic File Reordering
Web GUI
mod_glusterfs
Benchmarks
Aggregated Read Throughput Benchmark
Multiple instances of the dd utility were executed simultaneously with different block sizes to read from the GlusterFS filesystem.
Block size  4KB          16KB         128KB        256KB        512KB        1024KB
Lustre      1,796 MB/s   5,782 MB/s   20,423 MB/s  21,582 MB/s  22,789 MB/s  23,731 MB/s
GlusterFS   11,415 MB/s  11,424 MB/s  11,427 MB/s  11,419 MB/s  11,411 MB/s  11,409 MB/s
Aggregated Write Throughput Benchmark
Multiple instances of the dd utility were executed simultaneously with different block sizes to write to the GlusterFS filesystem.
Block size  4KB         16KB        128KB       256KB       512KB       1024KB
Lustre      969 MB/s    1,613 MB/s  1,988 MB/s  1,989 MB/s  1,984 MB/s  1,983 MB/s
GlusterFS   1,886 MB/s  2,191 MB/s  2,237 MB/s  2,231 MB/s  2,236 MB/s  2,223 MB/s
Note: Higher means faster.
Apache Web Server Benchmark
Apache served 12039 files (595 MB) over HTTP. A wget client fetched the files recursively.
File system  Time
Lustre       Failed after downloading 33 MB out of 585 MB in 11 mins.
GlusterFS    3 mins 11 secs
Archive Creation
The tar utility created an archive of 12039 files (595 MB) served through GlusterFS.
File system  Time
Lustre       41 secs
GlusterFS    25 secs
Archive Extraction
File system  Time
Lustre       FAILED: No space left on device.
GlusterFS    43 secs
Note: Lower means faster.
Test case            Local SATA 500G     Local RAID0         NFS                 Single GlusterFS     Unified GlusterFS
1 Worker - 1st test  11.836s (82.5MB/s)  11.371s (85.9MB/s)  23.162s (42.2MB/s)  2m19.597s (7.0MB/s)  3m39.279s (4.4MB/s)
1 Worker - 2nd test  10.537s (92.7MB/s)  10.777s (90.6MB/s)  24.181s (40.4MB/s)  2m24.623s (6.7MB/s)  3m40.334s (4.4MB/s)
Sequential write: 1KB x 1,000,000 times = 1GB
# time dd if=/dev/zero of=/mnt/unify/file bs=1024 count=1000000
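The throughput figures in these tables appear to be the total bytes written divided by the elapsed time, with "MB" meaning 2^20 bytes. That unit is an assumption, not something the slides state. Under it, a short Python check reproduces the local SATA first-run figures for both the 1KB x 1,000,000 and the 64KB x 15,625 runs; the function name is hypothetical.

```python
MIB = 1024 * 1024  # assumption: the slides' "MB/s" means 2**20 bytes per second


def throughput_mib_s(block_size, count, seconds):
    """Throughput in MiB/s for `count` blocks of `block_size` bytes written in `seconds`."""
    total_bytes = block_size * count
    return total_bytes / MIB / seconds


# 1KB x 1,000,000 blocks, local SATA, first run: 11.836 s
print(round(throughput_mib_s(1024, 1_000_000, 11.836), 1))   # 82.5
# 64KB x 15,625 blocks, local SATA, first run: 6.390 s
print(round(throughput_mib_s(65536, 15_625, 6.390), 1))      # 152.8
```

Both runs write the same 1,024,000,000 bytes, so the difference between the tables comes only from the block size and the resulting number of write calls.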
Test case            Local SATA 500G     Local RAID0         NFS                 Single GlusterFS    Unified GlusterFS
1 Worker - 1st test  6.390s (152.8MB/s)  7.939s (123.0MB/s)  22.766s (42.9MB/s)  24.637s (39.6MB/s)  22.436s (43.5MB/s)
1 Worker - 2nd test  6.588s (148.2MB/s)  7.542s (129.5MB/s)  21.901s (44.6MB/s)  22.001s (44.4MB/s)  23.378s (41.8MB/s)
Sequential write: 64KB x 15,625 times = 1GB
# time dd if=/dev/zero of=/mnt/unify/file bs=65536 count=15625
Sources of Information
Project's site: http://www.gluster.com
Official GlusterFS documentation wiki: http://www.gluster.org/docs/index.php/GlusterFS
On IRC: irc.freenode.net #gluster
The mailing list: [email protected]
Questions?