Clusters with GlusterFS

Marian Marinov - [email protected]

Architect - Siteground.com

Kosova Software Freedom Conference 2009

Prishtina 29-30.Aug.2009

Agenda

Cluster Filesystems

Some facts

Gluster Design

kernel

gluster engine

protocols

translators

storage

performance

others

schedulers

Some benchmarks

Cluster Filesystems

GFarm Design

Facts

The GlusterFS project started in August 2006

It is not an actual filesystem

The server runs on any POSIX-compliant system, but is mainly tested on Linux

The client runs on Linux, FreeBSD & Mac OS X, as it requires FUSE

Very scalable

Very easy to install and maintain

GlusterFS Design

Gluster Filesystem Design

In the kernel

Requires FUSE

FUSE as module

GlusterFUSE

The engine

Server & Client

Transport Modules

Translators

Scheduler Modules

GlusterFS Design

The picture explained:

ClientX:

volume serverX - defines a name for a remote server

subvolumes brick0 - defines which of the volumes exported by the remote server we are interested in

some performance translators

volume unify - defines that we will use the unify cluster translator

subvolumes serverX serverY - defines which of the already connected storage volumes will be used
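
The stacking described in this list can be modelled as a small graph. The following is a minimal Python sketch, not GlusterFS code: the dictionary layout, the "kind" labels and the remote_bricks helper are assumptions made for illustration, and serverY is assumed to export a brick0 just as serverX does.

```python
# Hypothetical model of the client-side volume graph from the slide.
# Each volume lists the subvolumes it is stacked on.
CLIENT_GRAPH = {
    "serverX": {"kind": "remote", "subvolumes": ["brick0"]},
    "serverY": {"kind": "remote", "subvolumes": ["brick0"]},
    "unify": {"kind": "cluster", "subvolumes": ["serverX", "serverY"]},
}


def remote_bricks(graph, name):
    """Return (server volume, exported brick) pairs reachable from `name`."""
    volume = graph[name]
    if volume["kind"] == "remote":
        # A remote volume points directly at bricks exported by a server.
        return [(name, brick) for brick in volume["subvolumes"]]
    pairs = []
    for child in volume["subvolumes"]:
        # A cluster volume aggregates whatever its children reach.
        pairs.extend(remote_bricks(graph, child))
    return pairs


print(remote_bricks(CLIENT_GRAPH, "unify"))
```

Walking the graph from unify shows that a file operation on the mounted volume can end up on brick0 of either server.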

Transport Modules:

For TCP/IP transport: transport-type tcp/server

For InfiniBand SDP transport: transport-type ib-sdp/server

For InfiniBand Verbs transport: transport-type ib-verbs/server

Translators (the idea comes from GNU/Hurd)

Performance

Clustering

Features

Storage

Others

Performance translators

Read Ahead

Write Behind

Threaded I/O

IO-Cache

Stat Pre-fetch (still not ported to the new versions)

Booster

Clustering translators

Distributed Hash Table (DHT)

Stripe

Replicate (old AFR)

Unify (new HA)

NUFA
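
The idea behind the DHT translator is that a file's name alone determines which subvolume stores it, so no central metadata lookup is needed. The sketch below illustrates that idea only: it uses zlib.crc32 and a plain modulo, which is an assumption made for brevity and not the hash or layout scheme GlusterFS actually uses. The brick names and the pick_subvolume helper are hypothetical.

```python
import zlib


def pick_subvolume(filename, subvolumes):
    """Map a file name to one subvolume using a deterministic hash."""
    digest = zlib.crc32(filename.encode("utf-8"))
    return subvolumes[digest % len(subvolumes)]


bricks = ["brick1", "brick2", "brick3"]
for name in ("a.txt", "b.txt", "c.txt"):
    # The same name always lands on the same brick.
    print(name, "->", pick_subvolume(name, bricks))
```

Because the mapping is deterministic, any client can locate a file without asking the other nodes. A plain modulo like this would move most files when a brick is added, which is one reason a real implementation needs a more careful layout.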

Distributed Hash Table (DHT)

lookup-unhashed

min-free-disk

Replicate

read-subvolume

favorite-child

data-self-heal, metadata-self-heal & entry-self-heal

data-change-log, metadata-change-log & entry-change-log

data-lock-server-count, metadata-lock-server-count & entry-lock-server-count

Stripe & Unify

Scheduling systems

Adaptive Least Usage (ALU)

Non-uniform filesystem architecture (NUFA)

Random

Round-Robin

Switch

Adaptive Least Usage (ALU)

disk-usage

read-usage

write-usage

open-files-usage

disk-speed-usage
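
A rough way to picture the ALU scheduler is "pick the subvolume that is least loaded according to the criteria listed above, compared in priority order". The sketch below is a simplification under that assumption; the real scheduler is more elaborate. The least_used helper, the statistics and the convention that a lower number means a less loaded brick are all hypothetical.

```python
def least_used(stats, order):
    """Return the brick with the lowest usage, comparing criteria in order.

    stats: {brick: {criterion: value}}, where lower means less loaded.
    order: criteria names, most important first.
    """
    return min(sorted(stats), key=lambda brick: tuple(stats[brick][c] for c in order))


stats = {
    "brick1": {"disk-usage": 70, "read-usage": 10},
    "brick2": {"disk-usage": 40, "read-usage": 30},
    "brick3": {"disk-usage": 40, "read-usage": 5},
}
print(least_used(stats, order=("disk-usage", "read-usage")))  # brick3
```

brick2 and brick3 tie on disk-usage, so the second criterion, read-usage, decides between them.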

Non-uniform filesystem architecture (NUFA)

local-volume-name

limits.min-free-disk

Random

limits.min-free-disk

Round-Robin

limits.min-free-disk

read-only-subvolumes

refresh-interval
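
The interplay between Round-Robin scheduling and limits.min-free-disk can be sketched as follows. This is an illustration under assumptions, not the GlusterFS implementation: the RoundRobinScheduler class, the percentage-based threshold and the rule of skipping bricks that fall below it are hypothetical.

```python
class RoundRobinScheduler:
    """Cycle through subvolumes, skipping those with too little free space."""

    def __init__(self, subvolumes, min_free_percent):
        self.subvolumes = list(subvolumes)
        self.min_free_percent = min_free_percent
        self._next = 0  # index of the subvolume to try next

    def schedule(self, free_percent):
        """free_percent: {brick: percent of disk free}. Return the next eligible brick."""
        for _ in range(len(self.subvolumes)):
            brick = self.subvolumes[self._next]
            self._next = (self._next + 1) % len(self.subvolumes)
            if free_percent[brick] >= self.min_free_percent:
                return brick
        raise RuntimeError("no subvolume has enough free space")


scheduler = RoundRobinScheduler(["brick1", "brick2", "brick3"], min_free_percent=5)
free = {"brick1": 50, "brick2": 3, "brick3": 20}
print([scheduler.schedule(free) for _ in range(4)])
# ['brick1', 'brick3', 'brick1', 'brick3']
```

brick2 is below the threshold, so new files alternate between brick1 and brick3 until space is freed on brick2.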

Switch

switch.case *jpg:brick1,brick2;*mp3:brick3;*:brick4,brick5

switch.read-only-subvolumes brick7
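
The switch.case value above reads as a list of "pattern:bricks" clauses separated by semicolons. The sketch below parses that string and looks up the candidate bricks for a file name. It assumes shell-style globbing and that the first matching clause wins; both are assumptions, as are the helper names parse_switch_case and bricks_for. How one brick is then chosen among the candidates is not covered here.

```python
from fnmatch import fnmatchcase


def parse_switch_case(spec):
    """Turn 'pat:b1,b2;pat2:b3' into [(pattern, [bricks]), ...]."""
    rules = []
    for clause in spec.split(";"):
        pattern, _, bricks = clause.partition(":")
        rules.append((pattern.strip(), [b.strip() for b in bricks.split(",")]))
    return rules


def bricks_for(filename, rules):
    """Return the bricks of the first clause whose pattern matches."""
    for pattern, bricks in rules:
        if fnmatchcase(filename, pattern):
            return bricks
    return []


rules = parse_switch_case("*jpg:brick1,brick2;*mp3:brick3;*:brick4,brick5")
print(bricks_for("photo.jpg", rules))  # ['brick1', 'brick2']
print(bricks_for("song.mp3", rules))   # ['brick3']
print(bricks_for("notes.txt", rules))  # ['brick4', 'brick5']
```

The trailing "*" clause acts as a catch-all, so every file that is neither a jpg nor an mp3 goes to brick4 or brick5.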

Other translators

client

server

posix

posix-locks

bdb

filter

rot-13

trace

filter

root-squashing

read-only

fixed-uid & fixed-gid

translate-uid & translate-gid

filter-uid & filter-gid

In the future

Live addition/removal of nodes

Automatic File Reordering

Web GUI

mod_glusterfs

Gluster Design

Benchmarks

Aggregated Read Throughput Benchmark

Multiple instances of the dd utility were executed simultaneously with different block sizes to read from the GlusterFS filesystem.

Block size   Lustre        GlusterFS
4KB          1,796 MB/s    11,415 MB/s
16KB         5,782 MB/s    11,424 MB/s
128KB        20,423 MB/s   11,427 MB/s
256KB        21,582 MB/s   11,419 MB/s
512KB        22,789 MB/s   11,411 MB/s
1024KB       23,731 MB/s   11,409 MB/s

Aggregated Write Throughput Benchmark

Multiple instances of the dd utility were executed simultaneously with different block sizes to write to the GlusterFS filesystem.

Block size   Lustre       GlusterFS
4KB          969 MB/s     1,886 MB/s
16KB         1,613 MB/s   2,191 MB/s
128KB        1,988 MB/s   2,237 MB/s
256KB        1,989 MB/s   2,231 MB/s
512KB        1,984 MB/s   2,236 MB/s
1024KB       1,983 MB/s   2,223 MB/s

Note: Higher means faster.

Apache Web Server Benchmark

Apache served 12039 files (595 MB) over HTTP. A wget client fetched the files recursively.

Time
Lustre      Failed after downloading 33 MB out of 585 MB in 11 mins
GlusterFS   3 mins 11 secs

Archive Creation

The tar utility created an archive of 12039 files (595 MB) served through GlusterFS.

Time
Lustre      41 secs
GlusterFS   25 secs

Archive Extraction

Time
Lustre      FAILED: No space left on device.
GlusterFS   43 secs

Note: Lower means faster.

Test Case           1 Worker - 1st test     1 Worker - 2nd test
Local SATA 500G     11.836s (82.5MB/s)      10.537s (92.7MB/s)
Local RAID0         11.371s (85.9MB/s)      10.777s (90.6MB/s)
NFS                 23.162s (42.2MB/s)      24.181s (40.4MB/s)
Single GlusterFS    2m19.597s (7.0MB/s)     2m24.623s (6.7MB/s)
Unified GlusterFS   3m39.279s (4.4MB/s)     3m40.334s (4.4MB/s)

Sequential Write: 1KB x 1,000,000 times = 1GB

# time dd if=/dev/zero of=/mnt/unify/file bs=1024 count=1000000

Test Case           1 Worker - 1st test     1 Worker - 2nd test
Local SATA 500G     6.390s (152.8MB/s)      6.588s (148.2MB/s)
Local RAID0         7.939s (123.0MB/s)      7.542s (129.5MB/s)
NFS                 22.766s (42.9MB/s)      21.901s (44.6MB/s)
Single GlusterFS    24.637s (39.6MB/s)      22.001s (44.4MB/s)
Unified GlusterFS   22.436s (43.5MB/s)      23.378s (41.8MB/s)

Sequential Write: 64KB x 15,625 times = 1GB

# time dd if=/dev/zero of=/mnt/unify/file bs=65536 count=15625
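
The MB/s figures in the two sequential write tables can be reproduced from the dd parameters and the elapsed time. Both commands write 1,024,000,000 bytes, and the reported rates appear to use 1 MB = 1,048,576 bytes; that reading is inferred from the numbers rather than stated in the slides. The helper name dd_throughput is hypothetical.

```python
def dd_throughput(block_size, count, seconds):
    """Throughput in MB/s, taking 1 MB as 1024 * 1024 bytes."""
    return block_size * count / seconds / (1024 * 1024)


# 1KB blocks, local SATA, 1st test: the table reports 82.5MB/s
print(round(dd_throughput(1024, 1_000_000, 11.836), 1))  # 82.5
# 64KB blocks, local SATA, 1st test: the table reports 152.8MB/s
print(round(dd_throughput(65536, 15_625, 6.390), 1))     # 152.8
```

The same calculation for the single GlusterFS volume with 1KB blocks (2m19.597s, that is 139.597 seconds) gives about 7.0 MB/s, which matches the table.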

Sources of Information

Project's site: http://www.gluster.com

Official GlusterFS documentation wiki: http://www.gluster.org/docs/index.php/GlusterFS

On IRC: irc.freenode.net #gluster

The mailing list: [email protected]

Questions?