
GlusterFS CTDB Integration


Page 1: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

GlusterFS / CTDB Integration

v1.0 2013.05.14  Etsuji Nakai

Senior Solution Architect, Red Hat K.K.

Page 2: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 2

$ who am i

Etsuji Nakai (@enakai00)

● Senior solution architect and cloud evangelist at Red Hat K.K.

● The author of the “Professional Linux Systems” series.

● Available in Japanese. Translation offers from publishers are welcome ;-)

Professional Linux Systems: Technology for Next Decade

Professional Linux Systems: Deployment and Management

Professional Linux Systems: Network Management

Page 3: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 3

Contents

CTDB Overview

Why does CTDB matter?

CTDB split-brain resolution

Configuration steps for demo set-up

Summary

Page 4: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 4

Disclaimer

This document explains how to set up a clustered Samba server using GlusterFS and CTDB with the following software components.

● Base OS, Samba, CTDB: RHEL6.4 (or any of your favorite clone)

● GlusterFS: GlusterFS 3.3.1 (Community version)

● http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/

Since this is based on the community version of GlusterFS, you cannot receive commercial support from Red Hat for this configuration. If you need commercial support, please consider using Red Hat Storage Server (RHS). In addition, the conditions for a supported configuration with RHS are different. Please consult a Red Hat sales representative for details.

Red Hat accepts no liability for the content of this document, or for the consequences of any actions taken on the basis of the information provided. Any views or opinions presented in this document are solely those of the author and do not necessarily represent those of Red Hat.

Page 5: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

CTDB Overview

Page 6: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 6

What's CTDB?

TDB = Trivial Database

● Simple backend DB for Samba, used to store user info, file lock info, etc...

CTDB = Clustered TDB

● A clustered extension of TDB, necessary when multiple Samba hosts are configured to serve the same filesystem contents.

All clients see the same contents through different Samba hosts.

Samba Samba Samba

・・・

Shared Filesystem

Page 7: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 7

What's wrong without CTDB?

Windows file locks are not shared among Samba hosts.

● You would see the following alert when someone is opening the same file.

● Without CTDB, if someone else opens the same file through a different Samba host, you never see that alert.

● This is because file lock info is stored in the local TDB if you don't use CTDB.

● CTDB was initially developed as a shared TDB for multiple Samba hosts to overcome this problem.

xxx.xls: Windows file locks are not shared.

Locked! Locked!

Page 8: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 8

CTDB interconnect (heartbeat) network

Yet another benefit of CTDB

Floating IPs can be assigned across hosts for transparent failover.

● When one of the hosts fails, the floating IP is moved to another host.

● Mutual health checking is done through the CTDB interconnect (so-called “heartbeat”) network.

● CTDB can also be used for NFS server cluster to provide the floating IP feature. (CTDB doesn't provide shared file locking for NFS though.)

Floating IP#1   Floating IP#2   ・・・   Floating IP#N

(When a node fails, its floating IP moves to one of the surviving hosts.)

Page 9: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

Why does CTDB matter?

Page 10: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 10

Access path of GlusterFS native client

The native client communicates directly with all storage nodes.

● Transparent failover is implemented on the client side. When the client detects a node failure, it accesses the replica node.

● Floating IP is unnecessary by design for the native client.

file01 file02 file03

・・・

GlusterFS Storage Nodes

file01, file02, file03

GlusterFS Native Client

GlusterFS Volume

Native client sees the volume as a single filesystem

The real locations of files are calculated on the client side.

Page 11: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 11

CIFS/NFS use case for GlusterFS

The downside of the native client is that it's not available for Unix/Windows.

● You need to rely on CIFS/NFS for Unix/Windows clients.

● In that case, Windows file lock sharing and the floating IP feature are not provided by GlusterFS itself. They need to be provided by an external tool.

CTDB is the tool for it ;-)

・・・

CIFS/NFS Client

CIFS/NFS client connects to just one specified node.

GlusterFS storage node acts as a proxy “client”.

Different clients can connect to different nodes. DNS round-robin may work for that, as sketched below.
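As an illustration of the DNS round-robin idea (not part of the original setup; the name storage.example.com is hypothetical, and the addresses are the access-segment node addresses used in the demo later), a zone could publish one A record per storage node so that different clients are spread across nodes:

; BIND-style zone snippet (sketch): clients resolving storage.example.com
; receive the node addresses in rotating order
storage.example.com.    IN  A   192.168.122.11
storage.example.com.    IN  A   192.168.122.12
storage.example.com.    IN  A   192.168.122.13
storage.example.com.    IN  A   192.168.122.14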

Page 12: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 12

Network topology overview without CTDB

Storage Nodes

CIFS/NFS Clients

GlusterFS interconnect

CIFS/NFS Access segment

...

If you don't need floating IPs or Windows file lock sharing, you can go without CTDB.

● NFS file lock sharing (NLM) is provided by GlusterFS's internal NFS server.

Although it's not mandatory, you can separate the CIFS/NFS access segment from the GlusterFS interconnect for the sake of network performance.

Samba Samba Samba Samba

glusterd glusterd glusterd glusterd

Page 13: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 13

Network topology overview with CTDB

Storage Nodes

CIFS/NFS Clients

GlusterFS interconnect

CIFS/NFS access segment

...

If you use CTDB with GlusterFS, you need to add an independent CTDB interconnect (heartbeat) segment for a reliable cluster.

● The reason will be explained later.

CTDB interconnect (Heartbeat)

Page 14: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 14

Demo - Seeing is believing!

http://www.youtube.com/watch?v=kr8ylOBCn8o

Page 15: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

CTDB split-brain resolution

Page 16: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 16

What's CTDB split-brain?

When the heartbeat is cut off for any reason (possibly a network problem) while cluster nodes are still running, there must be some mechanism to decide which "island" should survive and keep running.

● Without this mechanism, the same floating IPs would be assigned on both islands. This is not specific to CTDB; every cluster system in the world needs to take care of split-brain.

In the case of CTDB, a master node is elected through the "lock file" on the shared filesystem. The island containing the master node survives. In the case of GlusterFS specifically, the lock file is stored on a dedicated GlusterFS volume called the "lock volume".

● The lock volume is locally mounted on each storage node. If you share the CTDB interconnect with the GlusterFS interconnect, access to the lock volume is not guaranteed when the heartbeat is cut off, resulting in an unpredictable condition.

Storage Nodes

GlusterFS interconnect

CTDB interconnect (Heartbeat)

Lock Volume

Master

The master takes an exclusive lock on the lock file.
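As a minimal sketch of how the lock file is referenced (an illustration using the /gluster/lock mount and the lockfile shown on the next page; the demo config later in this document does not show this line explicitly), the standard /etc/sysconfig/ctdb setting points the recovery lock at shared storage:

# /etc/sysconfig/ctdb (sketch): the recovery lock must live on storage visible
# to all nodes, so that only one island's master can hold it at a time
CTDB_RECOVERY_LOCK=/gluster/lock/lockfile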

Page 17: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 17

Typical volume config seen from a storage node

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vda3              2591328   1036844   1422852  43% /
tmpfs                   510288         0    510288   0% /dev/shm
/dev/vda1               495844     33450    436794   8% /boot
/dev/mapper/vg_bricks-lv_lock
                         60736      3556     57180   6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01
                       1038336     33040   1005296   4% /bricks/brick01
localhost:/lockvol      121472      7168    114304   6% /gluster/lock
localhost:/vol01       2076672     66176   2010496   4% /gluster/vol01

# ls -l /gluster/lock/
total 2
-rw-r--r--. 1 root root 294 Apr 26 15:43 ctdb
-rw-------. 1 root root   0 Apr 26 15:57 lockfile
-rw-r--r--. 1 root root  52 Apr 26 15:56 nodes
-rw-r--r--. 1 root root  96 Apr 26 15:04 public_addresses
-rw-r--r--. 1 root root 218 Apr 26 16:31 smb.conf

Locally mounted lock volume.

Locally mounted data volume, exported with Samba.

Lock file to elect the master.

Common config files can be placed on the lock volume.

Page 18: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 18

What about sharing the CTDB interconnect with the access segment?

No, it doesn't work.

When the NIC for the access segment fails, the cluster detects the heartbeat failure and elects a master node through the lock file on the shared volume. However, if the node with the failed NIC holds the lock, it becomes the master even though it cannot serve clients.

● In practice, CTDB event monitoring also detects the NIC failure and the node becomes "UNHEALTHY", too.

Page 19: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 19

CTDB event monitoring

CTDB provides a custom event monitoring mechanism which can be used to monitor application status, NIC status, etc...

● Monitoring scripts are stored in /etc/ctdb/events.d/

● They need to implement handlers for pre-defined events.
● They are called in file-name order when an event occurs.

● In particular, the "monitor" event is issued every 15 seconds. If the "monitor" handler of any script exits with a non-zero return code, the node becomes "UNHEALTHY" and is removed from the cluster.

● For example, “10.interface” checks the link status of the NIC on which a floating IP is assigned.

● See README for details - http://bit.ly/14KOjlC

# ls /etc/ctdb/events.d/
00.ctdb       11.natgw           20.multipathd  41.httpd  61.nfstickle
01.reclock    11.routing         31.clamd       50.samba  70.iscsi
10.interface  13.per_ip_routing  40.vsftpd      60.nfs    91.lvs
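As an illustration of the event script interface (a hypothetical script, not one shipped with CTDB; the file name 99.gluster-lock and the check are assumptions), a custom script placed in /etc/ctdb/events.d/ could mark a node UNHEALTHY when the shared lock volume is not mounted:

#!/bin/sh
# /etc/ctdb/events.d/99.gluster-lock (hypothetical example)
# CTDB passes the event name as $1; exiting non-zero on "monitor"
# marks this node UNHEALTHY.
case "$1" in
    monitor)
        mountpoint -q /gluster/lock || exit 1
        ;;
esac
exit 0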

Page 20: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

Configuration steps for demo set-up

Page 21: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 21

Step1 – Install RHEL6.4

Install RHEL6.4 on storage nodes.

● Scalable File System Add-On is required for XFS.

● Resilient Storage Add-On is required for CTDB packages.

Configure public-key SSH authentication between the nodes.

● This is for administrative purposes.

Configure network interfaces as in the configuration pages.

192.168.122.11  gluster01
192.168.122.12  gluster02
192.168.122.13  gluster03
192.168.122.14  gluster04

192.168.2.11    gluster01c
192.168.2.12    gluster02c
192.168.2.13    gluster03c
192.168.2.14    gluster04c

192.168.1.11    gluster01g
192.168.1.12    gluster02g
192.168.1.13    gluster03g
192.168.1.14    gluster04g

/etc/hosts

NFS/CIFS Access Segment

CTDB Interconnect

GlusterFS Interconnect
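As a quick sanity check (a sketch, not part of the original steps; it assumes the /etc/hosts entries above are deployed on every node), each node should be able to resolve and reach its peers on all three segments:

# run on each node; host names follow the /etc/hosts table above
for h in gluster0{1..4} gluster0{1..4}c gluster0{1..4}g; do
    ping -c 1 -W 2 $h > /dev/null && echo "$h OK" || echo "$h unreachable"
done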

Page 22: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 22

Step1 – Install RHEL6.4

Configure iptables on all nodes

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24050 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38468 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4379 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

/etc/sysconfig/iptables

# vi /etc/sysconfig/iptables
# service iptables restart

Port annotations: 4379 = CTDB, 139/445 = CIFS, 111 = portmap, 38465:38468 = NFS/NLM, 24007:24050 = GlusterFS bricks.
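To confirm the relevant ports are open after the restart (a quick check, not from the original slides):

# iptables -nL INPUT | grep -E '4379|24007|38465|445'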

Page 23: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 23

Step2 – Prepare bricks

Create and mount brick directories on all nodes.

# pvcreate /dev/vdb
# vgcreate vg_bricks /dev/vdb
# lvcreate -n lv_lock -L 64M vg_bricks
# lvcreate -n lv_brick01 -L 1G vg_bricks

# yum install -y xfsprogs
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01

# echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
# echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
# mkdir -p /bricks/lock
# mkdir -p /bricks/brick01
# mount /bricks/lock
# mount /bricks/brick01

/dev/vdb

lv_lock

lv_brick01

vg_bricks

Mount on /bricks/lock, used for lock volume.

Mount on /bricks/brick01, used for data volume.
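To verify the bricks are formatted and mounted as intended before creating volumes (a quick check, not part of the original steps):

# df -hT /bricks/lock /bricks/brick01
# xfs_info /bricks/brick01 | head -1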

Page 24: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 24

Step3 – Install GlusterFS and create volumes

Install GlusterFS packages on all nodes

# wget -O /etc/yum.repos.d/glusterfs-epel.repo \
  http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/RHEL/glusterfs-epel.repo
# yum install -y rpcbind glusterfs-server
# chkconfig rpcbind on
# service rpcbind start
# service glusterd start

# gluster peer probe gluster02g
# gluster peer probe gluster03g
# gluster peer probe gluster04g

# gluster vol create lockvol replica 2 \
    gluster01g:/bricks/lock gluster02g:/bricks/lock \
    gluster03g:/bricks/lock gluster04g:/bricks/lock
# gluster vol start lockvol

# gluster vol create vol01 replica 2 \
    gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 \
    gluster03g:/bricks/brick01 gluster04g:/bricks/brick01
# gluster vol start vol01

Do not auto-start glusterd with chkconfig.

Need to specify GlusterFS interconnect NICs.

Configure cluster and create volumes from gluster01
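Before moving on, the cluster membership and volume layout can be confirmed (a quick check, not part of the original steps):

# gluster peer status
# gluster vol info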

Page 25: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 25

Step4 – Install and configure Samba/CTDB

● Create the following config files on the shared volume.

# yum install -y samba samba-client ctdb

# mkdir -p /gluster/lock
# mount -t glusterfs localhost:/lockvol /gluster/lock

Do not auto-start smb and ctdb with chkconfig.

CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=yes
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15

/gluster/lock/ctdb

# yum install -y rpcbind nfs-utils
# chkconfig rpcbind on
# service rpcbind start

Install Samba/CTDB packages on all nodes

If you use NFS, install the following packages, too.

Configure CTDB and Samba only on gluster01

Page 26: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 26

Step4 – Install and configure Samba/CTDB

192.168.2.11
192.168.2.12
192.168.2.13
192.168.2.14

/gluster/lock/nodes

192.168.122.201/24 eth0
192.168.122.202/24 eth0
192.168.122.203/24 eth0
192.168.122.204/24 eth0

/gluster/lock/public_addresses

[global]
workgroup = MYGROUP
server string = Samba Server Version %v
clustering = yes
security = user
passdb backend = tdbsam

[share]
comment = Shared Directories
path = /gluster/vol01
browseable = yes
writable = yes

/gluster/lock/smb.conf

CTDB cluster nodes. Need to specify CTDB interconnect IP addresses.

Floating IP list.

Samba config. Need to specify “clustering = yes”.
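The Samba side of this config can be sanity-checked with testparm before the symlinks are created in the next step (a quick check, not from the original slides; testparm accepts an explicit config path):

# testparm -s /gluster/lock/smb.conf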

Page 27: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 27

Step4 – Install and configure Samba/CTDB

Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location.

● We'd better set an appropriate security context, but there's an open issue with using chcon on GlusterFS.

● https://bugzilla.redhat.com/show_bug.cgi?id=910380

# mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
# mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
# ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
# ln -s /gluster/lock/nodes /etc/ctdb/nodes
# ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
# ln -s /gluster/lock/smb.conf /etc/samba/smb.conf

# yum install -y policycoreutils-python
# semanage permissive -a smbd_t

Create symlinks to the config files on all nodes.

Page 28: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 28

Step4 – Install and configure Samba/CTDB

Create the following script to start/stop the services

#!/bin/sh

function runcmd {
        echo exec on all nodes: $@
        ssh gluster01 $@ &
        ssh gluster02 $@ &
        ssh gluster03 $@ &
        ssh gluster04 $@ &
        wait
}

case $1 in
    start)
        runcmd service glusterd start
        sleep 1
        runcmd mkdir -p /gluster/lock
        runcmd mount -t glusterfs localhost:/lockvol /gluster/lock
        runcmd mkdir -p /gluster/vol01
        runcmd mount -t glusterfs localhost:/vol01 /gluster/vol01
        runcmd service ctdb start
        ;;
    stop)
        runcmd service ctdb stop
        runcmd umount /gluster/lock
        runcmd umount /gluster/vol01
        runcmd service glusterd stop
        runcmd pkill glusterfs
        ;;
esac

ctdb_manage.sh

Page 29: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 29

Step5 – Start services

Now you can start/stop services.

● After a few moments, ctdb status becomes “OK” for all nodes.

● And floating IPs are configured on each node.

# ./ctdb_manage.sh start

# ctdb status
Number of nodes:4
pnn:0 192.168.2.11     OK (THIS NODE)
pnn:1 192.168.2.12     OK
pnn:2 192.168.2.13     OK
pnn:3 192.168.2.14     OK
Generation:1489978381
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:1

# ctdb ip
Public IPs on node 0
192.168.122.201 node[3] active[] available[eth0] configured[eth0]
192.168.122.202 node[2] active[] available[eth0] configured[eth0]
192.168.122.203 node[1] active[] available[eth0] configured[eth0]
192.168.122.204 node[0] active[eth0] available[eth0] configured[eth0]

Page 30: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 30

Step5 – Start services

Set the Samba password and check the shared directories via one of the floating IPs.

# pdbedit -a -u root
new password:
retype new password:

# smbclient -L 192.168.122.201 -U root
Enter root's password:
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

Sharename       Type      Comment
---------       ----      -------
share           Disk      Shared Directories
IPC$            IPC       IPC Service (Samba Server Version 3.6.9-151.el6)

Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

Server               Comment
---------            -------

Workgroup            Master
---------            -------

Password DB is shared by all hosts in the cluster.
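As a sketch of client-side access (assuming a Linux CIFS client with cifs-utils installed; the mount point /mnt/share is hypothetical), the share can be mounted through any of the floating IPs:

# mkdir -p /mnt/share
# mount -t cifs //192.168.122.201/share /mnt/share -o user=root

If the node currently holding 192.168.122.201 fails, the address moves to a surviving node and the client reconnects to the same IP.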

Page 31: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 31

Configuration hints

To specify the GlusterFS interconnect segment, "gluster peer probe" should be done for the IP addresses on that segment.

To specify the CTDB interconnect segment, IP addresses on that segment should be specified in "/gluster/lock/nodes" (symlink from "/etc/ctdb/nodes").

To specify the NFS/CIFS access segment, NIC names on that segment should be specified in "/gluster/lock/public_addresses" (symlink from "/etc/ctdb/public_addresses") associated with floating IP's.

To restrict NFS access to a volume, you can use the “nfs.rpc-auth-allow” and “nfs.rpc-auth-reject” volume options (reject supersedes allow); a sketch follows at the end of this list.

The following tunables in "/gluster/lock/ctdb" (symlink from "/etc/sysconfig/ctdb") may be useful for adjusting the CTDB failover timings. See the ctdbd man page for details.

● CTDB_SET_DeterministicIPs=1

● CTDB_SET_RecoveryBanPeriod=300

● CTDB_SET_KeepaliveInterval=5

● CTDB_SET_KeepaliveLimit=5

● CTDB_SET_MonitorInterval=15
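As a minimal sketch of the NFS access restriction mentioned above (the addresses are illustrative and assume you want to allow only the NFS/CIFS access segment used in this demo):

# gluster vol set vol01 nfs.rpc-auth-allow 192.168.122.*
# gluster vol set vol01 nfs.rpc-auth-reject 192.168.1.*,192.168.2.*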

Page 32: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

Summary

Page 33: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved. 33

Summary

CTDB is a tool that combines well with the CIFS/NFS use case for GlusterFS.

Network design is crucial for building a reliable cluster, not only with CTDB but with every cluster in the world ;-)

Enjoy!

And one important fine print....

● Samba is not well tested on large-scale GlusterFS clusters. The use of CIFS as a primary access protocol on Red Hat Storage Server 2.0 is not officially supported by Red Hat. This will be improved in future versions.

Page 34: GlusterFS CTDB Integration

Red Hat K.K. All rights reserved.

WE CAN DO MORE WHEN WE WORK TOGETHER

THE OPEN SOURCE WAY