1. Geo-Replication and Disaster Recovery: GlusterFS
   Bipin Kunal, ASME, Red Hat
2. Agenda (06/26/15)
   - What is Geo-replication
   - Various stages
   - Various components
   - How it works
   - Troubleshooting
   - Logs
   - Recovery
   - How you can contribute
3. What is Geo-Replication
   Asynchronous replication from a Master Volume to a Slave Volume over
   the Internet. The two volumes can be geographically apart, for
   example Pune and Sydney.
4. Geo-replication setup
   Create:
   - Create and collect all ssh keys from the Master Cluster
   - Verification (passwordless ssh, version, size comparison)
   - Distribute keys to the Slave Cluster
   - Create the session status file on the Master nodes
   - Enable Marker and Changelog
   Delete:
   - Clean the working directory
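The Create and Delete stages above map onto the gluster CLI roughly as follows. This is a minimal sketch, assuming passwordless ssh from one master node to one slave node is already in place. The names mastervol, slavehost and slavevol and the run helper are hypothetical, and setting DRY_RUN=1 only prints the commands instead of running them.

```shell
# Hypothetical names: substitute your own volume and host names.
MASTER_VOL=${MASTER_VOL:-mastervol}
SLAVE_HOST=${SLAVE_HOST:-slavehost}
SLAVE_VOL=${SLAVE_VOL:-slavevol}

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create: collect the master cluster's ssh keys, then create the session.
# push-pem asks the create step to distribute those keys to the slave cluster.
georep_create() {
    run gluster system:: execute gsec_create &&
    run gluster volume geo-replication "$MASTER_VOL" "${SLAVE_HOST}::${SLAVE_VOL}" create push-pem
}

# Delete: remove the session (stop it first if it is running).
georep_delete() {
    run gluster volume geo-replication "$MASTER_VOL" "${SLAVE_HOST}::${SLAVE_VOL}" delete
}
```

Running `DRY_RUN=1 georep_create` first is a cheap way to review the exact commands before touching a real cluster.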
5. Geo-replication setup (cont.)
   Start: Glusterd spawns a Monitor (gsyncd) on all Master nodes
   Stop: Stops all gsyncd processes
   Status:
   - Good status: Not Started, Active/Passive, Stopped
   - Bad status: Faulty/Defunct/Config Corrupted
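The good/bad grouping of status values can be captured in a small helper, shown here next to a wrapper for the start, stop and status subcommands. This is a sketch only: the function names and the default volume and host names are hypothetical, and the status strings are taken from the slide rather than from a particular GlusterFS release.

```shell
# Hypothetical names: substitute your own volume and host names.
MASTER_VOL=${MASTER_VOL:-mastervol}
SLAVE_HOST=${SLAVE_HOST:-slavehost}
SLAVE_VOL=${SLAVE_VOL:-slavevol}

# georep: run a geo-replication subcommand for the session,
# for example `georep start`, `georep stop` or `georep status`.
georep() {
    gluster volume geo-replication "$MASTER_VOL" "${SLAVE_HOST}::${SLAVE_VOL}" "$@"
}

# classify_status: map one status string to good, bad or unknown,
# following the grouping on the slide.
classify_status() {
    case "$1" in
        "Not Started"|Active|Passive|Stopped) echo good ;;
        Faulty|Defunct|"Config Corrupted")    echo bad ;;
        *)                                    echo unknown ;;
    esac
}
```

For example, `classify_status Faulty` prints `bad`, which is the cue to move on to the logs.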
6. Monitor
   - One process per Geo-replication session per node
   - Spawns the worker and agent, and respawns them if they die
   - ps -ax | grep gsyncd | grep monitor
7. Worker
   - One process per brick
   - Change detection, initiating Rsync/Tar, status updates
   - Active and Passive workers
   - ps -ax | grep gsyncd | grep feedback
8. Agent
   - Parses Changelogs from the Brick backend using libgfchangelog
   - ps -ax | grep gsyncd | grep agent
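The three ps pipelines for the monitor, worker and agent can be folded into one pass that counts each role. This is a rough sketch that reuses the same match strings (monitor, feedback, agent) and so inherits their limits: a path or volume name containing one of those words would be miscounted. The function name is hypothetical.

```shell
# count_gsyncd_roles: read `ps -ax` style output on stdin and count
# gsyncd processes per role, using the same patterns as the grep pipelines.
count_gsyncd_roles() {
    awk '
        /awk/                  { next }      # skip this awk process itself
        /gsyncd/ && /monitor/  { monitor++ }
        /gsyncd/ && /feedback/ { worker++ }
        /gsyncd/ && /agent/    { agent++ }
        END { printf "monitor=%d worker=%d agent=%d\n", monitor, worker, agent }
    '
}
```

Usage is `ps -ax | count_gsyncd_roles`. On a healthy node one would expect one monitor per session and one worker per local brick, so a count that does not match is a hint that a process has died and not yet been respawned.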
9. Change Detection
   - Hybrid Crawl (xsync)
   - History Crawl
   - Changelog Crawl
   Working directory:
   gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> config working_dir
10. Changelog
    E 0b99ef11-4b79-4cd0-9730-b5a0e8c4a8c0 MKDIR 16877 0 0 00000000-0000-0000-0000-000000000001/dir1
    E c5250af6-720e-4bfe-b938-827614304f39 CREATE 33188 0 0 0b99ef11-4b79-4cd0-9730-b5a0e8c4a8c0/hello.txt
    D c5250af6-720e-4bfe-b938-827614304f39
    M c5250af6-720e-4bfe-b938-827614304f39
    - Path: BRICK/.glusterfs/changelogs
    - Default rollover time: 15s
    - libgfchangelog shared library
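To make the record layout concrete, here is a small reader for changelog records. It is a sketch that assumes each record is one line of space-separated fields: a type (E for entry, D for data, M for metadata), a GFID and, for MKDIR and CREATE entries, the operation, mode, uid, gid and parent-GFID/name. The function name is hypothetical, and other entry operations are printed without decoding because their field layout is not shown here.

```shell
# summarise_changelog: read changelog records on stdin and print one
# human-readable line per record. The mode is shown in octal.
summarise_changelog() {
    while read -r type gfid op mode uid gid path; do
        case "$type" in
            "") ;;  # ignore blank lines
            E)
                case "$op" in
                    MKDIR|CREATE)
                        printf 'entry %s %s mode=%o uid=%s gid=%s path=%s\n' \
                            "$op" "$gfid" "$mode" "$uid" "$gid" "$path" ;;
                    *)
                        printf 'entry %s %s\n' "$op" "$gfid" ;;
                esac ;;
            D) printf 'data %s\n' "$gfid" ;;
            M) printf 'metadata %s\n' "$gfid" ;;
            *) printf 'unknown %s\n' "$type" ;;
        esac
    done
}
```

Decoding the mode is the useful part: 16877 is octal 40755 (a directory with permissions 755) and 33188 is octal 100644 (a regular file with permissions 644).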
11. Sync Mechanism
    - Rsync: good for frequent modifications, large files, etc.
    - Tar+ssh: good for small files and create-only workloads
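The choice between the two sync mechanisms can be written down as a tiny decision helper. This is a sketch under assumptions: the workload labels and function names are hypothetical, and use_tarssh is assumed to be the session config option that switches to Tar+ssh on releases of this era, so check the output of the config subcommand on your version before relying on it.

```shell
# Hypothetical names: substitute your own volume and host names.
MASTER_VOL=${MASTER_VOL:-mastervol}
SLAVE_HOST=${SLAVE_HOST:-slavehost}
SLAVE_VOL=${SLAVE_VOL:-slavevol}

# choose_sync_mode: pick a mechanism from a coarse workload label.
# Small-file and create-only workloads get tarssh; everything else rsync.
choose_sync_mode() {
    case "$1" in
        small-files|create-only) echo tarssh ;;
        *)                       echo rsync ;;
    esac
}

# sync_mode_command: print (not run) the config command for that choice.
# use_tarssh is an assumed option name; verify it on your release.
sync_mode_command() {
    if [ "$(choose_sync_mode "$1")" = tarssh ]; then
        value=true
    else
        value=false
    fi
    echo "gluster volume geo-replication $MASTER_VOL ${SLAVE_HOST}::${SLAVE_VOL} config use_tarssh $value"
}
```

The command is only printed so that it can be reviewed before it is applied to a live session.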
14. Troubleshooting
    - If Status is Faulty, look for errors in the logs (Master, Slave)
    - Look for skip messages or Python tracebacks in the log files
    - Dump xattrs from all brick roots and from the affected files
    - Get the GFIDs of any missing files and check whether Geo-rep has
      processed them (.processed and .processing directories in working_dir)
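The xattr dump and the GFID check lend themselves to two small helpers. This is a sketch that assumes the changelogs under .processing and .processed are plain text files (archived or compressed changelogs would not be searched), and that getfattr from the attr package is installed. The function names are hypothetical.

```shell
# dump_xattrs: hex-dump all extended attributes of a brick root or file.
# Run as root on the brick backend path, not on the mounted volume.
dump_xattrs() {
    getfattr -d -m . -e hex "$1"
}

# gfid_state: report whether a GFID appears in a changelog under a
# .processing or .processed directory anywhere below the working directory.
gfid_state() {
    workdir=$1
    gfid=$2
    # List every file below workdir that mentions the GFID.
    matches=$(grep -rls -- "$gfid" "$workdir")
    case "$matches" in
        */.processing/*) echo processing ;;
        */.processed/*)  echo processed ;;
        *)               echo not-found ;;
    esac
}
```

A result of not-found for a file that exists on the master suggests its change was never picked up, while processed for a file that is missing on the slave points at the sync step instead.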
15. Disaster Recovery
    - Promote slave to master
    - Mount slave volume and start IO
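The two recovery steps can be sketched as a list of commands to review. Everything here is an assumption to verify against the administration guide for your release: the two volume options (geo-replication.indexing and changelog) are what failover procedures of this era are understood to enable on the promoted slave, and the function name, volume, host and mount point are hypothetical. The helper only prints the commands.

```shell
# promote_slave_commands: print the commands for promoting a slave volume
# and mounting it for client IO. Nothing is executed.
#   $1 = slave volume, $2 = a slave host, $3 = client mount point
promote_slave_commands() {
    slave_vol=$1
    slave_host=$2
    mount_point=$3
    # Assumed options: let the promoted volume record changes like a master.
    echo "gluster volume set $slave_vol geo-replication.indexing on"
    echo "gluster volume set $slave_vol changelog on"
    # Mount the promoted volume on a client and resume IO there.
    echo "mount -t glusterfs $slave_host:/$slave_vol $mount_point"
}
```

For example, `promote_slave_commands slavevol slavehost /mnt/slavevol` prints three lines that can be checked and then run by hand on the slave cluster and the client.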