Performance characterization in a large distributed file system with GlusterFS

DESCRIPTION

GlusterFS talk at Quarterly Large Scale Production Engineering (LSPE) meet @ Yahoo! Bangalore. http://www.meetup.com/lspe-in/events/108091572/

SLIDES

1. GlusterFS architecture: key value propositions
   - Highly scalable storage: multiple petabyte clusters, geo-replication to disperse data, scale-up and scale-out, no metadata bottlenecks (an algorithmic placement approach is used instead)
   - Highly cost-effective: leverages commodity x86 servers, no SAN, software only, data processing and analytics run on the storage nodes
   - Highly flexible: physical, virtual, cloud and hybrid deployment models; file and object access protocols
   - No lock-in: deployment agnostic (deploy on-premise, in the public cloud or in a hybrid setup); open and standards based (NFS, CIFS, POSIX, REST)

2. GlusterFS features
   - Unidirectional asynchronous replication
   - Directory and volume quotas
   - Read-only and WORM volumes
   - Block device support
   - I/O statistics
   - Multi-tenancy, encryption, compression (work in progress)

3. Use cases (current)
   - Unstructured data storage
   - Archival
   - Disaster recovery
   - Virtual machine image store
   - Cloud storage for service providers
   - Content cloud

4. GlusterFS concepts

5. GlusterFS concepts: Trusted Storage Pool
   - A Trusted Storage Pool (the cluster) is a collection of storage servers.
   - The pool is formed by invitation: you probe a new member from the cluster, not vice versa.
   - It is the logical partition for all data and management operations.
   - Membership information is used for determining quorum.
   - Members can be dynamically added to and removed from the pool.

6. GlusterFS concepts: Trusted Storage Pool
   [Diagram: Node1 probes Node2 and the probe is accepted; Node1 and Node2 are now peers in a trusted storage pool.]

7. GlusterFS concepts: Trusted Storage Pool
   [Diagram: Node3 is probed and joins the pool of Node1 and Node2; a detach removes Node3 from the pool again.]

8. GlusterFS concepts: Bricks
   - A brick is the combination of a node and an export directory, e.g. hostname:/dir.
   - Each brick inherits the limits of the underlying filesystem.
   - There is no limit on the number of bricks per node.
   - Ideally, each brick in a cluster should be of the same size.
   [Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively, e.g. /export1 through /export5.]

9. GlusterFS concepts: Volumes
   - A volume is a logical collection of bricks.
   - A volume is identified by an administrator-provided name.
   - A volume is a mountable entity; the volume name is supplied at mount time, e.g.
     mount -t glusterfs server1:/<volname> /my/mnt/point
   - Bricks from the same node can be part of different volumes.

10. GlusterFS concepts: Volumes
    [Diagram: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; the bricks are grouped into two volumes, "music" and "videos".]

11. Volume types
    - The type of a volume is specified at volume creation time.
    - The volume type determines how and where data is placed.
    - The following volume types are supported in GlusterFS:
      a) Distribute
      b) Stripe
      c) Replicate
      d) Distributed Replicate
      e) Striped Replicate
      f) Distributed Striped Replicate

12. Distributed volume
    - Distributes files across the bricks of the volume.
    - Directories are present on all bricks of the volume.
    - A single brick failure results in loss of data availability.
    - Removes the need for an external metadata server.

13. How does a replicated volume work?
    (A CLI sketch of forming a pool, creating a replicated volume and mounting it follows below.)
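
A minimal sketch of those steps, assuming two servers named server1 and server2 with brick directories already created under /export, and a volume called testvol (all names are illustrative, not from the talk):

   # On server1: invite server2 into the trusted storage pool
   gluster peer probe server2
   gluster peer status

   # Create and start a 2-way replicated volume, one brick per server
   gluster volume create testvol replica 2 \
       server1:/export/brick1 server2:/export/brick1
   gluster volume start testvol
   gluster volume info testvol

   # On a client: mount the volume via the FUSE-based native protocol
   mount -t glusterfs server1:/testvol /mnt/testvol

   # A peer can later be removed from the pool (once it holds no bricks) with:
   #   gluster peer detach <hostname>
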
14. Access mechanisms
    Gluster volumes can be accessed via the following mechanisms:
    - FUSE-based native protocol
    - NFS
    - SMB
    - libgfapi
    - ReST
    - HDFS

15. Access mechanisms: how does FUSE work?

16. Access mechanisms
    [Diagram: Windows clients via smbd (CIFS), NFSv3 clients, HTTP clients via Swift, Hadoop, and qemu (KVM) and other applications on an RHS client all reach the glusterfsd brick server processes on the RHS servers; on the native path an application goes through the Linux kernel VFS and FUSE into the glusterfs client process, which connects to this and other RHS servers.]

17. FUSE-based native access

18. NFS

19. ReST-based access

20. libgfapi
    - Exposes APIs for accessing Gluster volumes.
    - Reduces context switches.
    - qemu is integrated with libgfapi.
    - Integration of Samba with libgfapi is in progress.
    - Both sync and async interfaces are available.
    - Emerging bindings for various languages.

21. Access mechanisms with libgfapi
    [Diagram: the same access paths as in slide 16, with libgfapi added so that applications such as qemu (KVM) reach the glusterfsd brick servers without going through FUSE.]

22. Translators in GlusterFS
    - Building blocks of a GlusterFS process.
    - Based on the translators in GNU Hurd.
    - Each translator is a functional unit.
    - Translators can be stacked together to achieve the desired functionality.
    - Translators are deployment agnostic and can be loaded in either the client or the server stack.

23. Customizable GlusterFS client/server stack
    [Diagram: on the client, translators such as read-ahead, I/O cache, distribute/stripe and replicate sit below the VFS; clients reach the Gluster servers over GigE/10GigE (TCP/IP) or InfiniBand RDMA; on each server, the server translator sits above POSIX on an ext4 brick (brick 1 through brick n).]

24. Provisioning GlusterFS
    design -> install -> verify -> monitor
    Design principles:
    - Balanced hardware configuration: is the hardware sufficient?
    - Do not underfund the network; it is the cheapest of the three (CPU, storage, network).
    - Bricks per volume are limited, so make bricks big.
    - File size distribution affects the type of bottleneck.
    - Network traffic increases for non-native protocols.
    - Writes generate extra replication traffic.
    Install principles:
    - Provision the network for your use case.
    - Configure storage to be future-ready.

25. Recommended storage brick configuration (with XFS)
    - 12 drives per RAID 6 LUN, one LUN per brick.
    - Hardware RAID stripe size of 256 KB (the default is 64 KB).
    - pvcreate --dataalignment 2560k
    - mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/lvN
    - Mount options: inode64,noatime

26. Deploying the network
    - If only non-native protocols are used, separate Gluster and non-Gluster traffic onto separate VLANs: this isolates self-heal and rebalance traffic and separates replica traffic from user traffic.
    - Jumbo frames improve throughput, but require switch configuration.
    - Bisection bandwidth matters: Gluster doesn't respect rack boundaries.

27. Capturing performance problems on site
    - top utility: press H to show per-thread CPU utilization; this detects hot-thread problems where one thread is using up its core.
    - NFS: the nfsiostat and nfsstat utilities.
    - gluster volume profile shows latency and throughput for Gluster RPC operations.
    - gluster volume top shows which files and servers are hot.
    - The Wireshark Gluster plug-in helps isolate problems.
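
To make the tools on the slide above concrete, here is a sketch of the corresponding commands, assuming the same illustrative volume name testvol:

   # Per-thread CPU view; a single thread stuck near 100% of a core indicates a hot thread
   top -H

   # Client-side NFS statistics when the volume is accessed over NFS
   nfsiostat 5
   nfsstat -c

   # Latency and throughput per Gluster RPC operation
   gluster volume profile testvol start
   gluster volume profile testvol info

   # Hottest files and bricks by read and write operations
   gluster volume top testvol read
   gluster volume top testvol write
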
28. Resources
    Mailing lists: gluster-users@gluster.org, gluster-devel@nongnu.org
    IRC: #gluster and #gluster-dev on Freenode
    Links:
    http://www.gluster.org
    http://hekafs.org
    http://forge.gluster.org
    http://www.gluster.org/community/documentation/index.php/Arch

29. Questions? Thank you!
