GlusterFS & OpenStack Vijay Bellur Red Hat


Page 1: Glusterfs  and openstack

GlusterFS & OpenStack

Vijay Bellur

Red Hat

Page 2: Glusterfs  and openstack

Agenda

➢ What is GlusterFS?

➢ Architecture

➢ Concepts

➢ Algorithms

➢ Access Mechanisms

➢ Implementation

➢ OpenStack and GlusterFS

➢ Resources

➢ Q&A

Page 3: Glusterfs  and openstack

GlusterFS Deployment

Global namespace

Scale-out storage building blocks

Supports thousands of clients

Access using GlusterFS native, NFS, SMB and HTTP protocols

Runs on commodity hardware

Page 4: Glusterfs  and openstack

GlusterFS concepts

Page 5: Glusterfs  and openstack

GlusterFS concepts – Trusted Storage Pool

➢ A Trusted Storage Pool (cluster) is a collection of storage servers.

➢ A Trusted Storage Pool is formed by invitation: you “probe” a new member from within the cluster, not the other way round.

➢ It is the logical partition for all data and management operations.

➢ Membership information is used for determining quorum.

➢ Members can be dynamically added to and removed from the pool.

Page 6: Glusterfs  and openstack

GlusterFS concepts – Trusted Storage Pool

[Diagram: Node1 sends a probe to Node2. Once the probe is accepted, Node1 and Node2 are peers in a trusted storage pool.]

Page 7: Glusterfs  and openstack

GlusterFS concepts – Trusted Storage Pool

[Diagram: Node1, Node2 and Node3 in a trusted storage pool; a node leaves the pool through a detach operation.]
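The invitation rule and the use of membership for quorum can be sketched in a few lines of Python. This is a toy model, not the glusterd implementation: the class name `TrustedStoragePool`, its methods, and the "strict majority" quorum rule are assumptions made for illustration.

```python
class TrustedStoragePool:
    """Toy model of pool membership (hypothetical, for illustration only)."""

    def __init__(self, first_node):
        self.members = {first_node}

    def probe(self, from_node, new_node):
        # Invitation only: the probe must originate from an existing member.
        if from_node not in self.members:
            raise PermissionError(f"{from_node} is not a pool member and cannot invite")
        self.members.add(new_node)

    def detach(self, node):
        # Members can be removed dynamically.
        self.members.discard(node)

    def has_quorum(self, reachable):
        # Assumption: quorum means a strict majority of members is reachable.
        alive = self.members & set(reachable)
        return len(alive) * 2 > len(self.members)
```

For example, after `pool = TrustedStoragePool("node1")` and `pool.probe("node1", "node2")`, a probe issued by a node outside the pool raises `PermissionError`.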

Page 8: Glusterfs  and openstack

GlusterFS concepts - Bricks

➢ A brick is the combination of a node and an export directory, e.g. hostname:/dir

➢ Each brick inherits the limits of the underlying filesystem.

➢ There is no limit on the number of bricks per node.

➢ Ideally, each brick in a cluster should be of the same size.

[Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively, using export directories /export1 through /export5.]

Page 9: Glusterfs  and openstack

GlusterFS concepts - Volumes

➢ A volume is a logical collection of bricks.

➢ A volume is identified by an administrator-provided name.

➢ A volume is a mountable entity, and the volume name is provided at the time of mounting:

    mount -t glusterfs server1:/<volname> /my/mnt/point

➢ Bricks from the same node can be part of different volumes.

Page 10: Glusterfs  and openstack

GlusterFS concepts - Volumes

[Diagram: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; the bricks are grouped into two volumes, "Videos" and "music".]

Page 11: Glusterfs  and openstack

Volume Types

➢ The type of a volume is specified at the time of volume creation.

➢ The volume type determines how and where data is placed.

➢ The following volume types are supported in GlusterFS:

a) Distribute

b) Stripe

c) Replication

d) Distributed Replicate

e) Striped Replicate

f) Distributed Striped Replicate

Page 12: Glusterfs  and openstack

Distributed Volume

➢ Distributes files across the bricks of the volume.

➢ Directories are present on all bricks of the volume.

➢ A single brick failure results in loss of data availability.

➢ Removes the need for an external metadata server.

Page 13: Glusterfs  and openstack

How does a distributed volume work?

➢ Distribute is based on a DHT (distributed hash table) algorithm.

➢ A 32-bit hash space is divided into N ranges for N bricks.

➢ At the time of directory creation, a hash range is assigned to that directory on each brick.

➢ During file creation or retrieval, a hash is computed on the file name. This hash value is used to locate or place the file.

➢ Different directories on the same brick end up with different hash ranges.
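The range lookup described on this slide can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: `zlib.crc32` stands in for the hash function GlusterFS actually uses, and the `shift` parameter is a hypothetical way of giving different directories different layouts.

```python
import zlib

HASH_SPACE = 2 ** 32  # 32-bit hash space, as on the slide


def assign_ranges(bricks, shift=0):
    """Split [0, 2**32) into one contiguous range per brick.

    `shift` is a hypothetical per-directory rotation so that the same
    brick gets a different range in different directories.
    """
    n = len(bricks)
    if n == 0:
        raise ValueError("at least one brick is required")
    order = bricks[shift % n:] + bricks[:shift % n]
    step = HASH_SPACE // n
    layout = []
    for i, brick in enumerate(order):
        start = i * step
        # The last range absorbs the remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == n - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def locate(layout, filename):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode("utf-8"))  # stand-in hash, not the real one
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise LookupError("layout does not cover the hash space")
```

Because placement depends only on the file name and the directory layout, any client can compute where a file lives without asking a metadata server.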

Page 17: Glusterfs  and openstack

Replicated Volume

➢ Creates synchronous copies of all directory and file updates.

➢ Provides high availability of data when node failures occur.

➢ Transaction driven, to ensure consistency.

➢ Changelogs are maintained for reconciliation.

➢ Any number of replicas can be configured.
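As a rough illustration of synchronous replication with changelogs, the Python sketch below keeps one file on several replicas and counts the writes a failed replica has missed. It is a deliberately simplified model: the class `ReplicatedFile` and its methods are hypothetical, and the real replicate translator tracks pending operations differently.

```python
class ReplicatedFile:
    """Toy model of one file kept on several replica bricks."""

    def __init__(self, replicas):
        self.data = {r: b"" for r in replicas}
        self.up = set(replicas)
        # pending[r] counts the writes that replica r has missed.
        self.pending = {r: 0 for r in replicas}

    def fail(self, replica):
        self.up.discard(replica)

    def write(self, payload):
        if not self.up:
            raise OSError("no replica available")
        for r in self.data:
            if r in self.up:
                self.data[r] += payload   # synchronous update on every live replica
            else:
                self.pending[r] += 1      # changelog entry: r is now stale

    def heal(self, replica):
        """Reconcile a returning replica from a copy with no pending writes."""
        source = next(r for r in self.up if self.pending[r] == 0)
        self.data[replica] = self.data[source]
        self.pending[replica] = 0
        self.up.add(replica)
```

The file stays available while at least one replica is up, and the pending counters tell the heal step which copy is stale.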

Page 18: Glusterfs  and openstack

How does a replicated volume work?

Page 20: Glusterfs  and openstack

Distributed Replicated Volume

➢ Distributes files across replicated bricks.

➢ The number of bricks must be a multiple of the replica count.

➢ The ordering of bricks in the volume definition matters.

➢ Scaling and high reliability.

➢ Improved read performance in most environments.

➢ Currently the most preferred model of deployment.
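Why brick ordering matters can be shown with a small Python sketch: consecutive bricks in the volume definition are grouped into replica sets, so listing two bricks of the same node next to each other puts both copies on one machine. The helper names `replica_sets` and `shares_node` are made up for this illustration.

```python
def replica_sets(bricks, replica):
    """Group consecutive bricks into replica sets of size `replica`."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def shares_node(replica_set):
    """True if two bricks of the set live on the same node (host:/dir form)."""
    hosts = [brick.split(":", 1)[0] for brick in replica_set]
    return len(set(hosts)) < len(hosts)
```

With replica count 2, the order `n1:/b1 n2:/b1 n1:/b2 n2:/b2` yields two sets that each span both nodes, while `n1:/b1 n1:/b2 n2:/b1 n2:/b2` yields sets that lose all copies when a single node fails.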

Page 22: Glusterfs  and openstack

Striped Volume

➢ Files are striped into chunks and placed on various bricks.

➢ Recommended only when very large files, greater than the size of the disks, are present.

➢ Chunks are files with holes; this helps in maintaining offset consistency.

➢ A brick failure can result in data loss. Redundancy with replication is highly recommended.
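The idea of chunks as files with holes can be sketched in Python: a write is cut at stripe-block boundaries, the blocks go to the bricks in round-robin order, and each piece keeps its original file offset inside the chunk file. The function name `split_write` and the 128 KiB default block size are assumptions for this sketch.

```python
def split_write(offset, length, brick_count, block_size=128 * 1024):
    """Split a write into (brick index, offset, length) pieces.

    Each piece keeps its original file offset, so the chunk file on a
    brick is sparse: it has holes where other bricks hold the data.
    """
    pieces = []
    end = offset + length
    while offset < end:
        block_end = (offset // block_size + 1) * block_size
        n = min(end, block_end) - offset
        brick = (offset // block_size) % brick_count  # round-robin by block
        pieces.append((brick, offset, n))
        offset += n
    return pieces
```

Keeping the original offsets means no translation table is needed to find a byte: the block number alone determines the brick.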

Page 23: Glusterfs  and openstack

Translators in GlusterFS

➢ Building blocks for a GlusterFS process.

➢ Based on translators in GNU Hurd.

➢ Each translator is a functional unit.

➢ Translators can be stacked together to achieve the desired functionality.

➢ Translators are deployment agnostic: they can be loaded in either the client or the server stack.
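The stacking idea can be illustrated with a minimal Python sketch in which each layer handles an operation or passes it to the layer below. The classes here (`Translator`, `Posix`, `IOCache`) and the `build_stack` helper are illustrative stand-ins, not the GlusterFS translator API.

```python
class Translator:
    """One functional unit; by default it forwards to the translator below."""

    def __init__(self, child=None):
        self.child = child

    def read(self, path):
        return self.child.read(path)


class Posix(Translator):
    """Bottom of the stack: stands in for the on-disk store."""

    def __init__(self, files):
        super().__init__()
        self.files = files

    def read(self, path):
        return self.files[path]


class IOCache(Translator):
    """Caches reads so repeated requests do not reach the lower layers."""

    def __init__(self, child):
        super().__init__(child)
        self.cache = {}
        self.hits = 0

    def read(self, path):
        if path in self.cache:
            self.hits += 1
        else:
            self.cache[path] = self.child.read(path)
        return self.cache[path]


def build_stack(bottom, *layers):
    """Stack translator classes on top of `bottom`, first layer lowest."""
    stack = bottom
    for layer in layers:
        stack = layer(stack)
    return stack
```

Because every layer exposes the same interface, the same cache layer could sit in a client stack or a server stack without changes.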

Page 24: Glusterfs  and openstack

Customizable GlusterFS Client/Server Stack

[Diagram: clients run a translator stack below the VFS (I/O Cache, Read Ahead, Distribute / Stripe, Replicate) and connect over GigE or 10GigE (TCP/IP) or InfiniBand (RDMA) to Gluster servers. Each Gluster server runs the Server and POSIX translators on top of an Ext4 filesystem, exporting Brick 1 through Brick n.]

Page 25: Glusterfs  and openstack

Dynamic Volume Management

Collectively refers to application-transparent operations that can be performed in the storage layer:

➢ Add bricks to a volume

➢ Remove a brick from a volume

➢ Rebalance the data spread within a volume

➢ Replace a brick in a volume

➢ Adaptive performance / functionality tuning

Page 26: Glusterfs  and openstack

Access Mechanisms:

Gluster volumes can be accessed via the following mechanisms:

➢ FUSE

➢ NFS

➢ SMB

➢ libgfapi

➢ REST

➢ HDFS

Page 27: Glusterfs  and openstack

REST-based access

Page 28: Glusterfs  and openstack

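UFO

➢ Unified file and object view.

➢ Entity mapping between file and object building blocks.

[Diagram: an object client sends HTTP (REST) requests through the proxy, while a file client uses an NFS or GlusterFS mount. Account, container and object map to volume, directory and file respectively.]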
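The entity mapping between object and file building blocks (account to volume, container to directory, object to file) can be sketched as a path translation in Python. The function name `object_to_file`, the `AUTH_` account prefix handling, and the mount root `/mnt/gluster-object` are assumptions made for this illustration.

```python
from pathlib import PurePosixPath


def object_to_file(url_path, mount_root="/mnt/gluster-object"):
    """Map a Swift-style path /v1/<account>/<container>/<object> to a file path.

    account -> volume, container -> directory, object -> file.
    `mount_root` is a hypothetical location where volumes are mounted.
    """
    parts = [p for p in url_path.split("/") if p]
    if len(parts) < 4 or parts[0] != "v1":
        raise ValueError("expected /v1/<account>/<container>/<object>")
    account, container, *obj = parts[1:]
    # Assumption: the account is named AUTH_<volume name>.
    volume = account[len("AUTH_"):] if account.startswith("AUTH_") else account
    return str(PurePosixPath(mount_root, volume, container, *obj))
```

Under these assumptions, the object `/v1/AUTH_vol0/photos/2013/cat.jpg` is the same data a file client sees as `2013/cat.jpg` inside the `photos` directory of volume `vol0`.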

Page 29: Glusterfs  and openstack

Interface Possibilities

[Diagram: libgfapi at the centre of the stack. Interfaces: qemu, NFS, SMB, Hadoop, FUSE, Cinder and Swift (UFO), serving files, blocks and objects. Transports: IP and RDMA. Back ends: files, BD, DB and others.]

Page 30: Glusterfs  and openstack

OpenStack and GlusterFS – Current Integration

● Separate compute and storage pools

● GlusterFS directly provides the Swift object service

● Integration with Keystone

● Geo-replication for multi-site support

● Swift data also available via other protocols

● Supports non-OpenStack use in addition to OpenStack use

[Diagram, logical view: Glance images, Nova nodes and Swift objects, backed by Cinder data, Glance data and Swift data (reached through the Swift API). Physical view: KVM compute nodes and a separate pool of storage servers.]

Page 31: Glusterfs  and openstack

OpenStack and GlusterFS - Future Direction

[Diagram: Nova compute nodes. Each host runs a Gluster guest alongside Hadoop and other guests.]

Page 32: Glusterfs  and openstack

OpenStack and GlusterFS - Future Direction

● POC based on the proposed OpenStack FaaS (File as a Service)

● Cinder-like virtual NAS service

● Tenant-specific file shares

● Hypervisor mediated for security

● Avoids exposing servers to the Quantum tenant network

● Optional multi-site or multi-zone geo-replication

● FaaS data optionally available to non-OpenStack nodes

● Initial focus on Linux guests

● Windows (SMB) and NFS shares also under consideration

Page 33: Glusterfs  and openstack

OpenStack and GlusterFS - Summary

● Can address storage needs across the stack:

● Block

● Object

● File

● Hadoop/Big Data

Page 34: Glusterfs  and openstack

Resources

Mailing lists: [email protected]@nongnu.org

IRC: #gluster and #gluster-dev on freenode

Links:

http://www.gluster.org

http://hekafs.org

http://forge.gluster.org

http://www.gluster.org/community/documentation/index.php/Arch