CEPHALOPODS AND SAMBA IRA COOPER – SNIA SDC 2016.09.18

CEPHALOPODS AND SAMBA

IRA COOPER – SNIA SDC 2016.09.18

DISCLAIMER

● These opinions are my opinions.

● They do not represent promises from:

– Red Hat Inc.

– Samba Team

– Me

– My Mom

AGENDA

● CEPH Architecture.

– Why CEPH?

– RADOS

– RGW

– CEPHFS

● Current Samba integration with CEPH.

● Future directions.

● Maybe a demo?

CEPH MOTIVATING PRINCIPLES

● All components must scale horizontally.

● There can be no single point of failure.

● The solution must be hardware agnostic.

● Should use commodity hardware.

● Self-manage whenever possible.

● Open source.

ARCHITECTURAL COMPONENTS

● RGW: A web services gateway for object storage, compatible with S3 and Swift

● LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

● RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

● RBD: A reliable, fully-distributed block device with cloud platform integration

● CEPHFS: A distributed file system with POSIX semantics and scale-out metadata management

● Consumers: APP, HOST/VM, CLIENT

RADOS

● Flat object namespace within each pool

● Rich object API (librados)

– Bytes, attributes, key/value data

– Partial overwrite of existing data

– Single-object compound operations

– RADOS classes (stored procedures)

● Strong consistency (CP system)

● Infrastructure aware, dynamic topology

● Hash-based placement (CRUSH)

● Direct client to server data path
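The "hash-based placement" bullet means a client can compute where an object lives instead of asking a lookup service. The sketch below illustrates only that idea, using rendezvous (highest-random-weight) hashing. It is not the CRUSH algorithm, and the OSD names and replica count are made up for illustration.

```python
import hashlib


def score(object_name: str, osd: str) -> int:
    """Deterministic pseudo-random weight for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}/{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` OSDs for an object by ranking every OSD's score."""
    ranked = sorted(osds, key=lambda osd: score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    osds = [f"osd.{i}" for i in range(6)]  # hypothetical cluster members
    print(place("my-object", osds))
```

Every client that knows the same OSD list computes the same answer, which is what allows a direct client-to-server data path. Real CRUSH additionally takes the failure-domain hierarchy and device weights into account.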

RADOS CLUSTER

[Diagram: an APPLICATION talking to the RADOS CLUSTER; boxes marked M are monitors.]

OBJECT STORAGE DAEMONS

[Diagram: each OSD sits on a local filesystem (FS: xfs, btrfs, or ext4) on its own DISK; boxes marked M are monitors.]

RADOSGW MAKES RADOS WEBBY

RADOSGW:

● REST-based object storage proxy

● Uses RADOS to store objects

– Stripes large RESTful objects across many RADOS objects

– Space efficient for small objects

● API supports buckets, accounts

● Usage accounting for billing

● Compatible with S3 and Swift applications
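The striping bullet can be pictured as cutting one large upload into fixed-size pieces, each stored as its own RADOS object. The sketch below shows just that cut-and-reassemble step. The chunk size and the `name.index` naming scheme are hypothetical; RGW's real object naming and manifest format are not reproduced here.

```python
def stripe(name: str, payload: bytes, chunk_size: int = 4) -> dict[str, bytes]:
    """Split a payload into fixed-size pieces keyed by a zero-padded index."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        f"{name}.{index:08d}": payload[offset:offset + chunk_size]
        for index, offset in enumerate(range(0, len(payload), chunk_size))
    }


def reassemble(name: str, pieces: dict[str, bytes]) -> bytes:
    """Join the pieces back together in index order."""
    keys = sorted(key for key in pieces if key.startswith(name + "."))
    return b"".join(pieces[key] for key in keys)


if __name__ == "__main__":
    pieces = stripe("photo", b"abcdefghij")
    print(list(pieces))
    print(reassemble("photo", pieces))
```

A tiny chunk size is used so the split is visible; a real gateway would use pieces measured in megabytes.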

THE RADOS GATEWAY

[Diagram: APPLICATIONs speak REST to RADOSGW instances; each RADOSGW uses LIBRADOS to reach the RADOS CLUSTER over a socket. Boxes marked M are monitors.]

MULTI-SITE OBJECT STORAGE

[Diagram: two sites, US-EAST and EU-WEST, each with the stack WEB APPLICATION → APP SERVER → CEPH OBJECT GATEWAY (RGW) → CEPH STORAGE CLUSTER.]

FEDERATED RGW

● Zones and regions

– Topologies similar to S3 and others

– Global bucket and user/account namespace

● Cross data center synchronization

– Asynchronously replicate buckets between regions

● Read affinity

– Serve local data from local DC

– Dynamic DNS to send clients to closest DC

SEPARATE METADATA SERVER

[Diagram: a LINUX HOST running the KERNEL MODULE client, with separate data and metadata paths into the RADOS CLUSTER. Boxes marked M are monitors.]

SCALABLE METADATA SERVERS

METADATA SERVER

● Manages metadata for a POSIX-compliant shared filesystem

– Directory hierarchy

– File metadata (owner, timestamps, mode, etc.)

● Clients stripe file data in RADOS

– MDS not in data path

● MDS stores metadata in RADOS

– Key/value objects

● Dynamic cluster scales to 10s or 100s

● Only required for shared filesystem
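Because clients stripe file data into RADOS themselves, a client can work out which object holds a given byte without involving the MDS. The sketch below assumes the simplest possible layout: fixed-size objects with no striping across objects. The 4 MiB object size and the `<inode hex>.<index hex>` naming are assumptions for illustration, not a statement of the actual CephFS layout rules.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size in bytes


def locate(inode: int, offset: int, object_size: int = OBJECT_SIZE) -> tuple[str, int]:
    """Map a file offset to (object name, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(offset, object_size)
    return f"{inode:x}.{index:08x}", within


def objects_for_range(inode: int, offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> list[str]:
    """List every object touched by a read or write of `length` bytes."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return [f"{inode:x}.{index:08x}" for index in range(first, last + 1)]


if __name__ == "__main__":
    print(locate(0x10000000000, OBJECT_SIZE + 5))
    print(objects_for_range(0x10000000000, OBJECT_SIZE - 1, 2))
```

The point is that the data path is pure arithmetic plus object I/O; the MDS is only consulted for the inode and layout.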

METADATA SERVERS – FUTURE

METADATA SERVER

● Sharding of the MDS (MetaData Server)

– More scalable performance.

● Active – Passive Failover

– Allowing for better availability

● Both features are in the codebase

– In active development

– Not production ready

SAMBA - TODAY

ARCHITECTURAL COMPONENTS

● RGW, LIBRADOS, RBD and CEPHFS, all built on RADOS (components as on the first architecture slide)

● Consumers: APP, HOST/VM, SAMBA, CLIENT

SAMBA INTEGRATION

● vfs_ceph

– Since 2013.

– Used as the outline for vfs_glusterfs

– Has been tested in teuthology for a while now.

– Patches are up to use it as a testbed for statx.

● ACL Integration?

– Patchset for POSIX ACLs committed for Samba 4.5

● Thank you to Zheng Yan

– Work on RichACLs is ongoing.
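A share that uses vfs_ceph is declared in smb.conf. The sketch below renders such a stanza from a dictionary. The option names (`vfs objects = ceph`, `ceph:config_file`, `kernel share modes = no`) are taken from the vfs_ceph manual page as best recalled and should be checked against the page for your Samba version; the share name, path and file locations are placeholders, and `render_share` is a hypothetical helper.

```python
def render_share(name: str, options: dict[str, str]) -> str:
    """Render one smb.conf share stanza as text."""
    lines = [f"[{name}]"]
    lines += [f"    {key} = {value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


# Placeholder values; adjust to the local cluster.
CEPH_SHARE = {
    "path": "/",                                # path inside CephFS, not a local mount
    "vfs objects": "ceph",                      # load the vfs_ceph module
    "ceph:config_file": "/etc/ceph/ceph.conf",  # assumed location of the Ceph config
    "kernel share modes": "no",                 # there is no kernel file to lock
    "read only": "no",
}

if __name__ == "__main__":
    print(render_share("cephshare", CEPH_SHARE), end="")
```

Since the module talks to CephFS through a library rather than a mounted path, the `path` here is interpreted inside the Ceph filesystem.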

CTDB INTEGRATION

● fcntl locks

– Does any filesystem get this right at the start?

– 0/2 so far.

– Ceph's have been fixed; they work for CTDB.

● If you tweak the timeouts.

– But these tweaks aren't production ready!

● Both kernel and FUSE clients have been tested

– CephFS team recommends ceph-fuse.

– That's what our initial integration used.
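What CTDB needs from fcntl locks is mutual exclusion between processes: while one holder has the lock file locked, no other process may get it. The sketch below demonstrates that property on a local temporary file using a forked child as the second contender. On a cluster filesystem the contender would be on another node, which is the hard part this sketch does not exercise. It assumes a POSIX system; the file name is arbitrary.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Try to take an exclusive, non-blocking fcntl lock; return the fd or None."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def another_process_can_lock(path: str) -> bool:
    """Fork a child that attempts the same lock and report whether it succeeded."""
    pid = os.fork()
    if pid == 0:
        # Child: fcntl locks are owned per process, so this is a real contender.
        code = 1
        try:
            code = 0 if try_lock(path) is not None else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "reclock")  # arbitrary file name
        holder = try_lock(lock_path)
        print("contender got the lock:", another_process_can_lock(lock_path))
        os.close(holder)  # closing the descriptor releases the lock
        print("after release:", another_process_can_lock(lock_path))
```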

CTDB INTEGRATION

● CTDB “fcntl lock” dependency removal.

– etcd

● Battle tested.

● Push other config info into etcd?

– nodes

– public_addresses

● The demo will show basic etcd integration.

– Thank you to Jose Rivera for his work here.

– Zookeeper

● Much the same as etcd for this use.

● Not working on it now.
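Moving the cluster lock out of the filesystem means taking it in a key/value store instead: create a key only if nobody else holds it, and attach a time-to-live so a dead holder's lock eventually expires. The sketch below models that pattern with an in-memory class. `LeaseStore` and its methods are hypothetical stand-ins, not the etcd or Zookeeper client API, and this is not the integration shown in the demo.

```python
import time


class LeaseStore:
    """In-memory stand-in for a key/value store with expiring keys."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (owner, expiry time)

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        """Take or refresh the lock unless a different owner holds a live lease."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now and current[0] != owner:
            return False
        self._data[key] = (owner, now + ttl)
        return True

    def release(self, key: str, owner: str) -> bool:
        """Drop the lock, but only if `owner` is the one holding it."""
        current = self._data.get(key)
        if current is not None and current[0] == owner:
            del self._data[key]
            return True
        return False


if __name__ == "__main__":
    store = LeaseStore()
    print(store.acquire("ctdb/reclock", "node-a", ttl=10))  # hypothetical key name
    print(store.acquire("ctdb/reclock", "node-b", ttl=10))
```

A real store provides the same guarantee across machines through consensus, which is what removes the dependency on filesystem lock behaviour.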

DEMO

FUTURE DIRECTIONS – OBJECT

● RGW

– Export object data as files.

– Export files as object data?

● Not today in Ceph.

– Integrate where?

● S3

● RADOS

● Librgw

● CephFS / vfs_ceph

● S3

– Not being worked on at this time.

● Non file system based locking makes all this possible.

QUESTIONS?

THANK YOU!

Ira Cooper

SAMBA TEAM

[email protected]