THE FUTURE OF STORAGE
[Diagram: traditional storage as complex, proprietary silos, each pairing a custom GUI and proprietary software with proprietary hardware, versus open software-defined storage as a standardized, unified, open platform: the Ceph control plane (API, GUI) running open source software on commodity hardware, built from standard computers and disks.]
THE JOURNEY
Open Software-Defined Storage is a fundamental reimagining of how storage infrastructure works.
It delivers substantial economic and operational advantages, and it is ideally suited to a growing number of use cases.
TODAY: Cloud Infrastructure, Cloud-Native Apps
EMERGING: Analytics, Hyper-Convergence, Containers
FUTURE: ???, ???
HISTORICAL TIMELINE
2004: Project starts at UCSC
2006: Open source
2010: Mainline Linux kernel
2011: OpenStack integration
MAY 2012: Launch of Inktank
2012: CloudStack integration
SEPT 2012: Production-ready Ceph
2013: Xen integration
OCT 2013: Inktank Ceph Enterprise launch
FEB 2014: RHEL-OSP certification
APR 2014: Inktank acquired by Red Hat
10 years in the making
ARCHITECTURAL COMPONENTS
RGW: A web services gateway for object storage, compatible with S3 and Swift
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD: A reliable, fully distributed block device with cloud platform integration
CEPHFS: A distributed file system with POSIX semantics and scale-out metadata management
(Accessed by apps via RGW and LIBRADOS, by hosts/VMs via RBD, and by clients via CEPHFS.)
RADOS COMPONENTS

OSDs:
- 10s to 10,000s in a cluster
- One per disk (or one per SSD, RAID group…)
- Serve stored objects to clients
- Intelligently peer for replication & recovery

Monitors:
- Maintain cluster membership and state
- Provide consensus for distributed decision-making
- Small, odd number
- Do not serve stored objects to clients
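As a minimal sketch of this split in responsibilities (assuming the python-rados bindings and a cluster reachable through /etc/ceph/ceph.conf), the snippet below asks the monitors for OSD state; monitors answer such queries, while the OSDs themselves serve the actual object data.

```python
# Hedged sketch: query cluster state through the monitors with
# python-rados. The config path is an assumption about your setup.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Monitors maintain membership/state and answer queries like this;
# they never serve stored object data.
cmd = json.dumps({"prefix": "osd stat", "format": "json"})
ret, outbuf, errs = cluster.mon_command(cmd, b'')
print(json.loads(outbuf))  # e.g. how many OSDs exist, are up, are in

cluster.shutdown()
```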
EVEN BETTER: CRUSH!
[Diagram: a stream of objects is grouped into placement groups (PGs), which are then mapped onto OSDs throughout the cluster.]
CRUSH: DYNAMIC DATA PLACEMENT

CRUSH: Pseudo-random placement algorithm
- Fast calculation, no lookup
- Repeatable, deterministic
- Statistically uniform distribution
- Stable mapping
- Limited data migration on change
- Rule-based configuration
- Infrastructure topology aware
- Adjustable replication
- Weighting

Placement happens in two steps: an object is first hashed into a placement group, and CRUSH then maps that PG onto OSDs:
pg = hash(object name) % num_pg
osds = CRUSH(pg, cluster state, rule set)
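To make the two-step mapping concrete, here is a toy sketch in Python. It is not the real CRUSH algorithm (no topology awareness, rules, or weights); rendezvous hashing stands in for CRUSH's deterministic, lookup-free, stable placement, and all names and sizes are made up.

```python
# Toy stand-in for CRUSH-style placement: deterministic, no lookup
# tables, and stable under membership changes. NOT the real algorithm.
import hashlib

NUM_PGS = 128                              # hypothetical PG count
OSDS = [f"osd.{i}" for i in range(12)]     # hypothetical OSDs
REPLICAS = 3

def pg_for_object(name: str) -> int:
    # Step 1: pg = hash(object name) % num_pg
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % NUM_PGS

def osds_for_pg(pg: int) -> list:
    # Step 2: map the PG to OSDs. Rendezvous (highest-random-weight)
    # hashing is repeatable and moves little data when OSDs are added
    # or removed -- properties CRUSH also guarantees.
    rank = lambda osd: hashlib.md5(f"{pg}:{osd}".encode()).hexdigest()
    return sorted(OSDS, key=rank)[:REPLICAS]

pg = pg_for_object("my-object")
print(f"pg {pg} -> {osds_for_pg(pg)}")
```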
LIBRADOS: RADOS ACCESS FOR APPS
LIBRADOS:
- Direct access to RADOS for applications
- C, C++, Python, PHP, Java, Erlang
- Direct access to storage nodes
- No HTTP overhead
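A minimal sketch of that path using the python-rados bindings; the pool and object names are hypothetical, and the pool is assumed to already exist. Note there is no HTTP endpoint anywhere: the client computes placement and talks to the storage nodes directly.

```python
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('mypool')          # assumes 'mypool' exists
ioctx.write_full('greeting', b'hello rados')  # straight to the OSDs
print(ioctx.read('greeting'))                 # b'hello rados'
ioctx.remove_object('greeting')

ioctx.close()
cluster.shutdown()
```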
THE RADOS GATEWAY
[Diagram: applications speak REST to RADOSGW instances, which use LIBRADOS (over a socket) to reach the RADOS cluster and its monitors.]
RADOSGW MAKES RADOS WEBBY
RADOSGW:
- REST-based object storage proxy
- Uses RADOS to store objects
- API supports buckets, accounts
- Usage accounting for billing
- Compatible with S3 and Swift applications
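Because the gateway is S3-compatible, an off-the-shelf S3 client works unchanged; a minimal sketch with boto3 follows (the endpoint, port, and credentials are placeholders for your RGW setup).

```python
import boto3

# All connection details below are hypothetical placeholders.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='my-bucket')
s3.put_object(Bucket='my-bucket', Key='hello.txt', Body=b'hello rgw')
obj = s3.get_object(Bucket='my-bucket', Key='hello.txt')
print(obj['Body'].read())   # b'hello rgw'
```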
RBD STORES VIRTUAL DISKS

RADOS BLOCK DEVICE:
- Storage of disk images in RADOS
- Decouples VMs from host
- Images are striped across the cluster (pool)
- Snapshots
- Copy-on-write clones
- Support in:
  - Mainline Linux kernel (2.6.39+) and RHEL 7
  - Qemu/KVM; native Xen coming soon
  - OpenStack, CloudStack, Nebula, Proxmox

RBD SNAPSHOTS
- Export snapshots to geographically dispersed data centers
  - Institute disaster recovery
- Export incremental snapshots
  - Minimize network bandwidth by only sending changes
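A minimal sketch of the block-device API using the python-rbd bindings; the pool, image name, and size are hypothetical. For the incremental exports above, the rbd command line provides export-diff / import-diff, which ship only the blocks changed between two snapshots.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                 # assumes an 'rbd' pool

rbd.RBD().create(ioctx, 'vm-disk', 10 * 1024**3)  # 10 GiB virtual disk
image = rbd.Image(ioctx, 'vm-disk')
image.create_snap('before-upgrade')               # point-in-time snapshot
print([snap['name'] for snap in image.list_snaps()])
image.close()

ioctx.close()
cluster.shutdown()
```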
SCALABLE METADATA SERVERS

METADATA SERVER:
- Manages metadata for a POSIX-compliant shared filesystem
  - Directory hierarchy
  - File metadata (owner, timestamps, mode, etc.)
- Stores metadata in RADOS
- Does not serve file data to clients
- Only required for the shared filesystem
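As a minimal sketch (python-cephfs bindings; paths are hypothetical): directory and metadata operations like mkdir go through the MDS, while the file data itself flows directly between the client and the OSDs.

```python
import cephfs

fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()

fs.mkdir('/reports', 0o755)            # metadata: handled by the MDS
fd = fs.open('/reports/q1.txt', 'w', 0o644)
fs.write(fd, b'hello cephfs', 0)       # data: goes straight to RADOS
fs.close(fd)

fs.unmount()
fs.shutdown()
```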
CALAMARI ARCHITECTURE
[Diagram: a Calamari master on the admin node communicates with minion agents running on each node of the Ceph storage cluster, including the monitor nodes.]
WEB APPLICATION STORAGE
[Diagram: app servers behind a web application speak S3/Swift to Ceph Object Gateways (RGW), which store objects in the Ceph Storage Cluster (RADOS).]
MULTI-SITE OBJECT STORAGE
[Diagram: two web applications, each with its own app server and Ceph Object Gateway (RGW), backed by separate Ceph storage clusters in US-EAST and EU-WEST.]
ARCHIVE / COLD STORAGE
[Diagram: an application writes through a replicated cache pool into an erasure-coded backing pool, both within one Ceph storage cluster.]
ERASURE CODING

Replicated pool:
- Full copies of stored objects
- Very high durability
- Quicker recovery

Erasure-coded pool:
- One copy plus parity
- Cost-effective durability
- Expensive recovery

[Diagram: a replicated pool stores multiple full copies of an object across the cluster; an erasure-coded pool stores numbered data chunks plus coded chunks (X, Y).]
ERASURE CODING: HOW DOES IT WORK?
[Diagram: an object is split into data chunks (1-4) plus coded chunks (X, Y), each stored on a different OSD in the erasure-coded pool.]
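A toy illustration of the underlying idea, using the simplest possible code (XOR parity, k=2 data chunks and m=1 coded chunk); real Ceph erasure-code profiles use configurable k and m. Losing any single chunk still lets you rebuild the object, but recovery costs computation rather than a straight copy, which is why it is more expensive than replication.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

obj = b'ABCDEFGH'
d1, d2 = obj[:4], obj[4:]       # k=2 data chunks
parity = xor_bytes(d1, d2)      # m=1 coded chunk

# Simulate losing the second data chunk; rebuild it from the survivors.
rebuilt = xor_bytes(d1, parity)
assert d1 + rebuilt == obj      # the object survives a single failure
```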
CACHE TIERING
[Diagram: a Ceph client reads and writes against a cache tier in writeback mode, which absorbs I/O and writes back to a replicated backing pool within the Ceph storage cluster.]
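A toy sketch of what writeback mode means (plain Python, not Ceph code): writes are absorbed by the fast cache tier and flushed to the backing pool later; reads are served from the cache, promoting objects from the backing pool on a miss.

```python
class WritebackTier:
    """Toy model of a writeback cache in front of a backing pool."""

    def __init__(self, backing: dict):
        self.cache: dict = {}
        self.backing = backing
        self.dirty: set = set()

    def write(self, key, value):
        self.cache[key] = value      # absorbed by the cache tier
        self.dirty.add(key)          # written back to the pool later

    def read(self, key):
        if key not in self.cache:    # miss: promote from backing pool
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def flush(self):
        for key in self.dirty:       # lazy writeback to the slow pool
            self.backing[key] = self.cache[key]
        self.dirty.clear()
```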
WEBSCALE APPLICATIONS
[Diagram: many app servers behind a web application use the native protocol (LIBRADOS) to talk directly to the Ceph Storage Cluster (RADOS).]
ARCHIVE / COLD STORAGE
[Diagram: as above, but accessed through the Ceph Block Device (RBD): an application writes via RBD through a replicated cache pool into an erasure-coded backing pool.]
DATABASES
[Diagram: MySQL / MariaDB instances use RBD via the Linux kernel, speaking the native protocol to the Ceph Storage Cluster (RADOS).]
CEPH ROADMAP

Hammer (current release):
- Object Versioning
- Alternative Web Server for RGW
- Performance Improvements

Infernalis:
- NewStore
- Object Expiration
- Performance Improvements

J-Release:
- Stable CephFS?
- ???
- Performance Improvements
NEXT STEPS: WHAT NOW?

Getting Started with Ceph
• Read about the latest version of Ceph: http://ceph.com/docs
• Deploy a test cluster using ceph-deploy: http://ceph.com/qsg
• Deploy a test cluster on the AWS free tier using Juju: http://ceph.com/juju
• Ansible playbooks for Ceph: https://www.github.com/alfredodeza/ceph-ansible

Getting Involved with Ceph
• Most discussion happens on the ceph-devel and ceph-users mailing lists. Join or view archives at http://ceph.com/list
• IRC is a great place to get help (or help others!): #ceph and #ceph-devel. Details and logs at http://ceph.com/irc
• Download the code: http://www.github.com/ceph
• The tracker manages bugs and feature requests. Register and start looking around at http://tracker.ceph.com
• Doc updates and suggestions are always welcome. Learn how to contribute docs at http://ceph.com/docwriting