Ross Turk, VP, Marketing & Community, Inktank
Ceph is an open source distributed object store, network block device, and file system designed for reliability, performance, and scalability. It runs on standard hardware, has no single point of failure, and is supported by the Linux kernel. It also works great with OpenStack and CloudStack. If you’ve heard of Ceph but aren’t sure where it fits into your plans, this is the talk for you. Designed for those who are new to Ceph, this talk will cover Ceph’s design principles, overall architecture, and integration with other operational systems.
Ceph Fundamentals
Ross Turk
VP Community, Inktank
2
ME ME ME ME ME ME.
Ross Turk
VP Community, Inktank
ross@inktank.com
@rossturk
inktank.com | ceph.com
3
Ceph Architectural Overview
Ah! Finally, 32 slides in and he gets to the nerdy stuff.
4
RADOS
A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS
A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD
A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS
A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW
A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
6
[Diagram: each OSD sits on a file system (FS: btrfs, xfs, or ext4) on top of its own DISK; monitors (M) shown alongside]
7
[Diagram: three monitors (M) and a HUMAN]
8
Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients
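Why "small, odd number"? Consensus needs a strict majority of monitors to agree, so the arithmetic below is a minimal sketch of how many monitors must be reachable and how many can be lost. The function names are made up for illustration and are not part of any Ceph API.

```python
def quorum_size(num_monitors: int) -> int:
    """Smallest strict majority of the monitors."""
    return num_monitors // 2 + 1


def tolerated_failures(num_monitors: int) -> int:
    """Monitors that can fail while a majority can still be formed."""
    return num_monitors - quorum_size(num_monitors)


# Compare an odd count, the next even count, and the next odd count.
for n in (3, 4, 5):
    print(f"{n} monitors: quorum of {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Going from three monitors to four raises the quorum size without letting the cluster survive any additional failure, which is why an even count buys nothing.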
OSDs:
• 10s to 10000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
10
[Diagram: an APP linked with LIBRADOS, connected by socket to the cluster and its monitors (M)]
LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
13
[Diagram: an APP talks REST to RADOSGW, which uses LIBRADOS to reach the cluster and its monitors (M) over a socket]
14
RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications
15
17
[Diagram: a VM on a HYPERVISOR that uses LIBRBD on top of LIBRADOS to reach the cluster and its monitors (M)]
18
[Diagram: two HYPERVISORs, each with LIBRBD on top of LIBRADOS, one VM, and the cluster with its monitors (M)]
19
[Diagram: a HOST using KRBD (KERNEL MODULE) to reach the cluster and its monitors (M)]
20
RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • QEMU/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
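To make "images are striped across the cluster" concrete, here is a minimal sketch of how a byte offset in a disk image could map to one of many fixed-size objects. The 4 MiB object size is an assumption made for the example, the function names are hypothetical, and real striping layouts can be more elaborate.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


def locate_byte(offset: int, object_size: int = OBJECT_SIZE) -> tuple:
    """Return (object index, offset inside that object) for an image offset."""
    return divmod(offset, object_size)


def objects_touched(offset: int, length: int,
                    object_size: int = OBJECT_SIZE) -> range:
    """Object indexes covered by an I/O of `length` bytes (length > 0)."""
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)


# A 2-byte write that straddles the first object boundary hits two objects.
print(locate_byte(OBJECT_SIZE + 10))
print(list(objects_touched(OBJECT_SIZE - 1, 2)))
```

Because each object is placed independently, a large image ends up spread over many storage nodes instead of living on one disk.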
22
[Diagram: a CLIENT connected to the cluster and its monitors (M), with links labeled data and metadata]
23
Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
• Directory hierarchy
• File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem
24
What Makes Ceph Unique?
Part one: it never, ever remembers where it puts stuff.
25
[Diagram: an APP (??) facing a grid of twelve DC nodes]
26
How Long Did It Take You To Find Your Keys This Morning?
azmeen, Flickr / CC BY 2.0
27
[Diagram: an APP facing a grid of twelve DC nodes]
28
Dear Diary: Today I Put My Keys on the Kitchen Counter
Barnaby, Flickr / CC BY 2.0
29
[Diagram: an APP looking for F* among twelve DC nodes grouped by name range: A-G, H-N, O-T, U-Z]
30
I Always Put My Keys on the Hook By the Door
vitamindave, Flickr / CC BY 2.0
31
HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?
32
The Answer: CRUSH!!!!!
pasukaru76, Flickr / CC SA 2.0
33
OBJECT
10 10 01 01 10 10 01 11 01 10
hash(object name) % num pg
CRUSH(pg, cluster state, rule set)
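The two steps on this slide can be sketched as follows. This is not the real CRUSH algorithm: SHA-256 stands in for whatever hash Ceph uses, a simple "rank the OSDs by a per-placement-group hash" rule stands in for CRUSH, the OSD list stands in for the cluster state, and the replica count stands in for the rule set. All names here are hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name: str, num_pgs: int) -> int:
    """Step 1: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pgs


def pg_to_osds(pg: int, osds: list, replicas: int) -> list:
    """Step 2 (stand-in for CRUSH): rank OSDs by a hash of (pg, osd)."""
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"),
                    reverse=True)
    return ranked[:replicas]


def locate(object_name: str, num_pgs: int, osds: list, replicas: int = 3):
    """Compute the placement group and its OSDs with no lookup table."""
    pg = object_to_pg(object_name, num_pgs)
    return pg, pg_to_osds(pg, osds, replicas)


osds = [f"osd.{i}" for i in range(6)]
print(locate("my-object", 64, osds))
```

Any client that knows the object name, the number of placement groups, and the cluster layout computes the same answer, so nobody has to remember where anything was put.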
34
OBJECT
10 10 01 01 10 10 01 11 01 10
35
CRUSH
• Pseudo-random placement algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
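"Stable mapping" and "limited data migration on change" are easier to believe with a toy demo. The sketch below uses weighted rendezvous hashing, a different and much simpler technique than CRUSH, because it shares those two properties: placement is computed rather than looked up, and adding a node only moves the items that now belong on the new node. The OSD names, weights, and item names are invented for the example.

```python
import hashlib
import math


def unit_hash(text: str) -> float:
    """Deterministic float strictly between 0 and 1."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    bits = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (bits + 0.5) / 2**52


def place(item: str, weights: dict) -> str:
    """Pick the OSD with the highest weighted score for this item."""
    def score(osd: str) -> float:
        # log(u) is negative, so the score is positive and grows with weight.
        return -weights[osd] / math.log(unit_hash(f"{item}:{osd}"))
    return max(weights, key=score)


items = [f"pg-{i}" for i in range(1000)]
before = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 1.0}
after = {**before, "osd.4": 1.0}  # the cluster grows by one OSD

moved = [i for i in items if place(i, before) != place(i, after)]
print(f"{len(moved)} of {len(items)} items moved, all of them to osd.4")
```

Scores for the existing OSDs do not change when osd.4 appears, so an item either stays put or moves to the newcomer. Raising an OSD's weight raises its scores and therefore its share of items.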
36
CLIENT
??
37
38
39
40
CLIENT
??
41
What Makes Ceph Unique?
Part two: it has smart block devices for all those impatient, selfish VMs.
42
[Diagram: a VM on a HYPERVISOR that uses LIBRBD on top of LIBRADOS to reach the cluster and its monitors (M)]
43
HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
44
[Diagram: an original image (144) and an instant copy (0, 0, 0, 0); total = 144]
45
[Diagram: a CLIENT writes to the copy (4) next to the original (144); total = 148]
46
[Diagram: a CLIENT reads from the copy (4) and the original (144); total = 148]
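The numbers on the last three slides (144, then 148) fall out of copy-on-write. Here is a minimal sketch, assuming the figures count stored chunks: a clone starts empty, writes land in the clone, and reads fall through to the parent for anything the clone has not written. The `Image` class is hypothetical, not the RBD API, and it overwrites whole chunks to keep things short.

```python
class Image:
    """Toy disk image: a sparse map of chunk index -> data, plus a parent."""

    def __init__(self, chunks=None, parent=None):
        self.chunks = dict(chunks or {})
        self.parent = parent

    def clone(self):
        """Instant copy: no data is duplicated, only a parent link is kept."""
        return Image(parent=self)

    def write(self, index, data):
        """Writes always land in this image, never in the parent."""
        self.chunks[index] = data

    def read(self, index):
        """Serve from this image if present, otherwise ask the parent."""
        if index in self.chunks:
            return self.chunks[index]
        if self.parent is not None:
            return self.parent.read(index)
        return None


golden = Image({i: b"base" for i in range(144)})
vm_disk = golden.clone()
print(len(golden.chunks) + len(vm_disk.chunks))  # total stored right after cloning

for i in range(4):
    vm_disk.write(i, b"new")
print(len(golden.chunks) + len(vm_disk.chunks))  # total stored after four writes
```

A thousand clones of the same golden image cost almost nothing until the VMs start writing, which is what makes spinning them up both instant and efficient.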
47
What Makes Ceph Unique?
Part three: it has powerful friends, ones you probably already know.
48
[Diagram: APACHE CLOUDSTACK and a HYPERVISOR using the cluster (monitors M) as PRIMARY STORAGE POOL and SECONDARY STORAGE POOL (snapshots, templates, images)]
49
[Diagram: OPENSTACK with KEYSTONE API, SWIFT API, CINDER API, GLANCE API, and NOVA API; a HYPERVISOR and RADOSGW connect these to the cluster (monitors M)]
50
What Makes Ceph Unique?
Part four: clustered metadata
51
[Diagram: a CLIENT, its data (0110), and the cluster with its monitors (M)]
52
[Diagram: the cluster and its monitors (M)]
53
one tree
three metadata servers
??
54
55
56
57
58
DYNAMIC SUBTREE PARTITIONING
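One way to picture "one tree, three metadata servers": each server is authoritative for a subtree, and a path belongs to whichever server owns the deepest delegated subtree containing it. The sketch below shows only that lookup and a hand-made re-delegation. The server names, paths, and the `authority` function are all hypothetical, and how Ceph actually decides when to split or move a subtree is not modeled here.

```python
def authority(path: str, delegations: dict) -> str:
    """Return the server owning the deepest delegated subtree containing path."""
    parts = [p for p in path.split("/") if p]
    # Try the full path first, then each shorter prefix, ending at the root.
    for depth in range(len(parts), -1, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in delegations:
            return delegations[prefix]
    raise KeyError("no server is authoritative for the root")


# One tree, three metadata servers (names are made up).
delegations = {"/": "mds.a", "/home": "mds.b", "/var": "mds.c"}
print(authority("/home/alice/notes.txt", delegations))

# The "dynamic" part: a busy directory is carved out and handed elsewhere.
delegations["/home/alice"] = "mds.c"
print(authority("/home/alice/notes.txt", delegations))
print(authority("/home/bob/todo.txt", delegations))
```

Only the carved-out subtree changes hands; everything else under /home keeps its existing owner.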
59
Questions?