DESCRIPTION
Speaker: Matt Kennedy, Solution Architect: Big Data at Fusion.io
YouTube: http://www.youtube.com/watch?v=xu_4TAQlY2U&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=21
Flash memory technology, deployed as server-side PCIe or solid state disks (SSDs), is emerging as a critical tool for performance and efficiency in data centers of all scales. This presentation will discuss how the use of flash impacts Cassandra deployments in terms of configuration, DRAM requirements and performance expectations. Ideas on leveraging C*'s data-center awareness to blend flash and disk storage nodes for cost and workload efficiency will also be shared.
Flash media itself will be examined from a physical perspective to understand endurance issues. Data on write amplification under bulk-load and operational workload conditions will be presented to explain the impact on flash of C*'s Log Structured Merge Tree architecture and the associated compactions. Finally, we will examine strategies to make Cassandra more flash-aware using both conventional techniques and emerging non-volatile memory (NVM) programming capabilities. Lessons learned from real-world customer deployments will be shared to complete this presentation.
Cassandra: No Moving Parts
Cassandra on Flash Memory
Matt Kennedy
(@mattmorefaster)
October 17, 2013
#CassandraEU — Copyright © 2013 Fusion-io, Inc. All rights reserved.
What is this talk about?
▸Efficiency
• Definition: noun 1. The state or quality of being efficient.
▸Efficient
• Definition: adjective 1. (especially of a system or machine) achieving maximum productivity with minimum wasted effort or expense
Flash vs Disk Cost Efficiency
                 Disk      Flash
▸Capacity        4TB       3TB
▸IOPS            150       200,000
▸Cost per IOP    $$$$      ¢¢¢¢
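Cost per IOP is device price divided by the IOPS the device can sustain. A minimal sketch of that arithmetic follows; the IOPS figures come from the slide, while both prices are hypothetical placeholders chosen only to make the example run (the slide gives no dollar amounts).

```python
def cost_per_iop(device_price, iops):
    """Return the price paid for each I/O operation per second."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return device_price / iops


# IOPS values are taken from the slide.
DISK_IOPS = 150
FLASH_IOPS = 200_000

# Hypothetical prices, NOT from the talk; substitute real quotes.
HYPOTHETICAL_DISK_PRICE = 300.0
HYPOTHETICAL_FLASH_PRICE = 10_000.0

disk = cost_per_iop(HYPOTHETICAL_DISK_PRICE, DISK_IOPS)
flash = cost_per_iop(HYPOTHETICAL_FLASH_PRICE, FLASH_IOPS)
print(f"disk:  ${disk:.2f} per IOP")
print(f"flash: ${flash:.2f} per IOP")
```

With these placeholder prices the flash device costs far more per unit but far less per IOP, which is the contrast the slide draws.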
What is flash?
NAND Flash Memory
Flash is a persistent memory technology invented by Dr. Fujio Masuoka at Toshiba in 1980.
[Figure: NAND flash cell, with labels Bit Line, Source Line, Word Line, Control Gate, Float Gate, and N/P/N regions]
Consumer Volume Drives Economics
Flash in Servers
Direct Cut Through Architecture
[Figure: LEGACY APPROACH vs. FUSION DIRECT APPROACH. Components labeled: App, OS, Host CPU, DRAM, PCIe, SAS, RAID Controller, Super Capacitors (SC), Data path Controller, NAND]
The goal of every I/O operation is to move data to/from DRAM and flash.
Cassandra I/O - Writes
http://www.datastax.com/docs/1.2/dml/about_writes
Cassandra I/O - Reads
http://www.datastax.com/docs/1.2/dml/about_reads
DRAM Dictates Cassandra Scaling
▸Key Design Principle:
▸Working Set < DRAM
Cost of DRAM Modules
[Chart: cost in dollars (y-axis 0 to 1600) for 4GB, 8GB, 16GB and 32GB DRAM modules]
When do we scale out?
▸A typical server…
CPU Cores: 32 with HT
Memory: 128 GB
…is your working set > 128GB?
Is there a better way?
▸With NoSQL Databases, we tend to scale out for DRAM
Combined Resources
CPU Cores: 192
Memory: 768 GB
• Low CPU utilization
• High utility cost
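The scale-out-for-DRAM arithmetic can be sketched as follows. The per-server figures (32 cores, 128 GB) and the 768 GB total come from the slides; the function name is illustrative, and the sketch assumes all DRAM is available to the working set, ignoring JVM heap and OS overhead.

```python
import math


def nodes_for_working_set(working_set_gb, dram_per_node_gb):
    """Smallest node count whose combined DRAM covers the working set."""
    if dram_per_node_gb <= 0:
        raise ValueError("dram_per_node_gb must be positive")
    return max(1, math.ceil(working_set_gb / dram_per_node_gb))


DRAM_PER_NODE_GB = 128  # from the "typical server" slide
CORES_PER_NODE = 32     # from the "typical server" slide

nodes = nodes_for_working_set(768, DRAM_PER_NODE_GB)
print(f"nodes: {nodes}, total cores: {nodes * CORES_PER_NODE}")
```

Sizing the cluster by memory alone is what brings along the 192 cores whether or not the workload needs them, hence the low CPU utilization noted above.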
Flash Offers A New Architectural Choice
[Figure: access-latency scale]
• CPU Cache, DRAM: nanoseconds (10^-9 s)
• Server-based Flash: microseconds (10^-6 s)
• Disk Drives: milliseconds (10^-3 s)
How can we use flash in Cassandra?
Four Deployment Options
1. All Flash
2. Data Placement (CASSANDRA-2749)
3. Use Logical Data Centers
4. Cache Layer
Cassandra with All-Flash Storage
Step 1: Mount ioMemory at /var/lib/cassandra
Step 2:
Data Placement
▸ https://issues.apache.org/jira/browse/CASSANDRA-2749
• Thanks Marcus!
▸Takes advantage of filesystem hierarchy
▸Use mount points to pin Keyspaces or Column Families to flash:
• /var/lib/cassandra/data/{Keyspace}/{CF}
▸Use flash for high performance needs, disk for capacity needs
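Because placement follows the directory layout above, one way to check whether a column family has been pinned to its own device is to test whether its directory is a mount point. This is a minimal sketch: the keyspace and column family names are hypothetical, and it only detects a dedicated mount, not what kind of media sits behind it.

```python
import os


def column_family_dir(data_root, keyspace, column_family):
    """Build the per-column-family directory: {data_root}/{Keyspace}/{CF}."""
    return os.path.join(data_root, keyspace, column_family)


def is_own_mount_point(path):
    """True if a separate filesystem is mounted at path (False if missing)."""
    return os.path.ismount(path)


# "my_keyspace" and "my_cf" are hypothetical names for illustration.
cf_dir = column_family_dir("/var/lib/cassandra/data", "my_keyspace", "my_cf")
print(cf_dir, "-> own mount point:", is_own_mount_point(cf_dir))
```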
Data Centers for Storage Control
[Figure: one Cassandra cluster split into three logical data centers, placed on PERFORMANCE and CAPACITY/NODE axes marked LOW, MEDIUM and HIGH]
• DC1 (Interactive requests)
• DC2 (Hadoop MR Jobs)
• DC3 (High density replicas)
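Logical data centers are selected per keyspace through the replication settings. The sketch below builds a CQL `CREATE KEYSPACE` statement using `NetworkTopologyStrategy`; the keyspace name and the replica counts are hypothetical, and the DC names must match whatever the cluster's snitch reports.

```python
def keyspace_cql(name, replicas_by_dc):
    """Build a CREATE KEYSPACE statement with per-data-center replica counts."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, replicas in replicas_by_dc.items():
        parts.append(f"'{dc}': {int(replicas)}")
    options = ", ".join(parts)
    return f"CREATE KEYSPACE {name} WITH replication = {{{options}}};"


# Hypothetical layout: more replicas in the interactive (flash) DC,
# one each in the analytics and high-density DCs.
statement = keyspace_cql("app_data", {"DC1": 3, "DC2": 1, "DC3": 1})
print(statement)
```

Clients serving interactive traffic would then connect to DC1 nodes and use a local consistency level so requests stay on the flash-backed replicas.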
Flash Caching
▸Use Flash to cache blocks from spinning disk
• Larger, cheaper caches than DRAM
• Helps stabilize performance during compaction
▸Open-Source & Commercial options:
• Flashcache: Facebook-developed write-through/back/around cache
  ▸ Kernel patch
  ▸ https://github.com/facebook/flashcache/
• bcache: write-through/back/around cache
  ▸ Kernel patch
  ▸ http://bcache.evilpiepirate.org/
• Fusion ioTurbine: write-through, commercially supported
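The three write policies named above differ in where a write lands first. The toy model below illustrates the difference using dictionaries in place of devices; it is a teaching sketch with no eviction or failure handling, not how Flashcache, bcache or ioTurbine are implemented.

```python
class ToyFlashCache:
    """Toy block cache: 'flash' fronts a slower 'disk' (both plain dicts)."""

    POLICIES = ("write-through", "write-back", "write-around")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.flash = {}
        self.disk = {}
        self.dirty = set()  # blocks in flash not yet written to disk

    def write(self, block, data):
        if self.policy == "write-through":
            # Write to both tiers before acknowledging.
            self.flash[block] = data
            self.disk[block] = data
        elif self.policy == "write-back":
            # Acknowledge after the flash write; disk is updated on flush.
            self.flash[block] = data
            self.dirty.add(block)
        else:
            # write-around: bypass the cache and drop any stale cached copy.
            self.disk[block] = data
            self.flash.pop(block, None)

    def read(self, block):
        if block in self.flash:
            return self.flash[block]
        data = self.disk[block]
        self.flash[block] = data  # populate the cache on a miss
        return data

    def flush(self):
        """Copy dirty blocks to disk (only relevant for write-back)."""
        for block in self.dirty:
            self.disk[block] = self.flash[block]
        self.dirty.clear()


for policy in ToyFlashCache.POLICIES:
    cache = ToyFlashCache(policy)
    cache.write(7, b"row")
    print(policy, "in flash:", 7 in cache.flash, "on disk:", 7 in cache.disk)
```

Write-back gives the lowest write latency but leaves data only on flash until it is flushed, which is why the choice of policy matters for durability as well as speed.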
The Numbers
YCSB Testing Setup
[Diagram: x4, x1]
YCSB Load Generator
10GB
16 cores, 24GB DRAM
Workloads use uniform random key selection instead of Zipfian.
150 million 1KB records, RF=3: ~120GB SSTables/node
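The per-node data size can be sanity-checked from the record count and replication factor. This sketch assumes a four-node cluster (reading the "x4" in the diagram as the node count) and treats 1KB as 1,000 bytes; both are assumptions rather than figures stated on the slide.

```python
RECORDS = 150_000_000      # from the slide
RECORD_BYTES = 1_000       # assumption: "1KB" taken as 1,000 bytes
REPLICATION_FACTOR = 3     # from the slide
NODES = 4                  # assumption: "x4" read as four Cassandra nodes

total_bytes = RECORDS * RECORD_BYTES * REPLICATION_FACTOR
per_node_gb = total_bytes / NODES / 1e9
print(f"raw record data per node: {per_node_gb:.1f} GB")
```

Under these assumptions the raw payload comes to 112.5 GB per node, so the slide's ~120GB of SSTables would leave a few GB per node for on-disk overhead such as keys, column metadata and indexes.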
50/50 R/W Uniform distribution 10hrs
[Chart: YCSB mixed ops/sec over the run; y-axis 0 to 70,000 mixed ops/sec]
Update Latency: Average 511 µs, 95th Pctl 1 ms, 99th Pctl 2 ms
Read Latency: Average 7.0 ms, 95th Pctl 18 ms, 99th Pctl 42 ms
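For readers reproducing figures like these from raw latency samples, here is a minimal sketch of an average plus nearest-rank percentile summary. It is not necessarily how YCSB computes its own percentiles, and the sample data is synthetic, generated only to exercise the function.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct*n/100
    return ordered[rank - 1]


def summarize(latencies):
    """Return average, 95th and 99th percentile of a list of latencies."""
    return {
        "average": sum(latencies) / len(latencies),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


# Synthetic samples (1..100 microseconds), NOT measurements from the talk.
synthetic_us = list(range(1, 101))
print(summarize(synthetic_us))
```

The gap between average and 99th percentile is the part of the distribution an average hides, which is why the slides report all three.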
95/5 R/W Uniform distribution
[Chart: mixed ops/sec over the run (y-axis 0 to 80,000) for three series: 75 threads, 200 threads, 300 threads]
# threads   Avg Lat.      95th pctl   99th pctl
75          1.4/0.22 ms   2/0 ms      5/0 ms
200         3.1/0.19 ms   7/0 ms      13/0 ms
300         4.4/2.2 ms    11/0 ms     19/0 ms
Consolidation
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Real-World Cassandra on Fusion
• 3-4x consolidation factor
• 3-6x reduction in latency
• 2.2x ROI
Efficiency: Performance or Consolidation?
[Figure: server counts (groups of x4) compared for Memory/Disk vs. ioMemory]
Cassandra @ ~100,000 ops/sec (mixed workload)
http://www.fusionio.com/white-papers/accelerate-cassandra-without-the-cluster-crawl/
Thank You
fusionio.com | SAME PLANET. DIFFERENT WORLD.
@mattmorefaster
Cassandra: ioDrive2 vs 10 disk RAID-0
50/50 R/W Uniform distribution
[Chart: YCSB mixed ops/sec over the run; y-axis 0 to 120,000 mixed ops/sec]
Update Latency: Average 311 µs, 95th Pctl 0 ms, 99th Pctl 1 ms
Read Latency: Average 8.2 ms, 95th Pctl 20 ms, 99th Pctl 62 ms
YCSB: Bulk Load (CL=ALL)
[Chart: YCSB inserts/sec over the bulk load; y-axis 0 to 70,000 inserts/sec]
Avg Latency: 0.9 ms, 95th Percentile: 1 ms, 99th Percentile: 4 ms