C* Summit EU 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned

Cassandra: No Moving Parts

Cassandra on Flash Memory

Matt Kennedy

(@mattmorefaster)

October 17, 2013

#CassandraEU — Copyright © 2013 Fusion-io, Inc. All rights reserved.

What is this talk about?

▸Efficiency
• Definition: noun 1. The state or quality of being efficient.

▸Efficient
• Definition: adjective 1. (especially of a system or machine) achieving maximum productivity with minimum wasted effort or expense


Flash vs Disk Cost Efficiency

                 Disk      Flash
▸Capacity        4TB       3TB
▸IOPS            150       200,000
▸Cost per IOP    $$$$      ¢¢¢¢
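The slide gives capacity and IOPS but no prices, so the sketch below only derives what those figures support: IOPS per terabyte for each device, and how many 150-IOPS disks it would take to match one 200,000-IOPS flash device on random I/O. `cost_per_iop` takes the device price as a parameter because the slide does not state one; all names here are illustrative.

```python
import math

# Figures from the slide: capacity in TB, random IOPS.
DISK = {"capacity_tb": 4, "iops": 150}
FLASH = {"capacity_tb": 3, "iops": 200_000}


def iops_per_tb(device: dict) -> float:
    """Random IOPS delivered per terabyte of capacity."""
    return device["iops"] / device["capacity_tb"]


def cost_per_iop(price: float, iops: int) -> float:
    """Device price divided by the random IOPS it sustains.

    The price is a caller-supplied input, not data from the slide.
    """
    return price / iops


def disks_to_match(target_iops: int, disk_iops: int) -> int:
    """Number of disks needed to reach target_iops, rounding up."""
    return math.ceil(target_iops / disk_iops)


if __name__ == "__main__":
    print(f"disk:  {iops_per_tb(DISK):,.1f} IOPS/TB")
    print(f"flash: {iops_per_tb(FLASH):,.1f} IOPS/TB")
    print("disks to match one flash device:",
          disks_to_match(FLASH["iops"], DISK["iops"]))
```

Even before prices enter, matching one flash device on random IOPS would take well over a thousand disks, which is where the gap in cost per IOP comes from.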


What is flash?

NAND Flash Memory


Flash is a persistent memory technology invented by Dr. Fujio Masuoka at Toshiba in 1980.

[Figure: NAND flash cell, labeled with bit line, source line, word line, control gate, floating gate, and N-P-N regions]


Consumer Volume Drives Economics


Flash in Servers


Direct Cut Through Architecture

[Figure: Legacy approach vs. Fusion direct approach. Components shown: App/OS, host CPU, DRAM, PCIe, SAS, RAID controller, data path controller, NAND, and super capacitors (SC).]

The goal of every I/O operation is to move data between DRAM and flash.


Cassandra I/O - Writes

http://www.datastax.com/docs/1.2/dml/about_writes
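The linked page describes the write path: a write is appended to the commit log, applied to the in-memory memtable, and later flushed to an immutable SSTable on disk. The toy model below sketches that sequence. The `ToyNode` class, its row-count flush threshold, and clearing the whole commit log on flush (valid only because the model has a single table) are simplifications for illustration, not Cassandra's code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node's write path (not Cassandra's implementation)."""
    memtable_limit: int = 3  # hypothetical flush threshold, in rows
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, value, ts):
        # 1. Append to the commit log first: a sequential write for durability.
        self.commit_log.append((key, value, ts))
        # 2. Apply to the in-memory memtable; the higher timestamp wins.
        current = self.memtable.get(key)
        if current is None or ts >= current[1]:
            self.memtable[key] = (value, ts)
        # 3. Once the memtable is "full", flush it to disk as an SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is immutable and sorted by key.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # With a single table, everything in the log is now on disk,
        # so the log can be recycled.
        self.commit_log.clear()
```

The only disk I/O on this path is sequential (the log append and the flush), which is why writes stay cheap; random I/O shows up on the read side.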


Cassandra I/O - Reads

http://www.datastax.com/docs/1.2/dml/about_reads
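On the read side, a row may be spread across the memtable and several SSTables, so Cassandra checks each SSTable whose Bloom filter may contain the key and merges what it finds, newest timestamp winning. The sketch below models that with plain dicts and uses Python sets as stand-ins for Bloom filters (a real Bloom filter can return false positives; a set cannot). It also counts the SSTables touched, since each one is a potential random read on the data device.

```python
from typing import Any, List, Optional, Tuple

Cell = Tuple[Any, int]  # (value, write timestamp)


def read(key: str, memtable: dict, sstables: list,
         bloom_filters: list) -> Tuple[Optional[Any], int]:
    """Return (newest value or None, number of SSTables read) for key.

    memtable:      {key: (value, timestamp)}
    sstables:      one {key: (value, timestamp)} dict per SSTable
    bloom_filters: one set per SSTable, standing in for its Bloom filter
    """
    candidates: List[Cell] = []
    sstable_reads = 0
    if key in memtable:
        candidates.append(memtable[key])
    for table, bloom in zip(sstables, bloom_filters):
        if key not in bloom:
            continue  # filter says "definitely absent": no I/O needed
        sstable_reads += 1  # a potential random read on disk or flash
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None, sstable_reads
    # Merge step: the cell with the highest timestamp wins.
    newest = max(candidates, key=lambda cell: cell[1])
    return newest[0], sstable_reads
```

When the working set no longer fits in memory, each counted SSTable read goes to the storage device, which is the step where flash latency replaces a disk seek.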


DRAM Dictates Cassandra Scaling

▸Key Design Principle:

▸Working Set < DRAM


[Chart: Cost of DRAM Modules, in dollars (axis 0 to 1,600), for 4 GB, 8 GB, 16 GB, and 32 GB modules]


When do we scale out?

▸A typical server…

CPU Cores: 32 (with HT)
Memory: 128 GB

…is your working set > 128GB?


Is there a better way?


▸With NoSQL Databases, we tend to scale out for DRAM

Combined Resources
CPU Cores: 192
Memory: 768 GB

• Low CPU utilization

• High utility cost
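The scale-out arithmetic behind these two slides is simple: if the working set must fit in DRAM, the node count is the working set divided by per-node memory, rounded up, and CPU cores come along whether they are needed or not. Six of the typical 32-core, 128 GB servers give the combined 192 cores and 768 GB shown above. The sketch ignores replication, JVM heap, and OS overhead, and the 700 GB working set in the usage lines is a hypothetical figure.

```python
import math


def nodes_for_dram(working_set_gb: float, dram_per_node_gb: float) -> int:
    """Smallest node count whose combined DRAM can hold the working set."""
    return math.ceil(working_set_gb / dram_per_node_gb)


def combined_resources(nodes: int, cores_per_node: int,
                       dram_per_node_gb: int) -> tuple:
    """Total (CPU cores, DRAM in GB) across a cluster of identical nodes."""
    return nodes * cores_per_node, nodes * dram_per_node_gb


if __name__ == "__main__":
    # Typical server from the slide: 32 cores (with HT), 128 GB DRAM.
    n = nodes_for_dram(700, 128)  # hypothetical 700 GB working set
    cores, dram = combined_resources(n, 32, 128)
    print(f"{n} nodes -> {cores} cores, {dram} GB DRAM")
```

Because node count is driven by memory alone, the extra cores mostly sit idle, which matches the low CPU utilization noted on the slide.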


Flash Offers A New Architectural Choice

[Figure: access-latency scale from milliseconds (10^-3 s) through microseconds (10^-6 s) to nanoseconds (10^-9 s), placing disk drives, server-based flash, DRAM, and CPU cache]


How can we use flash in Cassandra?


Four Deployment Options

1. All Flash

2. Data Placement (CASSANDRA-2749)

3. Use Logical Data Centers

4. Cache Layer


Cassandra with All-Flash Storage


Step 1: Mount ioMemory at /var/lib/cassandra

Step 2:

Data Placement

▸https://issues.apache.org/jira/browse/CASSANDRA-2749
• Thanks Marcus!

▸Takes advantage of filesystem hierarchy

▸Use mount points to pin Keyspaces or Column Families to flash:
• /var/lib/cassandra/data/{Keyspace}/{CF}

▸Use flash for high performance needs, disk for capacity needs
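Because each Keyspace and Column Family lives in its own directory, the storage tier behind a table is whatever filesystem is mounted at, or most closely above, /var/lib/cassandra/data/{Keyspace}/{CF}. The sketch below resolves that with a longest-prefix match over a mount table. The keyspace and column family names (`app`, `sessions`, `archive`) and the `MOUNTS` layout are hypothetical; on a real Linux host the mount table would come from the OS (for example /proc/mounts).

```python
import posixpath
from typing import Optional

DATA_ROOT = "/var/lib/cassandra/data"


def cf_directory(keyspace: str, cf: str, root: str = DATA_ROOT) -> str:
    """Directory holding a Column Family's SSTables: {root}/{Keyspace}/{CF}."""
    return posixpath.join(root, keyspace, cf)


def backing_tier(path: str, mounts: dict) -> Optional[str]:
    """Return the tier of the most specific mount point containing path."""
    best, tier = "", None
    for mount_point, mount_tier in mounts.items():
        mp = mount_point.rstrip("/") or "/"
        # Match whole path components, so ".../sessions2" is not under ".../sessions".
        contains = mp == "/" or path == mp or path.startswith(mp + "/")
        if contains and len(mp) > len(best):
            best, tier = mp, mount_tier
    return tier


# Hypothetical layout: disk for the bulk of the data, flash pinned to one CF.
MOUNTS = {
    "/": "disk",
    "/var/lib/cassandra/data": "disk",
    "/var/lib/cassandra/data/app/sessions": "flash",
}

if __name__ == "__main__":
    for cf in ("sessions", "archive"):
        print(cf, "->", backing_tier(cf_directory("app", cf), MOUNTS))
```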


Data Centers for Storage Control

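[Figure: a single Cassandra cluster split into logical data centers, DC1 (interactive requests), DC2 (Hadoop MR jobs), and DC3 (high-density replicas), placed along axes of performance and capacity per node, each rated high, medium, or low]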
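Keyspaces reach these logical data centers through their replication settings. With NetworkTopologyStrategy, the replication options name each data center and its replica count, so one keyspace can keep replicas in the interactive, Hadoop, and high-density DCs at once. The helper below only builds that CQL statement; the keyspace name `app` and the replica counts are hypothetical, and the DC names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace: str, rf_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with per-data-center replica counts.

    No quoting or escaping is done: use with trusted names only.
    """
    dc_opts = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_opts}}};"
    )


if __name__ == "__main__":
    # Hypothetical layout: 3 replicas for interactive reads,
    # 1 for Hadoop jobs, 2 on the high-density nodes.
    print(create_keyspace_cql("app", {"DC1": 3, "DC2": 1, "DC3": 2}))
```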


Flash Caching

▸Use flash to cache blocks from spinning disk
• Larger, cheaper caches than DRAM
• Helps stabilize performance during compaction

▸Open-source & commercial options:
• Flashcache: Facebook-developed write-through/back/around cache
  ▸ Kernel patch
  ▸ https://github.com/facebook/flashcache/

• bcache: write-through/back/around cache
  ▸ Kernel patch
  ▸ http://bcache.evilpiepirate.org/

• Fusion ioTurbine: write-through, commercially supported
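The cache modes listed differ in when the backing disk sees a write: write-through updates cache and disk together, write-back acknowledges from the cache and flushes to disk later, and write-around sends writes straight to disk and caches blocks only when they are read. The sketch below models a write-through block cache with LRU eviction, using a dict as a stand-in for the spinning disk. It is a conceptual model, not how Flashcache, bcache, or ioTurbine are implemented.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Conceptual write-through block cache with LRU eviction."""

    def __init__(self, backing: dict, capacity: int):
        self.backing = backing   # stand-in for the spinning disk
        self.capacity = capacity # number of blocks the flash cache holds
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)        # mark as most recently used
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]           # slow path: read from disk
        self._insert(block, data)
        return data

    def write(self, block, data):
        self.backing[block] = data           # write-through: disk updated first
        self._insert(block, data)
```

Because every write also reaches the disk, a write-through cache never holds the only copy of a block; its benefit is on reads, which is where Cassandra does its random I/O.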


The Numbers


YCSB Testing Setup

#CassandraEU

Cassandra nodes (x4), YCSB load generator (x1)

10GB, 16 cores, 24GB DRAM

Workloads use uniform random key selection instead of Zipfian.

150 million 1KB records, RF=3: ~ 120GB SSTables/node
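The ~120GB per node follows from the dataset size, replication factor, and cluster size. The sketch below redoes that arithmetic, assuming four Cassandra nodes (the "x4" in the setup), 1KB = 1,000 bytes, and that the 24GB DRAM figure applies to each Cassandra node. The gap between the computed 112.5 GB and the stated ~120GB is plausibly SSTable overhead such as keys, timestamps, and indexes. The result also shows each node holding several times its DRAM, so with uniform random keys most reads cannot be served from memory.

```python
RECORDS = 150_000_000
RECORD_BYTES = 1_000     # assumption: "1KB" taken as 1,000 bytes
REPLICATION_FACTOR = 3
NODES = 4                # assumption: the "x4" in the setup is the Cassandra cluster
DRAM_PER_NODE_GB = 24    # assumption: the 24GB DRAM spec is per Cassandra node

raw_gb = RECORDS * RECORD_BYTES / 1e9                 # unreplicated data
per_node_gb = raw_gb * REPLICATION_FACTOR / NODES     # replicated, evenly spread
data_to_dram_ratio = per_node_gb / DRAM_PER_NODE_GB   # how far data exceeds DRAM

if __name__ == "__main__":
    print(f"raw: {raw_gb:.1f} GB, per node: {per_node_gb:.1f} GB, "
          f"{data_to_dram_ratio:.1f}x DRAM")
```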


50/50 R/W Uniform distribution (10 hrs)


[Chart: YCSB mixed ops/sec over the run; y-axis 0 to 70,000 mixed ops/sec]

Update Latency: Average 511 µs, 95th Pctl 1 ms, 99th Pctl 2 ms

Read Latency: Average 7.0 ms, 95th Pctl 18 ms, 99th Pctl 42 ms
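These latency summaries report an average plus 95th and 99th percentiles. A minimal way to compute those from raw per-operation samples is the nearest-rank method sketched below. YCSB itself records latencies into histogram buckets of roughly one millisecond, which may be why some percentiles in these results read "0 ms"; the sketch is for interpreting the numbers, not a reproduction of YCSB's measurement code.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Average, 95th, and 99th percentile, as in the latency summaries."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
```

A long tail pulls the 99th percentile far above the average, as in the read latencies here (7.0 ms average, 42 ms at the 99th percentile).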


95/5 R/W Uniform distribution


[Chart: YCSB mixed ops/sec over time with 75, 200, and 300 threads; y-axis 0 to 80,000 mixed ops/sec]

# threads    Avg Lat.       95th pctl    99th pctl
75           1.4/0.22 ms    2/0 ms       5/0 ms
200          3.1/0.19 ms    7/0 ms       13/0 ms
300          4.4/2.2 ms     11/0 ms      19/0 ms


Consolidation


http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html




Real-World Cassandra on Fusion

• 3-4x consolidation factor
• 3-6x reduction in latency
• 2.2x ROI


Efficiency: Performance or Consolidation?

[Figure: Cassandra @ ~100,000 ops/sec (mixed workload), comparing a Memory/Disk cluster vs. an ioMemory cluster]

http://www.fusionio.com/white-papers/accelerate-cassandra-without-the-cluster-crawl/

Thank You

fusionio.com | SAME PLANET. DIFFERENT WORLD.

@mattmorefaster

April 11, 2023


Cassandra: ioDrive2 vs. 10-disk RAID-0


50/50 R/W Uniform distribution


[Chart: YCSB mixed ops/sec over time; y-axis 0 to 120,000 mixed ops/sec]

Update Latency: Average 311 µs, 95th Pctl 0 ms, 99th Pctl 1 ms

Read Latency: Average 8.2 ms, 95th Pctl 20 ms, 99th Pctl 62 ms


YCSB: Bulk Load (CL=ALL)


[Chart: YCSB inserts/sec over time; y-axis 0 to 70,000 inserts/sec]

Avg Latency: 0.9 ms, 95th Percentile: 1 ms, 99th Percentile: 4 ms
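CL=ALL means the coordinator sends each insert to every replica and waits for all of them to acknowledge before replying, so with RF=3 each insert waits on the slowest of three replicas. The helper below gives the acknowledgement counts for the basic single-data-center consistency levels (ONE, QUORUM, ALL). It is a sketch of the rule only; multi-DC levels such as LOCAL_QUORUM and EACH_QUORUM are deliberately left out.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements a write must collect before it succeeds."""
    cl = consistency_level.upper()
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if cl == "ALL":
        return replication_factor
    raise ValueError(f"consistency level not covered by this sketch: {consistency_level}")


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_acks(cl, 3))
```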
