How To Size Up A Cassandra Cluster
Joe Chu, Technical [email protected]
April 2014
©2014 DataStax Confidential. Do not distribute without consent.
What is Apache Cassandra?
• Distributed NoSQL database
• Linearly scalable
• Highly available with no single point of failure
• Fast writes and reads
• Tunable data consistency
• Rack and Datacenter awareness
Peer-to-peer architecture
• All nodes are the same
• No master / slave architecture
• Less operational overhead for better scalability.
• Eliminates single point of failure, increasing availability.
©2014 DataStax Confidential. Do not distribute without consent.
[Diagram: a master/slave hierarchy compared with a peer-to-peer ring of equal nodes]
Linear Scalability
• Operation throughput increases linearly with the number of nodes added.
Data Replication
• Cassandra writes copies of data to different nodes.
• The replication factor (RF) setting determines the number of copies.
• The replication strategy can replicate data to different racks and different datacenters.

[Diagram: RF = 3; INSERT INTO user_table (id, first_name, last_name) VALUES (1, 'John', 'Smith'); replicated to nodes R1, R2, and R3]
Node
• Instance of a running Cassandra process.
• Usually represents a single machine or server.
Rack
• Logical grouping of nodes.
• Allows data to be replicated across different racks.
Datacenter
• Grouping of nodes and racks.
• Each data center can have separate replication settings.
• May be in different geographical locations, but not always.
Cluster
• Grouping of datacenters, racks, and nodes that communicate with each other and replicate data.
• Clusters are not aware of other clusters.
Consistency Models
• Immediate consistency
When a write is successful, subsequent reads are guaranteed to return the latest value.
• Eventual consistency
When a write is successful, stale data may still be read but will eventually return the latest value.
Tunable Consistency
• Cassandra offers the ability to choose between immediate and eventual consistency by setting a consistency level.
• Consistency level is set per read or write operation.
• Common consistency levels are ONE, QUORUM, and ALL.
• For multi-datacenter deployments, additional levels such as LOCAL_QUORUM and EACH_QUORUM are available to control cross-datacenter traffic.
CL ONE
• Write: Success when at least one replica node has acknowledged the write.
• Read: Only one replica node is given the read request.
[Diagram: client sends request to coordinator, which contacts replicas R1, R2, R3; RF = 3]
CL QUORUM
• Write: Success when a majority of the replica nodes have acknowledged the write.
• Read: A majority of the replica nodes are given the read request.
• Majority = ( RF / 2 ) + 1
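The majority formula can be sanity-checked in a few lines. This is purely an illustration (Python, not part of the original slides); the integer division matches the ( RF / 2 ) + 1 rule above.

```python
def quorum(rf: int) -> int:
    """Majority of replicas: (RF // 2) + 1, using integer division."""
    return rf // 2 + 1

# RF = 3: QUORUM needs 2 of 3 replicas, so one replica can be down.
print(quorum(3))  # 2
# RF = 5: QUORUM needs 3 of 5 replicas, tolerating two failures.
print(quorum(5))  # 3
# RF = 2: quorum is 2, the same as ALL, so no failure tolerance.
print(quorum(2))  # 2
```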
CL ALL
• Write: Success when all of the replica nodes have acknowledged the write.
• Read: All replica nodes are given the read request.
Log-Structured Storage Engine
• Cassandra storage engine inspired by Google BigTable
• Key to fast write performance on Cassandra
[Diagram: Commit Log and Memtable, with data flushed to SSTables]
Updates and Deletes
• SSTable files are immutable and cannot be changed.
• Updates are written as new data.
• Deletes write a tombstone, which marks a row or column(s) as deleted.
• Updates and deletes are just as fast as inserts.
[Diagram: three SSTables holding successive versions of row id:1: (first:John, last:Smith, timestamp:…405); (first:John, last:Williams, timestamp:…621); (deleted, timestamp:…999)]
Compaction
• Periodically an operation is triggered that merges the data in several SSTables into a single SSTable.
• Helps to limit the number of SSTables to read.
• Removes old data and tombstones.
• SSTables are deleted after compaction.
[Diagram: the three SSTables above are merged into a new SSTable containing only the latest version (id:1, deleted, timestamp:999)]
Cluster Sizing
Cluster Sizing Considerations
• Replication Factor
• Data Size
“How many nodes would I need to store my data set?”
• Data Velocity (Performance)
“How many nodes would I need to achieve my desired throughput?”
Choosing a Replication Factor
What Are You Using Replication For?
• Durability or Availability?
• Each node has local durability (Commit Log), but replication can be used for distributed durability.
• For availability, a recommended setting is RF=3.
• RF=3 is the minimum necessary to achieve both consistency and availability using QUORUM and LOCAL_QUORUM.
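The RF=3 claim follows from the standard replica-overlap rule: a read is guaranteed to see the latest write when the number of replicas written plus the number read exceeds RF. A minimal sketch (Python, for illustration only, not from the slides):

```python
def is_immediately_consistent(write_count: int, read_count: int, rf: int) -> bool:
    """Read and write replica sets must intersect for reads to see the latest write."""
    return write_count + read_count > rf

rf = 3
q = rf // 2 + 1  # QUORUM = 2 when RF = 3
print(is_immediately_consistent(q, q, rf))  # True: QUORUM writes + QUORUM reads
print(is_immediately_consistent(1, 1, rf))  # False: ONE + ONE is eventual only
```

With RF=2 the quorum is also 2, so QUORUM behaves like ALL and a single node failure blocks quorum operations; RF=3 is the smallest setting where quorum reads and writes overlap while tolerating one failed replica.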
How Replication Can Affect Consistency Level
• When RF < 3, you do not have as much flexibility when choosing consistency and availability.
• QUORUM = ALL
[Diagram: client sends request to coordinator, which contacts replicas R1 and R2; RF = 2]
Using A Larger Replication Factor
• When RF > 3, there is more data usage and higher latency for operations requiring immediate consistency.
• If using eventual consistency, a consistency level of ONE will have consistent performance regardless of the replication factor.
• High availability clusters may use a replication factor as high as 5.
Data Size
Disk Usage Factors
• Data Size
• Replication Setting
• Old Data
• Compaction
• Snapshots
Data Sizing
• Row and Column Data
• Row and Column Overhead
• Indices and Other Structures
Replication Overhead
• A replication factor > 1 will effectively multiply your data size by that amount.
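The multiplier is direct. A short illustration (Python; the 500 GB figure is a hypothetical example, not from the slides):

```python
def replicated_size_gb(raw_gb: float, rf: int) -> float:
    """Total on-disk data across the cluster, before overhead and snapshots."""
    return raw_gb * rf

raw = 500  # hypothetical 500 GB un-replicated data set
for rf in (1, 2, 3):
    print(f"RF={rf}: ~{replicated_size_gb(raw, rf):.0f} GB cluster-wide")
```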
[Diagram: disk usage at RF = 1, RF = 2, and RF = 3]
Old Data
• Updates and deletes do not actually overwrite or delete data.
• Older versions of data and tombstones remain in the SSTable files until they are compacted.
• This becomes more important for heavy update and delete workloads.
Compaction
• Compaction needs free disk space to write the new SSTable before the SSTables being compacted are removed.
• Leave enough free disk space on each node to allow compactions to run.
• Worst case for the Size-Tiered Compaction Strategy is 50% of the total data capacity of the node.
• For the Leveled Compaction Strategy, it is about 10% of the total data capacity.
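The headroom figures translate into usable capacity per node. A sketch (Python, illustrative only; the strategy names and fractions simply restate the worst cases quoted above):

```python
def safe_data_per_node_gb(disk_gb: float, strategy: str) -> float:
    """Capacity left for data after reserving compaction headroom.

    Headroom fractions follow the worst cases above:
    50% for size-tiered compaction, ~10% for leveled compaction.
    """
    headroom = {"size_tiered": 0.50, "leveled": 0.10}[strategy]
    return disk_gb * (1.0 - headroom)

print(safe_data_per_node_gb(1000, "size_tiered"))  # 500.0
print(safe_data_per_node_gb(1000, "leveled"))      # 900.0
```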
Snapshots
• Snapshots are hard-links or copies of SSTable data files.
• After SSTables are compacted, the disk space may not be reclaimed if a snapshot of those SSTables was created.
Snapshots are created when:
• Executing the nodetool snapshot command
• Dropping a keyspace or table
• Incremental backups
• During compaction
Recommended Disk Capacity
• For current Cassandra versions, the ideal disk capacity is approximately 1 TB per node if using spinning disks, and 3-5 TB per node if using SSDs.
• Having a larger disk capacity may be limited by the resulting performance.
• What works for you is still dependent on your data model design and desired data velocity.
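Putting the data-size factors together gives a first-pass node count. This is a capacity-only sketch (Python; all input figures are hypothetical examples, and it ignores performance, which the next section covers):

```python
import math

def nodes_needed(raw_data_gb: float, rf: int, node_capacity_gb: float,
                 compaction_headroom: float = 0.5) -> int:
    """Minimum node count to hold the data set, from disk capacity alone."""
    usable_per_node = node_capacity_gb * (1.0 - compaction_headroom)
    return math.ceil(raw_data_gb * rf / usable_per_node)

# Hypothetical example: 2 TB of raw data, RF=3, 1 TB spinning disks,
# size-tiered compaction (50% headroom):
print(nodes_needed(2000, 3, 1000))  # 12
```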
Data Velocity (Performance)
How to Measure Performance
• I/O Throughput
“How many reads and writes can be completed per second?”
• Read and Write Latency
“How fast should I be able to get a response for my read and write requests?”
Sizing for Failure
• Cluster must be sized taking into account the performance impact caused by failure.
• When a node fails, the corresponding workload must be absorbed by the other replica nodes in the cluster.
• Performance is further impacted when recovering a node. Data must be streamed or repaired using the other replica nodes.
Hardware Considerations for Performance
CPU
• Operations are often CPU-intensive.
• More cores are better.

Memory
• Cassandra uses JVM heap memory.
• Additional memory is used as off-heap memory by Cassandra, or as the OS page cache.

Disk
• Cassandra is optimized for spinning disks, but SSDs will perform better.
• Attached storage (SAN) is strongly discouraged.
Some Final Words…
Summary
• Cassandra allows flexibility when sizing your cluster, from a single node to thousands of nodes.
• Your use case will dictate how you want to size and configure your Cassandra cluster. Do you need availability? Immediate consistency?
• The minimum number of nodes needed will be determined by your data size, desired performance and replication factor.
Additional Resources
• DataStax Documentation
  http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html
• Planet Cassandra
  http://planetcassandra.org/nosql-cassandra-education/
• Cassandra Users Mailing List
  [email protected]
  http://mail-archives.apache.org/mod_mbox/cassandra-user/
Questions?
Thank You
We power the big data apps that transform business.