How To Size Up A Cassandra Cluster
Joe Chu, Technical [email protected]
April 2014
©2014 DataStax Confidential. Do not distribute without consent.
What is Apache Cassandra?
• Distributed NoSQL database
• Linearly scalable
• Highly available with no single point of failure
• Fast writes and reads
• Tunable data consistency
• Rack and Datacenter awareness
Peer-to-peer architecture
• All nodes are the same
• No master / slave architecture
• Less operational overhead for better scalability.
• Eliminates single point of failure, increasing availability.
©2014 DataStax Confidential. Do not distribute without consent.
[Diagram: a master/slave hierarchy compared with a peer-to-peer ring of equal nodes]
Linear Scalability
• Operation throughput increases linearly with the number of nodes added.
Data Replication
• Cassandra writes copies of data to different nodes.
• The replication factor (RF) setting determines the number of copies.
• The replication strategy can replicate data to different racks and different datacenters.

[Diagram: RF = 3; INSERT INTO user_table (id, first_name, last_name) VALUES (1, 'John', 'Smith'); replicated to nodes R1, R2, and R3]
Node
• Instance of a running Cassandra process.
• Usually represents a single machine or server.
Rack
• Logical grouping of nodes.
• Allows data to be replicated across different racks.
Datacenter
• Grouping of nodes and racks.
• Each data center can have separate replication settings.
• May be in different geographical locations, but not always.
Cluster
• Grouping of datacenters, racks, and nodes that communicate with each other and replicate data.
• Clusters are not aware of other clusters.
Consistency Models
• Immediate consistency
When a write is successful, subsequent reads are guaranteed to return the latest value.
• Eventual consistency
When a write is successful, stale data may still be read but will eventually return the latest value.
Tunable Consistency
• Cassandra offers the ability to choose between immediate and eventual consistency by setting a consistency level.
• Consistency level is set per read or write operation.
• Common consistency levels are ONE, QUORUM, and ALL.
• For multi-datacenter deployments, additional levels such as LOCAL_QUORUM and EACH_QUORUM are available to control cross-datacenter traffic.
CL ONE
• Write: Success when at least one replica node has acknowledged the write.
• Read: Only one replica node is given the read request.
[Diagram: client sends request to coordinator, which contacts replicas R1, R2, R3; RF = 3]
CL QUORUM
• Write: Success when a majority of the replica nodes have acknowledged the write.
• Read: A majority of the replica nodes are given the read request.
• Majority = ( RF / 2 ) + 1
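The majority formula can be sanity-checked in a few lines. This is purely an illustration (Python, not part of the original slides); the integer division matches the ( RF / 2 ) + 1 rule above.

```python
def quorum(rf: int) -> int:
    """Majority of replicas: (RF // 2) + 1, using integer division."""
    return rf // 2 + 1

# RF = 3: QUORUM needs 2 of 3 replicas, so one replica can be down.
print(quorum(3))  # 2
# RF = 5: QUORUM needs 3 of 5 replicas, tolerating two failures.
print(quorum(5))  # 3
# RF = 2: quorum is 2, the same as ALL, so no failure tolerance.
print(quorum(2))  # 2
```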
CL ALL
• Write: Success when all of the replica nodes have acknowledged the write.
• Read: All replica nodes are given the read request.
Log-Structured Storage Engine
• Cassandra storage engine inspired by Google BigTable
• Key to fast write performance on Cassandra
[Diagram: Commit Log and Memtable, with data flushed to SSTables]
Updates and Deletes
• SSTable files are immutable and cannot be changed.
• Updates are written as new data.
• Deletes write a tombstone, which marks a row or column(s) as deleted.
• Updates and deletes are just as fast as inserts.
[Diagram: three SSTables holding successive versions of row id:1: (first:John, last:Smith, timestamp:…405); (first:John, last:Williams, timestamp:…621); (deleted, timestamp:…999)]
Compaction
• Periodically an operation is triggered that merges the data in several SSTables into a single SSTable.
• Helps to limit the number of SSTables to read.
• Removes old data and tombstones.
• SSTables are deleted after compaction.
[Diagram: the three SSTables above are merged into a new SSTable containing only the latest version (id:1, deleted, timestamp:999)]
Cluster Sizing
Cluster Sizing Considerations
• Replication Factor
• Data Size
“How many nodes would I need to store my data set?”
• Data Velocity (Performance)
“How many nodes would I need to achieve my desired throughput?”
Choosing a Replication Factor
What Are You Using Replication For?
• Durability or Availability?
• Each node has local durability (Commit Log), but replication can be used for distributed durability.
• For availability, a recommended setting is RF=3.
• RF=3 is the minimum necessary to achieve both consistency and availability using QUORUM and LOCAL_QUORUM.
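The RF=3 claim follows from the standard replica-overlap rule: a read is guaranteed to see the latest write when the number of replicas written plus the number read exceeds RF. A minimal sketch (Python, for illustration only, not from the slides):

```python
def is_immediately_consistent(write_count: int, read_count: int, rf: int) -> bool:
    """Read and write replica sets must intersect for reads to see the latest write."""
    return write_count + read_count > rf

rf = 3
q = rf // 2 + 1  # QUORUM = 2 when RF = 3
print(is_immediately_consistent(q, q, rf))  # True: QUORUM writes + QUORUM reads
print(is_immediately_consistent(1, 1, rf))  # False: ONE + ONE is eventual only
```

With RF=2 the quorum is also 2, so QUORUM behaves like ALL and a single node failure blocks quorum operations; RF=3 is the smallest setting where quorum reads and writes overlap while tolerating one failed replica.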
How Replication Can Affect Consistency Level
• When RF < 3, you do not have as much flexibility when choosing consistency and availability.
• QUORUM = ALL
[Diagram: client sends request to coordinator, which contacts replicas R1 and R2; RF = 2]
Using A Larger Replication Factor
• When RF > 3, there is more data usage and higher latency for operations requiring immediate consistency.
• If using eventual consistency, a consistency level of ONE will have consistent performance regardless of the replication factor.
• High availability clusters may use a replication factor as high as 5.
Data Size
Disk Usage Factors
• Data Size
• Replication Setting
• Old Data
• Compaction
• Snapshots
Data Sizing
• Row and Column Data
• Row and Column Overhead
• Indices and Other Structures
Replication Overhead
• A replication factor > 1 will effectively multiply your data size by that amount.
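The multiplier is direct. A short illustration (Python; the 500 GB figure is a hypothetical example, not from the slides):

```python
def replicated_size_gb(raw_gb: float, rf: int) -> float:
    """Total on-disk data across the cluster, before overhead and snapshots."""
    return raw_gb * rf

raw = 500  # hypothetical 500 GB un-replicated data set
for rf in (1, 2, 3):
    print(f"RF={rf}: ~{replicated_size_gb(raw, rf):.0f} GB cluster-wide")
```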
[Diagram: disk usage at RF = 1, RF = 2, and RF = 3]
Old Data
• Updates and deletes do not actually overwrite or delete data.
• Older versions of data and tombstones remain in the SSTable files until they are compacted.
• This becomes more important for heavy update and delete workloads.
Compaction
• Compaction needs free disk space to write the new SSTable before the SSTables being compacted are removed.
• Leave enough free disk space on each node to allow compactions to run.
• Worst case for the Size-Tiered Compaction Strategy is 50% of the total data capacity of the node.
• For the Leveled Compaction Strategy, it is about 10% of the total data capacity.
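The headroom figures translate into usable capacity per node. A sketch (Python, illustrative only; the strategy names and fractions simply restate the worst cases quoted above):

```python
def safe_data_per_node_gb(disk_gb: float, strategy: str) -> float:
    """Capacity left for data after reserving compaction headroom.

    Headroom fractions follow the worst cases above:
    50% for size-tiered compaction, ~10% for leveled compaction.
    """
    headroom = {"size_tiered": 0.50, "leveled": 0.10}[strategy]
    return disk_gb * (1.0 - headroom)

print(safe_data_per_node_gb(1000, "size_tiered"))  # 500.0
print(safe_data_per_node_gb(1000, "leveled"))      # 900.0
```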
Snapshots
• Snapshots are hard-links or copies of SSTable data files.
• After SSTables are compacted, the disk space may not be reclaimed if a snapshot of those SSTables was created.
Snapshots are created when:
• Executing the nodetool snapshot command
• Dropping a keyspace or table
• Incremental backups
• During compaction
Recommended Disk Capacity
• For current Cassandra versions, the ideal disk capacity is approximately 1 TB per node if using spinning disks, and 3-5 TB per node if using SSDs.
• Having a larger disk capacity may be limited by the resulting performance.
• What works for you is still dependent on your data model design and desired data velocity.
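Putting the data-size factors together gives a first-pass node count. This is a capacity-only sketch (Python; all input figures are hypothetical examples, and it ignores performance, which the next section covers):

```python
import math

def nodes_needed(raw_data_gb: float, rf: int, node_capacity_gb: float,
                 compaction_headroom: float = 0.5) -> int:
    """Minimum node count to hold the data set, from disk capacity alone."""
    usable_per_node = node_capacity_gb * (1.0 - compaction_headroom)
    return math.ceil(raw_data_gb * rf / usable_per_node)

# Hypothetical example: 2 TB of raw data, RF=3, 1 TB spinning disks,
# size-tiered compaction (50% headroom):
print(nodes_needed(2000, 3, 1000))  # 12
```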
Data Velocity (Performance)
How to Measure Performance
• I/O Throughput
“How many reads and writes can be completed per second?”
• Read and Write Latency
“How fast should I be able to get a response for my read and write requests?”
Sizing for Failure
• Cluster must be sized taking into account the performance impact caused by failure.
• When a node fails, the corresponding workload must be absorbed by the other replica nodes in the cluster.
• Performance is further impacted when recovering a node. Data must be streamed or repaired using the other replica nodes.
Hardware Considerations for Performance
CPU
• Operations are often CPU-intensive.
• More cores are better.

Memory
• Cassandra uses JVM heap memory.
• Additional memory is used as off-heap memory by Cassandra, or as the OS page cache.

Disk
• Cassandra is optimized for spinning disks, but SSDs will perform better.
• Attached storage (SAN) is strongly discouraged.
Some Final Words…
Summary
• Cassandra allows flexibility when sizing your cluster, from a single node to thousands of nodes.
• Your use case will dictate how you want to size and configure your Cassandra cluster. Do you need availability? Immediate consistency?
• The minimum number of nodes needed will be determined by your data size, desired performance and replication factor.
Additional Resources
• DataStax Documentation
  http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html
• Planet Cassandra
  http://planetcassandra.org/nosql-cassandra-education/
• Cassandra Users Mailing List
  [email protected]
  http://mail-archives.apache.org/mod_mbox/cassandra-user/
Questions?
Thank You
We power the big data apps that transform business.