www.informatik-aktuell.de
Patrick Guillebert, Solutions Architect
pguillebert@datastax.com
Availability, scalability and consistency with Cassandra
Cassandra and the CAP Theorem
In a distributed system, only 2 of these 3 attributes can be fulfilled at a time: Consistency, Availability, Partition tolerance.
Availability and partition tolerance are built deeply into Cassandra's design.
Consistency is handled programmatically and is tunable.
Cassandra is an AP system
Cassandra’s design goals
Massive and predictable scaling
Always-On
Tunable consistency
Geographically distributed
Low latency
Operationally simple
Added in 2016: user friendly
Numbers and facts from production deployments
Scale at Apple
Cassandra can scale from 3 to 1000+ nodes
• Footprint @ Apple
• 75,000+ nodes
• 10’s of petabytes of data
• Millions ops/second
• Largest cluster 1000+ nodes
Apple Inc.: Cassandra at Apple for Massive Scale
Video: https://www.youtube.com/watch?v=Bc4ql9TDzyg (from Cassandra Summit, USA, September 2014)
Availability at Netflix
Consistency at ING Bank
https://www.youtube.com/watch?v=-sD3x8-tuDU
Scalability
Linear and predictable scaling
Need to store more data? Add more nodes.
Need more throughput? Add more nodes.
Cassandra
• Is "future proof" by scaling out linearly on commodity hardware.
Partitioning
• The dataset is distributed across all nodes of the cluster.
• Data subsets are called "token ranges".
• With vnodes, a node owns several small token ranges, 256 by default (see the config sketch below).
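The number of vnode ranges comes from the num_tokens setting in cassandra.yaml; a minimal sketch of the relevant line, shown with its default:

# cassandra.yaml: how many vnode token ranges this node announces to the cluster
num_tokens: 256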
Token Range (Murmur3)
[Diagram: the Murmur3 token range, from -2^63 to +2^63, split into contiguous ranges owned by Nodes 1-4.]
The partitioner
"@DataSta
x"
Hash Function
97203
-2245462676723223822
7723358927203680754
• The partitioner is responsible for calculating the token: the hash of the partition key of the data to store (see the sketch below).
• The token is just a number between -2^63 and 2^63.
• Partitioners: Murmur3Partitioner (Murmur3), RandomPartitioner (MD5), ByteOrderedPartitioner.
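A minimal cqlsh sketch of the token a key hashes to, assuming a hypothetical demo.users table in the demo keyspace created later in this deck:

-- hypothetical table, for illustration only
CREATE TABLE demo.users (name text PRIMARY KEY);
INSERT INTO demo.users (name) VALUES ('@DataStax');

-- token() exposes the value the partitioner computed for the key;
-- with Murmur3Partitioner it is a signed 64-bit integer
SELECT name, token(name) FROM demo.users;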
Data distribution within the cluster
[Diagram: the partitioner hashes the key '@DataStax' to token 91 on a 0-100 ring; the request is routed to Node 1, the owner of that token's range.]
Availability
Node level availability
A node failure has no impact
• No failover
• 0 downtime
• 0 data loss
• Consistent responses
[Diagram: a five-node ring; a write is sent in parallel to the three replicas, with the 1st, 2nd and 3rd copies on Nodes 1, 2 and 3.]
Parallel writes, RF=3, CL=QUORUM: if 2 nodes out of the 3 replicas respond, the request is a success and immediate consistency is ensured.
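What this looks like from cqlsh, as a sketch (cqlsh's CONSISTENCY command applies to all subsequent requests in the session; the drivers expose the same setting per statement):

CONSISTENCY QUORUM;
-- with RF=3 this write succeeds as soon as 2 replicas acknowledge it,
-- so the loss of a single node does not fail the request
INSERT INTO demo.users (name) VALUES ('@DataStax');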
Rack level availability
A rack failure has no impact on the service
[Diagram: the same five-node ring spread across three racks (RAC 1, RAC 2, RAC 3), with the 1st, 2nd and 3rd copies each placed in a different rack; RF=3, CL=QUORUM.]
Immediate consistency (RF=3 / CL=QUORUM): nodes are distributed across at least 3 racks. If a rack fails, a quorum of replicas remains available in the remaining racks and the request succeeds.
Data Centre level availability
[Diagram: two data centres, DC Frankfurt and DC Dublin, each holding its own 1st, 2nd and 3rd copies on a five-node ring.]
A DC failure has no impact on the service
Internals and practices that enable availability
• Cluster topology awareness ("NetworkTopologyStrategy") in replication
A cluster is described as a group of data centres, containing racks, containing nodes.
This topology is used to place replicas in a "smart" fashion (it can be inspected via the system tables, as sketched after this list).
• Peer-to-peer architecture
All nodes are strictly equal on the read/write path; any node can be queried for any data.
Seeds have a special function only in the context of node bootstrap.
Failover never happens.
• Trade-off on consistency
Response consistency is ensured by asking a quorum of nodes to be consistent.
There is no single "master" knowing the latest write.
• Live operations
Whatever the setup (single or multi-DC), operations can be performed live, with no downtime.
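The topology a node knows about can be read from its system tables; a sketch (exact column sets vary slightly across versions):

-- this node's own place in the topology
SELECT data_center, rack FROM system.local;
-- its view of the other nodes
SELECT peer, data_center, rack FROM system.peers;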
Single DC replication
Single Data Center
[Diagram: a four-node ring owning token ranges 1-25, 26-50, 51-75 and 76-0 on a 0-100 ring; the client driver sends a request for token 91 to the node holding the primary range for that token, and the remaining replicas follow clockwise.]
CREATE KEYSPACE demo
WITH REPLICATION = {
'class':'SimpleStrategy',
'replication_factor':3
};
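Replication settings can also be changed live, illustrating the "live operations" point above; a sketch (a repair afterwards streams existing data to the new replicas):

ALTER KEYSPACE demo
WITH REPLICATION = {
'class':'SimpleStrategy',
'replication_factor':5
};
-- then run `nodetool repair demo` so existing data reaches the new replicas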
Multi-DC replication
[Diagram: two data centres, East and West, each with four nodes spread over Racks 1 and 2; the token ring is interleaved between the DCs (e.g. ranges 1-13, 26-38, 51-63 and 76-88 in one, 14-25, 39-50, 64-75 and 89-100 in the other). The client driver writes token 91 to the replicas holding the primary range in the local DC, and a remote coordinator forwards the write to the replicas in the other DC.]
CREATE KEYSPACE demo
WITH REPLICATION = {
'class':'NetworkTopologyStrategy',
'dc-east':2,
'dc-west':3
};
What is different from a master-slave system?
Failover never happens. Since there is no master, the irremediable data corruption a split-brain situation can cause in master-slave systems cannot happen.
Consistency
Consistency types
• Immediate consistency
The only consistency type a relational database provides.
A read always returns the last written data, whichever node is requested.
• Eventual consistency
A read will eventually return the last written data.
Depending on the node queried, responses could differ.
Cassandra offers both types of consistency; it can be tuned per request.
Consistency Levels
The consistency level is set on each individual request. It determines the number of nodes that must acknowledge a write, or agree on the value of the queried data, for the request to succeed (see the cqlsh sketch below).
The consistency level can differ from table to table, or change over time.
• Common CLs: ONE, QUORUM, ALL
• Multi-DC CLs: LOCAL_ONE, LOCAL_QUORUM, LOCAL_ALL, EACH_QUORUM
• Special CLs: ANY, SERIAL, LOCAL_SERIAL
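A minimal cqlsh sketch (cqlsh sets the level for the whole session, while the drivers expose it per statement):

CONSISTENCY LOCAL_QUORUM;        -- level for regular reads and writes
SERIAL CONSISTENCY LOCAL_SERIAL; -- level for conditional (lightweight transaction) requests
CONSISTENCY;                     -- with no argument, shows the current level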
General consistency levels
• ONE: a single replica responds
• QUORUM: a majority of the replicas (RF/2 + 1) respond
• ALL: every replica responds
• ANY: at least one node accepts the write, possibly only as a hint (writes only)
• SERIAL: linearizable, for conditional (lightweight transaction) reads
Multi-DC consistency levels
• LOCAL_ONE: one replica in the local data centre
• LOCAL_QUORUM: a quorum of the replicas in the local data centre
• LOCAL_ALL: all replicas in the local data centre
• EACH_QUORUM: a quorum of replicas in every data centre
• LOCAL_SERIAL: linearizable, confined to the local data centre
The rule to tune consistency
Immediate consistency:
CL(READ) + CL(WRITE) > replication factor (RF)
• READ QUORUM + WRITE QUORUM
• READ ONE + WRITE ALL
• READ ALL + WRITE ONE
Eventual consistency:
CL(READ) + CL(WRITE) <= replication factor (RF)
• READ ONE + WRITE ONE
• READ ONE + WRITE ANY
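A worked instance of the rule: with RF=3, QUORUM is floor(3/2) + 1 = 2, so QUORUM reads plus QUORUM writes give 2 + 2 = 4 > 3. Any 2 replicas read must overlap the 2 replicas written in at least one node, which therefore returns the latest value. With ONE + ONE, 1 + 1 = 2 <= 3, so a read may land on the one replica the write has not reached yet.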
Time-based conflict resolution
[Diagram: three replicas hold versions of the same cell with write timestamps 511, 523 and 542; when the client driver reads, the replicas' values are reconciled by timestamp.]
The last write wins!
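The timestamp behind this resolution is visible in CQL; a sketch against the hypothetical demo.users table from earlier (writetime() applies only to non-key columns):

-- add a regular column to the hypothetical table
ALTER TABLE demo.users ADD email text;
INSERT INTO demo.users (name, email) VALUES ('@DataStax', 'a@example.com');

-- writetime() exposes the microsecond timestamp used for last-write-wins
SELECT email, writetime(email) FROM demo.users WHERE name = '@DataStax';

-- a client can also set the timestamp explicitly
INSERT INTO demo.users (name, email)
VALUES ('@DataStax', 'b@example.com')
USING TIMESTAMP 1442880000000000;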
Empirical verification of the rule
[Diagram: a four-node cluster with RF=3; a CL=QUORUM write reaches at least two replicas, and every possible combination of a CL=QUORUM read then includes at least one of them.]
Use cases
Domains of use cases
• IoT
• Web / mobile
• Gaming
• Bank, finance
• Telecoms
Using C* to back a global reference data service
Requirements
• Always on
• High throughput
• Low latency
• Multi-region replication
• Little or big data
Extending use cases with DSE
Thank you!