www.informatik-aktuell.de
Patrick Guillebert, Solutions Architect
pguillebert@datastax.com
Availability, scalability and consistency with Cassandra
Cassandra and the CAP Theorem
In a distributed system, only 2 of these 3 attributes can be fulfilled at a time: Consistency, Availability, Partition tolerance.
Availability and partition tolerance are built deeply into Cassandra's design.
Consistency is handled programmatically and is tunable.
Cassandra is an AP system
Cassandra’s design goals
Massive and predictable scaling
Always-On
Tunable consistency
Geographically distributed
Low latency
Operationally simple
Added in 2016: user friendly
Numbers and facts from production deployments
Scale at Apple
Cassandra can scale from 3 to 1000+ nodes
• Footprint @ Apple
• 75,000+ nodes
• 10’s of petabytes of data
• Millions ops/second
• Largest cluster 1000+ nodes
Apple Inc.: Cassandra at Apple for Massive Scale
Video: https://www.youtube.com/watch?v=Bc4ql9TDzyg (from Cassandra Summit, USA, September 2014)
Availability at Netflix
Consistency at ING Bank
https://www.youtube.com/watch?v=-sD3x8-tuDU
Scalability
Linear and predictable scaling
Need to store more data? Add more nodes.
Need more throughput? Add more nodes.
Cassandra
• Is "future proof" by scaling out linearly on commodity hardware.
Partitioning
• The dataset is distributed across all nodes of the cluster.
• Data subsets are called "token ranges".
• With vnodes, a node owns several small token ranges, 256 by default (see the config sketch below).
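The number of vnode ranges comes from the num_tokens setting in cassandra.yaml; a minimal sketch of the relevant line, shown with its default:

# cassandra.yaml: how many vnode token ranges this node announces to the cluster
num_tokens: 256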
Token Range (Murmur3)
[Diagram: the Murmur3 token range, from -2^63 to +2^63, split into contiguous ranges owned by Nodes 1-4.]
The partitioner
"@DataSta
x"
Hash Function
97203
-2245462676723223822
7723358927203680754
• The partitioner is responsible for calculating the token: the hash of the partition key of the data to store (see the sketch below).
• The token is just a number between -2^63 and 2^63.
• Partitioners: Murmur3Partitioner (Murmur3), RandomPartitioner (MD5), ByteOrderedPartitioner.
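A minimal cqlsh sketch of the token a key hashes to, assuming a hypothetical demo.users table in the demo keyspace created later in this deck:

-- hypothetical table, for illustration only
CREATE TABLE demo.users (name text PRIMARY KEY);
INSERT INTO demo.users (name) VALUES ('@DataStax');

-- token() exposes the value the partitioner computed for the key;
-- with Murmur3Partitioner it is a signed 64-bit integer
SELECT name, token(name) FROM demo.users;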
Data distribution within the cluster
[Diagram: the partitioner hashes the key '@DataStax' to token 91 on a 0-100 ring; the request is routed to Node 1, the owner of that token's range.]
Availability
Node level availability
A node failure has no impact
• No failover
• 0 downtime
• 0 data loss
• Consistent responses
[Diagram: a five-node ring; a write is sent in parallel to the three replicas, with the 1st, 2nd and 3rd copies on Nodes 1, 2 and 3.]
Parallel writes, RF=3, CL=QUORUM: if 2 nodes out of the 3 replicas respond, the request is a success and immediate consistency is ensured.
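What this looks like from cqlsh, as a sketch (cqlsh's CONSISTENCY command applies to all subsequent requests in the session; the drivers expose the same setting per statement):

CONSISTENCY QUORUM;
-- with RF=3 this write succeeds as soon as 2 replicas acknowledge it,
-- so the loss of a single node does not fail the request
INSERT INTO demo.users (name) VALUES ('@DataStax');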
Rack level availability
A rack failure has no impact on the service
[Diagram: the same five-node ring spread across three racks (RAC 1, RAC 2, RAC 3), with the 1st, 2nd and 3rd copies each placed in a different rack; RF=3, CL=QUORUM.]
Immediate consistency (RF=3 / CL=QUORUM): nodes are distributed across at least 3 racks. If a rack fails, a quorum of replicas remains available in the remaining racks and the request succeeds.
Data Centre level availability
[Diagram: two data centres, DC Frankfurt and DC Dublin, each holding its own 1st, 2nd and 3rd copies on a five-node ring.]
A DC failure has no impact on the service
Internals and practices that enable availability
• Cluster topology awareness ("NetworkTopologyStrategy") in replication
A cluster is described as a group of data centres, containing racks, containing nodes.
This topology is used to place replicas in a "smart" fashion (it can be inspected via the system tables, as sketched after this list).
• Peer-to-peer architecture
All nodes are strictly equal on the read/write path; any node can be queried for any data.
Seeds have a special function only in the context of node bootstrap.
Failover never happens.
• Trade-off on consistency
Response consistency is ensured by asking a quorum of nodes to be consistent.
There is no single "master" knowing the latest write.
• Live operations
Whatever the setup (single or multi-DC), operations can be performed live, with no downtime.
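The topology a node knows about can be read from its system tables; a sketch (exact column sets vary slightly across versions):

-- this node's own place in the topology
SELECT data_center, rack FROM system.local;
-- its view of the other nodes
SELECT peer, data_center, rack FROM system.peers;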
Single DC replication
Single Data Center
[Diagram: a four-node ring owning token ranges 1-25, 26-50, 51-75 and 76-0 on a 0-100 ring; the client driver sends a request for token 91 to the node holding the primary range for that token, and the remaining replicas follow clockwise.]
CREATE KEYSPACE demo
WITH REPLICATION = {
'class':'SimpleStrategy',
'replication_factor':3
};
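Replication settings can also be changed live, illustrating the "live operations" point above; a sketch (a repair afterwards streams existing data to the new replicas):

ALTER KEYSPACE demo
WITH REPLICATION = {
'class':'SimpleStrategy',
'replication_factor':5
};
-- then run `nodetool repair demo` so existing data reaches the new replicas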
Multi-DC replication
[Diagram: two data centres, East and West, each with four nodes spread over Racks 1 and 2; the token ring is interleaved between the DCs (e.g. ranges 1-13, 26-38, 51-63 and 76-88 in one, 14-25, 39-50, 64-75 and 89-100 in the other). The client driver writes token 91 to the replicas holding the primary range in the local DC, and a remote coordinator forwards the write to the replicas in the other DC.]
CREATE KEYSPACE demo
WITH REPLICATION = {
'class':'NetworkTopologyStrategy',
'dc-east':2,
'dc-west':3
};
What is different from a master-slave system?
Failover never happens. Since there is no master, the irremediable data corruption a split-brain situation can cause in master-slave systems cannot happen.
Consistency
Consistency types
• Immediate consistency
The only consistency type a relational database provides.
A read always returns the last written data, whichever node is requested.
• Eventual consistency
A read will eventually return the last written data.
Depending on the node queried, responses could differ.
Cassandra offers both types of consistency; it can be tuned per request.
Consistency Levels
The consistency level is set on each individual request. It determines the number of nodes that must acknowledge a write, or agree on the value of the queried data, for the request to succeed (see the cqlsh sketch below).
The consistency level can differ from table to table, or change over time.
• Common CLs: ONE, QUORUM, ALL
• Multi-DC CLs: LOCAL_ONE, LOCAL_QUORUM, LOCAL_ALL, EACH_QUORUM
• Special CLs: ANY, SERIAL, LOCAL_SERIAL
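A minimal cqlsh sketch (cqlsh sets the level for the whole session, while the drivers expose it per statement):

CONSISTENCY LOCAL_QUORUM;        -- level for regular reads and writes
SERIAL CONSISTENCY LOCAL_SERIAL; -- level for conditional (lightweight transaction) requests
CONSISTENCY;                     -- with no argument, shows the current level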
General consistency levels
• ONE: a single replica responds
• QUORUM: a majority of the replicas (RF/2 + 1) respond
• ALL: every replica responds
• ANY: at least one node accepts the write, possibly only as a hint (writes only)
• SERIAL: linearizable, for conditional (lightweight transaction) reads
Multi-DC consistency levels
• LOCAL_ONE: one replica in the local data centre
• LOCAL_QUORUM: a quorum of the replicas in the local data centre
• LOCAL_ALL: all replicas in the local data centre
• EACH_QUORUM: a quorum of replicas in every data centre
• LOCAL_SERIAL: linearizable, confined to the local data centre
The rule to tune consistency
Immediate consistency:
CL(READ) + CL(WRITE) > replication factor (RF)
• READ QUORUM + WRITE QUORUM
• READ ONE + WRITE ALL
• READ ALL + WRITE ONE
Eventual consistency:
CL(READ) + CL(WRITE) <= replication factor (RF)
• READ ONE + WRITE ONE
• READ ONE + WRITE ANY
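A worked instance of the rule: with RF=3, QUORUM is floor(3/2) + 1 = 2, so QUORUM reads plus QUORUM writes give 2 + 2 = 4 > 3. Any 2 replicas read must overlap the 2 replicas written in at least one node, which therefore returns the latest value. With ONE + ONE, 1 + 1 = 2 <= 3, so a read may land on the one replica the write has not reached yet.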
Time-based conflict resolution
[Diagram: three replicas hold versions of the same cell with write timestamps 511, 523 and 542; when the client driver reads, the replicas' values are reconciled by timestamp.]
The last write wins!
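The timestamp behind this resolution is visible in CQL; a sketch against the hypothetical demo.users table from earlier (writetime() applies only to non-key columns):

-- add a regular column to the hypothetical table
ALTER TABLE demo.users ADD email text;
INSERT INTO demo.users (name, email) VALUES ('@DataStax', 'a@example.com');

-- writetime() exposes the microsecond timestamp used for last-write-wins
SELECT email, writetime(email) FROM demo.users WHERE name = '@DataStax';

-- a client can also set the timestamp explicitly
INSERT INTO demo.users (name, email)
VALUES ('@DataStax', 'b@example.com')
USING TIMESTAMP 1442880000000000;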
Empirical verification of the rule
[Diagram: a four-node cluster with RF=3; a CL=QUORUM write reaches at least two replicas, and every possible combination of a CL=QUORUM read then includes at least one of them.]
Use cases
Domains of use cases
• IoT
• Web / mobile
• Gaming
• Bank, finance
• Telecoms
Using C* to back a global reference data service
Requirements
• Always on
• High throughput
• Low latency
• Multi-region replication
• Little or big data
Extending use cases with DSE
Thank you!