HDB++: HIGH AVAILABILITY WITH CASSANDRA
TANGO Meeting | 20 May 2015 | Reynald Bourtembourg
OVERVIEW
• What is Cassandra (C*)?
• Who is using C*?
• CQL
• C* architecture
• Request Coordination
• Consistency
• Monitoring tool
• HDB++
WHAT IS CASSANDRA?
• Mythology: an excellent oracle who was never believed
• A massively scalable, open source NoSQL ("Not Only SQL") database
• Created by Facebook
• Open source since 2008
• Apache License 2.0, compatible with GPLv3
• Peer-to-peer architecture
• No single point of failure
• Replication
• Continuous availability
• Multi-data-center support
• Scales to hundreds or thousands of nodes
• Written in Java
• High write throughput
• Efficient reads
[Figure: Netflix benchmark of Cassandra scalability. Source: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html]
WHO IS USING CASSANDRA?
CASSANDRA QUERY LANGUAGE
• CQL: Cassandra Query Language
• Very similar to SQL
• But with restrictions and limitations:
  • JOIN requests are forbidden
  • No subqueries
  • String comparisons are limited (when not using Solr); for example, this query is not possible:
    select * from my_table where mystring like '%tango%';
  • No OR operator
  • A WHERE condition can only be applied to an indexed column (or the primary key)
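Since CQL (without Solr) cannot evaluate a condition such as mystring like '%tango%', one workaround is to fetch the candidate rows with a query CQL does accept and filter them in the client. The helpers below are a hypothetical sketch, not part of any Cassandra driver: they translate a SQL-style LIKE pattern ('%' = any run of characters, '_' = exactly one character) into a Python regular expression and apply it to rows represented as dicts. This moves the scan to the client, so it only suits small result sets.

```python
import re


def like_to_regex(pattern: str):
    """Translate a SQL LIKE pattern into a compiled regex (use with fullmatch).

    '%' matches any sequence of characters, '_' matches exactly one;
    every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL so that '%' also spans newlines, as LIKE does.
    return re.compile("".join(parts), re.DOTALL)


def filter_like(rows, column: str, pattern: str):
    """Keep only the rows (dicts) whose `column` value matches the LIKE pattern."""
    regex = like_to_regex(pattern)
    return [row for row in rows if regex.fullmatch(row[column]) is not None]


if __name__ == "__main__":
    # Rows as they might come back from a plain SELECT (hypothetical data).
    rows = [
        {"mystring": "sys/tango/dev1"},
        {"mystring": "sr/d-ct/1"},
        {"mystring": "TANGO meeting"},
    ]
    print(filter_like(rows, "mystring", "%tango%"))
```

The match is case sensitive, like the SQL example it emulates on most databases.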
• Collections (64K limitation):
  • list
  • set
  • map
• TTL (time to live) on inserted data
• INSERT = UPDATE (upsert semantics)
• Documentation: http://www.datastax.com/documentation/cql/3.1/cql/cql_intro_c.html
• cqlsh: the interactive CQL shell
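The upsert and TTL rules can be illustrated with a tiny in-memory model. This is a hypothetical toy, not how Cassandra stores data: each cell keeps a value, a write timestamp and an optional expiry time. INSERT and UPDATE both map to the same upsert call, a write only replaces a cell if its timestamp is not older (last write wins), and a cell whose TTL has elapsed reads as absent.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Cell:
    value: Any
    write_ts: int                # write timestamp (e.g. microseconds)
    expires_at: Optional[float]  # absolute expiry time in seconds, None = no TTL


class ToyTable:
    """Toy model of CQL upsert semantics: INSERT and UPDATE behave the same."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[Any, str], Cell] = {}

    def upsert(self, key, column: str, value, write_ts: int, now: float,
               ttl: Optional[int] = None) -> None:
        """Write one cell; both INSERT and UPDATE map to this call.

        Last write wins: an older timestamp never overwrites a newer one
        (ties go to the later call here, a simplification).
        """
        current = self._cells.get((key, column))
        if current is not None and current.write_ts > write_ts:
            return  # stale write, ignored
        expires_at = now + ttl if ttl is not None else None
        self._cells[(key, column)] = Cell(value, write_ts, expires_at)

    def read(self, key, column: str, now: float):
        """Return the cell value, or None if absent or expired by its TTL."""
        cell = self._cells.get((key, column))
        if cell is None:
            return None
        if cell.expires_at is not None and now >= cell.expires_at:
            return None
        return cell.value
```

For example, an "INSERT" at timestamp 10 followed by an "UPDATE" at timestamp 20 with ttl=60 leaves the second value visible until 60 seconds after the update, after which the cell reads as absent rather than falling back to the first value.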
Example: HDB++ table for read-only scalar DevDouble attributes
CREATE TABLE IF NOT EXISTS att_scalar_devdouble_ro (
    att_conf_id timeuuid,
    period text,
    data_time timestamp,
    data_time_us int,
    value_r double,
    quality int,
    error_desc text,
    PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
)
WITH comment = 'Scalar DevDouble ReadOnly Values Table';
Partition key: (att_conf_id, period); clustering columns: data_time, data_time_us
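With PRIMARY KEY ((att_conf_id, period), data_time, data_time_us), all samples sharing the same att_conf_id and period land in one partition, where they are stored sorted by data_time and then data_time_us. The sketch below mimics that grouping on the client side. The format of period is an assumption made for illustration (one partition per attribute per UTC day, as a 'YYYY-MM-DD' string); the bucketing actually used by libhdb++cassandra may differ, and att_conf_id is a plain string here instead of a timeuuid.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    att_conf_id: str      # stands in for the timeuuid column
    data_time: datetime   # timezone-aware timestamp of the sample
    data_time_us: int     # microsecond part, second clustering column
    value_r: float


def period_of(ts: datetime) -> str:
    """Hypothetical bucketing: one partition per attribute per UTC day."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_samples(samples: List[Sample]) -> Dict[Tuple[str, str], List[Sample]]:
    """Group samples by partition key, then order each partition by clustering columns."""
    partitions: Dict[Tuple[str, str], List[Sample]] = defaultdict(list)
    for s in samples:
        partitions[(s.att_conf_id, period_of(s.data_time))].append(s)
    for rows in partitions.values():
        rows.sort(key=lambda s: (s.data_time, s.data_time_us))
    return dict(partitions)
```

Two samples of the same attribute taken just before and just after midnight UTC end up in two different partitions under this assumed bucketing, which bounds partition size at the cost of touching several partitions for long time ranges.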
CASSANDRA ARCHITECTURE
• Node: one Cassandra instance (Java process)
[Figure: eight nodes (Node 1 to Node 8) placed on a token ring covering the token range -2^63 to +2^63-1]
• Partition: ordered and replicable unit of data on a node, identified by a token
• The partitioner (based on the Murmur3 hash by default) distributes the data across the nodes by hashing each partition key into a token
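The token ring can be sketched as follows: each node owns the range of tokens ending at its own token, so a partition belongs to the first node whose token is greater than or equal to the partition's token, wrapping around the ring. To keep the example self-contained, the partition key is hashed with MD5 truncated to a signed 64-bit integer; this is a stand-in, not Murmur3, so real tokens differ, but the token range and the lookup logic follow the same idea. The function names are hypothetical.

```python
import bisect
import hashlib
from typing import List

MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def token_of(partition_key: bytes) -> int:
    """Stand-in hash: MD5 truncated to a signed 64-bit integer (NOT Murmur3)."""
    digest = hashlib.md5(partition_key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def evenly_spaced_tokens(n_nodes: int) -> List[int]:
    """One token per node, spread evenly over [MIN_TOKEN, MAX_TOKEN]."""
    step = 2 ** 64 // n_nodes
    return [MIN_TOKEN + i * step for i in range(n_nodes)]


def owner(token: int, node_tokens: List[int]) -> int:
    """Index of the node owning `token`: first node token >= token, wrapping."""
    i = bisect.bisect_left(node_tokens, token)
    return i % len(node_tokens)


if __name__ == "__main__":
    tokens = evenly_spaced_tokens(8)  # Node 1 .. Node 8
    t = token_of(b"some-partition-key")
    print(f"token {t} is owned by Node {owner(t, tokens) + 1}")
```

Adding a node inserts one more token in the sorted list and only takes over part of its neighbour's range, which is why the cluster can grow without reshuffling all the data.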
• Rack: logical set of nodes
[Figure: the eight nodes grouped into four racks (Rack 1 to Rack 4) on the token ring]
• Data Center: logical set of racks
[Figure: the four racks split between Data Center 1 and Data Center 2]
• Cluster: full set of nodes, which maps to a single complete token ring
[Figure: Data Center 1 and Data Center 2 together form the Cassandra cluster, covering the token range -2^63 to +2^63-1]
REQUEST COORDINATION
• Coordinator: the node chosen by the client to receive a particular read or write request to its cluster
[Figure: a client sends a read/write request to one node of Data Center 1, which acts as coordinator and returns the acknowledgement to the client]
• Any node can coordinate any request
• Each client request may be coordinated by a different node
• => No single point of failure
• The Cassandra driver, a client library that manages requests, chooses the coordinator node
  • e.g. round-robin or token-aware pattern
• Many open source drivers exist for many programming languages
[Figure: the client reaches the cluster (Node 1 to Node 4) through a driver, which picks the coordinator]
Available drivers: Java, Python, C++, C#, Node.js, PHP, Perl, Go, Clojure, Haskell, R (GNU S), Ruby, Scala, Erlang, ODBC, Rust
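The two coordinator-selection patterns can be contrasted in a few lines. This is a simplified, hypothetical model of a driver's load-balancing policy (real drivers also take data centers, node health and latency into account): round-robin cycles through all nodes regardless of the data, while token-aware sends the request directly to a replica of the partition, saving the extra hop from a non-replica coordinator. The replica lookup function is an assumption passed in by the caller.

```python
import itertools
from typing import Callable, Iterator, List, Sequence


class RoundRobinPolicy:
    """Each new request goes to the next node in the list, cyclically."""

    def __init__(self, nodes: Sequence[str]) -> None:
        self._cycle: Iterator[str] = itertools.cycle(nodes)

    def coordinator(self, partition_key: bytes) -> str:
        return next(self._cycle)  # the partition key is ignored


class TokenAwarePolicy:
    """Prefer a replica of the partition; fall back to round-robin otherwise."""

    def __init__(self, nodes: Sequence[str],
                 replicas_for: Callable[[bytes], List[str]]) -> None:
        self._fallback = RoundRobinPolicy(nodes)
        self._replicas_for = replicas_for  # hypothetical replica lookup
        self._counter = itertools.count()

    def coordinator(self, partition_key: bytes) -> str:
        replicas = self._replicas_for(partition_key)
        if not replicas:
            return self._fallback.coordinator(partition_key)
        # Spread load across the replicas of this partition.
        return replicas[next(self._counter) % len(replicas)]
```

Token awareness needs the driver to know the ring layout and to compute the partition token itself; round-robin needs no such knowledge, which is why it is the simpler fallback.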
• The coordinator manages the replication process
• Replication Factor (RF): the number of nodes onto which a write is copied
• The write occurs on the nodes responsible for that partition
• 1 ≤ RF ≤ (number of nodes in the cluster)
• Every write is time-stamped
[Figure: with RF=3, the coordinator forwards the write received from the client driver to the three nodes responsible for the partition]
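Finding "the nodes responsible for that partition" can be sketched with the rule of Cassandra's SimpleStrategy: the first replica is the node owning the partition's token, and the remaining RF - 1 replicas are the next nodes clockwise on the ring. NetworkTopologyStrategy, used for multi-data-center setups, additionally spreads replicas across racks and data centers and is not modeled here. Node names and tokens are illustrative.

```python
import bisect
from typing import List, Tuple


def replicas(token: int, ring: List[Tuple[int, str]], rf: int) -> List[str]:
    """Return the RF nodes storing `token` under a SimpleStrategy-like rule.

    `ring` is a list of (node_token, node_name) pairs sorted by token.
    """
    if not 1 <= rf <= len(ring):
        raise ValueError("need 1 <= RF <= number of nodes in the cluster")
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)  # primary owner
    # Walk clockwise from the owner to collect the other RF - 1 replicas.
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = [(-300, "Node 1"), (-100, "Node 2"), (100, "Node 3"), (300, "Node 4")]
    print(replicas(0, ring, rf=3))
```

The range check mirrors the constraint 1 ≤ RF ≤ (number of nodes in the cluster) from the slide.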
CONSISTENCY
• The coordinator applies the Consistency Level (CL)
• Consistency Level (CL): the number of replica nodes that must acknowledge a request
• Examples of CL:
  • ONE, TWO, THREE
  • ANY
  • ALL (not recommended)
  • QUORUM (= floor(RF/2) + 1, with RF summed over all data centers)
  • EACH_QUORUM
  • LOCAL_QUORUM
• CL may vary for each request
• On success, the coordinator notifies the client (returning the most recent partition data in the case of a read request)
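The number of acknowledgements the coordinator waits for follows directly from the CL and the RF. The helper below is a sketch for a single data center; ANY (which can also be satisfied by a stored hint) and the multi-data-center levels are left out. It also checks the usual rule that a read is guaranteed to overlap the latest acknowledged write when R + W > RF, where R and W are the numbers of replicas answering the read and acknowledging the write. Function names are hypothetical.

```python
def required_acks(cl: str, rf: int) -> int:
    """Replica acknowledgements needed to satisfy `cl` in one data center."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if cl in fixed:
        needed = fixed[cl]
    elif cl == "QUORUM":
        needed = rf // 2 + 1  # floor(RF/2) + 1
    elif cl == "ALL":
        needed = rf
    else:
        raise ValueError(f"consistency level not modeled here: {cl}")
    if needed > rf:
        raise ValueError(f"{cl} cannot be met with RF={rf}")
    return needed


def read_sees_latest_write(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return required_acks(read_cl, rf) + required_acks(write_cl, rf) > rf


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_acks(cl, rf=3))
    print(read_sees_latest_write("QUORUM", "QUORUM", rf=3))  # 2 + 2 > 3
    print(read_sees_latest_write("ONE", "ONE", rf=3))        # 1 + 1 <= 3
```

With RF=3, QUORUM reads plus QUORUM writes tolerate one unavailable replica while still returning the latest acknowledged data; ONE/ONE is faster but may return stale data until read repair or hinted handoff catches up.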
CONSISTENCY ONE - READ - SINGLE DC
[Figure: data center of six nodes (Node 1 to Node 6), RF=3. The coordinator sends a direct read request to one replica and digest read requests (hashes) to the other replicas, possibly followed by a read repair.]
CONSISTENCY QUORUM – READ - SINGLE DC
[Figure: data center of six nodes (Node 1 to Node 6), RF=3. The coordinator sends a direct read request to one replica and digest read requests (hashes) to the others, and waits for a quorum (2 of the 3 replicas) to answer.]
• In case of inconsistency, the most recent data is returned
• A read repair is performed if needed
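The coordinator side of such a read can be sketched as follows: one replica returns the data, the others only return a digest (hash) of theirs; if a digest disagrees, the replies are compared, the one with the newest write timestamp wins (every write is time-stamped), and the stale replicas are flagged for read repair. This is a simplified single-cell model with hypothetical names; Cassandra reconciles column by column and also handles deletions.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reply:
    value: str
    write_ts: int  # timestamp attached to the write that produced `value`


def digest(reply: Reply) -> bytes:
    """Hash sent back by a digest read instead of the full data."""
    return hashlib.md5(f"{reply.value}|{reply.write_ts}".encode()).digest()


def coordinate_read(replies: Dict[str, Reply],
                    data_node: str) -> Tuple[Reply, List[str]]:
    """Return the value for the client and the replicas needing read repair.

    `replies` holds what each contacted replica has; `data_node` is the one
    asked for a direct read, the others only send digests.
    """
    data = replies[data_node]
    mismatch = any(digest(r) != digest(data)
                   for node, r in replies.items() if node != data_node)
    if not mismatch:
        return data, []
    # Digests disagree: compare full data and keep the most recent write.
    newest = max(replies.values(), key=lambda r: r.write_ts)
    stale = sorted(node for node, r in replies.items() if r != newest)
    return newest, stale
```

Sending digests instead of full rows keeps the common, consistent case cheap in network traffic; the full comparison only happens when a mismatch is detected.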
CONSISTENCY ONE – WRITE - SINGLE DC
[Figure: data center of six nodes (Node 1 to Node 6), RF=3. The coordinator forwards the client's write request to the three replicas; as soon as one replica acknowledges (ACK), it reports SUCCESS to the client.]
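• If a replica is down, the coordinator stores a hint for it (hinted handoff mechanism) and still reports SUCCESS, since one ACK is enough for CL ONE
• Hints are only collected for a replica that has been down for less than max_hint_window_in_ms, a property in the cassandra.yaml file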
• When the replica comes back, the coordinator replays the stored hint to it as a write request
CONSISTENCY
• If a node's downtime exceeds max_hint_window_in_ms, hints no longer cover everything it missed: an anti-entropy node repair (nodetool repair) is needed
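Hinted handoff can be sketched from the coordinator's point of view: the write goes to every replica, live replicas acknowledge, a hint is kept for each replica that has been down for less than the hint window, and the client gets SUCCESS if the number of ACKs reaches the consistency level. A replica down for longer gets no hints and needs a repair when it returns. The class and method names are hypothetical, and the window value below is assumed to be the cassandra.yaml default of that era (10800000 ms, 3 hours); check your own configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed default of max_hint_window_in_ms in cassandra.yaml: 10800000 ms (3 hours).
MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


@dataclass
class ToyCoordinator:
    """Hypothetical coordinator model illustrating hinted handoff."""

    down_since_ms: Dict[str, int]  # replica name -> time (ms) it went down
    hints: Dict[str, List[str]] = field(default_factory=dict)

    def write(self, mutation: str, replicas: List[str], acks_needed: int,
              now_ms: int) -> str:
        """Send `mutation` to all replicas; SUCCESS if enough of them ACK.

        Simplification: a real coordinator fails fast (Unavailable error)
        when too few replicas are alive, instead of writing to the live ones.
        """
        acks = 0
        for node in replicas:
            if node not in self.down_since_ms:
                acks += 1  # live replica applies the write and ACKs
            elif now_ms - self.down_since_ms[node] < MAX_HINT_WINDOW_MS:
                # Down, but not for too long: keep a hint to replay later.
                self.hints.setdefault(node, []).append(mutation)
            # Otherwise no hint is kept: only a repair can restore the data.
        return "SUCCESS" if acks >= acks_needed else "FAILURE"

    def node_back(self, node: str, now_ms: int) -> Tuple[List[str], bool]:
        """Replay the hints stored for `node` and say whether a repair is needed."""
        downtime_ms = now_ms - self.down_since_ms.pop(node)
        replayed = self.hints.pop(node, [])
        return replayed, downtime_ms > MAX_HINT_WINDOW_MS
```

With RF=3 and one replica down, a write at ONE (acks_needed=1) or QUORUM (acks_needed=2) still succeeds and leaves a hint behind; with two replicas down, the QUORUM write fails.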
CONSISTENCY QUORUM – WRITE - SINGLE DC
[Figure: data center of six nodes (Node 1 to Node 6), RF=3. The coordinator forwards the client's write request to the three replicas and reports SUCCESS once two of them (a quorum) have acknowledged. The sequence is repeated with unavailable replicas: as long as two replicas acknowledge, the write succeeds; when fewer than two can acknowledge, the coordinator reports FAILURE.]
CONSISTENCY LOCAL QUORUM – WRITE - MULTI DC
[Figure: two data centers, DC1 with RF=3 and DC2 with RF=2. The client's write reaches a coordinator in DC1, which forwards it to the replicas in both data centers and reports SUCCESS once a quorum of the DC1 replicas (2 of 3) has acknowledged; the acknowledgement from DC2 arrives later and is not waited for.]
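For multi-data-center clusters, the acknowledgement count depends on which replicas are counted. The sketch below uses the replication factors of the figure (DC1: RF=3, DC2: RF=2): LOCAL_QUORUM needs a quorum of the coordinator's own data center, EACH_QUORUM a quorum in every data center, and plain QUORUM a quorum of the sum of all replication factors, wherever those replicas live. The function names are hypothetical.

```python
from typing import Dict


def quorum(rf: int) -> int:
    """floor(RF/2) + 1"""
    return rf // 2 + 1


def acks_needed(cl: str, rf_per_dc: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Acknowledgements required per data center ('any DC' for plain QUORUM)."""
    if cl == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if cl == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if cl == "QUORUM":
        # Counted over all replicas, whatever their data center.
        return {"any DC": quorum(sum(rf_per_dc.values()))}
    raise ValueError(f"consistency level not modeled here: {cl}")


if __name__ == "__main__":
    rfs = {"DC1": 3, "DC2": 2}
    for cl in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
        print(cl, acks_needed(cl, rfs, local_dc="DC1"))
```

With DC2 at RF=2, EACH_QUORUM needs both DC2 replicas, so a single DC2 node failure blocks such requests, whereas LOCAL_QUORUM in DC1 is unaffected and never waits for the remote data center.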
MONITORING TOOL: OPSCENTER
OpsCenter web interface: http://cassandra2:8888
HDB++
[Figure: HDB++ architecture. The hdb++es-srv (event subscriber) and hdb++cm-srv (configuration manager) device servers use libhdb++, which is implemented by libhdb++mysql (MySQL back end) and libhdb++cassandra (Cassandra back end); several event subscribers can write to the same multi-node Cassandra cluster.]
CONCLUSION: C* PROS
• High availability
  • Software upgrades with no downtime
  • Tolerates hardware failures
• Linear scalability
  • Need more performance? => Add nodes
• Big community with industrial support
• Can use Apache Spark for analytics (distributed processing)
• List, set and map data types (tuples and user-defined types coming soon)
• Tries to prevent operations that would not perform well
• Backups = snapshots = hard links => very fast (plus replication)
• Difficult to lose data
• Good fit for time series data
CONCLUSION: C* CONS
• Requires more total disk space and more machines
• The SSTable (C* data file) format can change from one version to another
• No easy way to go back to a previous version once the SSTables have been converted to a newer version
• Keyspaces and tables cannot easily be renamed (not supported by CQL)
• Tedious to modify existing partitions (the data must be duplicated at some point in the process)
• Requires a different way of modeling data
• Not designed for huge read requests
• Can be tricky to tune to avoid long GC pauses
• Maintenance: nodetool repair must be run regularly when data is deleted, to avoid resurrection of deleted data (a CPU-intensive operation)
• In some specific cases, reclaiming disk space after deletions can take quite some time
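The resurrection risk comes from tombstones, the markers Cassandra writes when data is deleted. A tombstone may be purged during compaction once it is older than the table's gc_grace_seconds (864000 s, i.e. 10 days, by default); if a replica missed the delete and has not been repaired by then, the deleted value can reappear. The check below is a minimal sketch of a repair schedule against that limit; the function name and parameters are hypothetical.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the default table setting


def repair_interval_is_safe(repair_every_s: int, repair_duration_s: int,
                            gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True if a missed delete is repaired before its tombstone can be purged.

    A delete issued just after one repair starts is only propagated by the
    next repair, which finishes repair_every_s + repair_duration_s later;
    that must happen before gc_grace_seconds have elapsed.
    """
    return repair_every_s + repair_duration_s < gc_grace_seconds


if __name__ == "__main__":
    day = 86_400
    print(repair_interval_is_safe(7 * day, day))   # weekly repair taking a day
    print(repair_interval_is_safe(14 * day, day))  # every two weeks: too rare
```

Raising gc_grace_seconds relaxes the repair schedule but keeps tombstones, and the disk space they pin, around for longer.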
THE END
USEFUL LINKS
• http://cassandra.apache.org
• Planet Cassandra (http://planetcassandra.org)
• DataStax Academy (https://academy.datastax.com)
• Cassandra Java Driver getting started (https://academy.datastax.com/demos/cassandra-java-driver-getting-started)
• Cassandra C++ Driver: https://github.com/datastax/cpp-driver
• DataStax documentation (http://www.datastax.com/docs)
• Users mailing list (listed on http://cassandra.apache.org)
• #Cassandra channel on IRC (http://webchat.freenode.net/?channels=#Cassandra)
CASSANDRA FUTURE DEPLOYMENT
• DC Prod (1 partition/hour)
  • Keyspace prod: RF:3 (write LOCAL_QUORUM)
  • 7200 RPM disks, big CPU, 64 GB RAM
• DC Analytics 1
  • Keyspace prod: RF:3 (read LOCAL_QUORUM)
  • Keyspace analytics: RF:3 (write LOCAL_QUORUM)
  • SSD disks, big CPU, 128 GB RAM
• DC Analytics 2
  • Keyspace analytics: RF:5 (read LOCAL_QUORUM)
  • 7200 RPM disks, tiny CPU, 32 GB RAM
CASSANDRA’S NODE-BASED ARCHITECTURE
BASIC WRITE PATH CONCEPT
BASIC READ PATH CONCEPT