Cassandra
Integrating Cassandra into your project
Tuesday 12 November 2013
Maurits Lawende
• Working at Dutch Open Projects (DOP) since 2007
• Development and technical design for challenging Drupal sites
• Development of SaaS solutions in PHP & Node.js
ToDoToDay
• Data versus information
• History and usage of Cassandra
• How to use Cassandra
• Developments
Data versus information
Celko, J. (1999). Data and databases
SQL is designed for information
DBMS knows how to use your data
SQL is designed for flexibility
Not even a single line on scalability
SQL
Nearly 40 years of experience
SQL
Never designed for scalability
Alexa top 10
• Google (BigTable)
• Facebook (MySQL)
• YouTube (MySQL)
• Yahoo
• Baidu (HyperTable)
• Wikipedia (MySQL)
• QQ.com
• LinkedIn (Voldemort)
• Live.com
• Twitter (MySQL)
Cassandra users
• Facebook (+ Redis & HBase & MySQL)
• Twitter (+ MySQL)
• Reddit (+ Postgres)
• Digg (+ Redis)
• Bit.ly (+ MongoDB)
• Netflix
Jeff Hammerbacher left Facebook in 2008
Back to basics
Don’t think SQL
Key/value store
Evolved towards tables
Just data
• No joins
• Limited sorting capabilities
• No aggregation, grouping, subqueries whatsoever
Schemaless
• Fixed column families (not tables), but:
• Dynamic column names
Operations in Cassandra 1.0
• CREATE KEYSPACE name
• USE name
• CREATE COLUMN FAMILY name
• DROP KEYSPACE name
• DROP COLUMN FAMILY name
• SET columnfamily['row']['column'] = 'value';
• GET columnfamily['row']
• LIST columnfamily
• DEL columnfamily['row']
• DEL columnfamily['row']['column']
Operations in Cassandra 1.0
• post['uuid']['title'] = 'First post!';
• user['mau']['firstname'] = 'Maurits';
• user['mau']['lastname'] = 'Lawende';
Column family post, row uuid: title = First post!
Column family user, row mau: firstname = Maurits, lastname = Lawende
Sorted by row key, then column name (all ascending)
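The column family model above can be pictured as nested maps. The following is a minimal in-memory sketch, not Cassandra code: the names `store`, `set_column` and `get_row` are hypothetical, and the sorting is done on read purely to illustrate the "sorted by column name" behaviour.

```python
from collections import defaultdict

# Hypothetical stand-in: {column family: {row key: {column name: value}}}
store = defaultdict(lambda: defaultdict(dict))

def set_column(cf, row, column, value):
    """Mimics SET columnfamily['row']['column'] = 'value'."""
    store[cf][row][column] = value

def get_row(cf, row):
    """Mimics GET columnfamily['row']; columns come back sorted by name."""
    return sorted(store[cf][row].items())

set_column('post', 'uuid', 'title', 'First post!')
set_column('user', 'mau', 'lastname', 'Lawende')
set_column('user', 'mau', 'firstname', 'Maurits')

print(get_row('user', 'mau'))
```

Note that the columns are returned in name order, regardless of the order in which they were written.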
Operations in Cassandra 1.0
• post['uuid']['title'] = 'First post!';
• post['uuid']['user'] = 'mau';
• user['mau']['firstname'] = 'Maurits';
How to get a list of blog posts by “mau”?
WHERE user = 'mau'
Bad Request: No indexed columns present in by-columns clause with Equal operator
Sequential scans are rejected
Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY
Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN.
WHERE user = 'mau' ORDER BY date DESC LIMIT 10
Only possible when user and date are in the primary key
Predictable performance
No performance degradation after data growth
Operations in Cassandra 1.0
• post['uuid']['title'] = 'First post!';
• post['uuid']['user'] = 'mau';
• user['mau']['firstname'] = 'Maurits';
• user['mau']['post001'] = 'uuid';
• user['mau']['post002'] = 'uuid';
Any order and limit (column names within a row are sorted)
Join (has to be done in the application)
No uuid IN (...) or ORs
Operations in Cassandra 1.0
• post['uuid']['title'] = 'First post!';
• user['mau']['firstname'] = 'Maurits';
• user['mau']['post001:uuid'] = 'First post!';
• user['mau']['post002:uuid'] = 'Second post!';
Only one query required to get the user profile with the latest posts
Limits: 64 KB per row key, 64 KB per column name, 2 GB per column value, 2 billion cells per row
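Reading the latest posts from such a denormalized row could look roughly like this. It is a sketch on a plain dictionary, assuming column names of the form `postNNN:<post id>` where a higher sequence number means a newer post; `uuid-1`, `uuid-2` and `latest_posts` are placeholders invented for the illustration.

```python
# Hypothetical wide row for user 'mau'; 'uuid-1' and 'uuid-2' are placeholder post ids.
user_row = {
    'firstname': 'Maurits',
    'post001:uuid-1': 'First post!',
    'post002:uuid-2': 'Second post!',
}

def latest_posts(row, limit):
    """Return (post id, title) pairs, newest first, from a single row read."""
    names = sorted((n for n in row if n.startswith('post')), reverse=True)
    return [(n.split(':', 1)[1], row[n]) for n in names[:limit]]

print(row_firstname := user_row['firstname'])
print(latest_posts(user_row, 10))
```

Because the titles are copied into the user row, the profile and the post list come from the same read; the price is that a title change has to be written in two places.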
Beauty?
• Dirty in the SQL world, but;
• It’s a best practice in Big Data
• Don’t think of it as a relational database
• No strict rules on how to use it, just push it to the limits
Each row is a snapshot of data meant to satisfy a given query, sort of like a materialized view.
Storage in a cluster
Cluster structures
• Master-slave
• Master-master
• Sharding
• HDFS / GlusterFS
• HyperTable
• Dynamo
No master or single point of failure
Every node is (nearly) identical
Distribution and replication
[Diagram: token ring running from 0 to 2^127]
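The ring from 0 to 2^127 can be sketched as follows. This is a simplified illustration, not Cassandra's actual implementation: hashing the key with MD5 modulo the ring size, the evenly spaced tokens, and the "next nodes clockwise" replica placement are assumptions made for the example, and all function names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127

def token_for(key):
    """Map a row key onto the ring (MD5 modulo ring size is an assumption)."""
    return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16) % RING_SIZE

def evenly_spaced_tokens(node_count):
    """One token per node, spread evenly over the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def replicas_for(key, node_tokens, replication_factor):
    """Walk clockwise from the key's token and pick the next nodes.

    Assumes replication_factor <= number of nodes.
    """
    tokens = sorted(node_tokens)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    return [tokens[(start + i) % len(tokens)] for i in range(replication_factor)]

tokens = evenly_spaced_tokens(4)
print(replicas_for('mau', tokens, 3))
```

With 4 nodes and a replication factor of 3, every key ends up on three distinct neighbouring nodes of the ring.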
Client can connect to any node
Seed nodes
• Required for bootstrapping nodes
• Define 2 or 3 seed nodes per cluster
Extending the ring
• Assign a token to the new node
• Configure the seed node host
• Start Cassandra on the new node
Consistency
Writing data
• Hinted handoff
• Write to commit log
• Write in memory
• Write to disk (together with timestamp)
Write consistency
• Choose from ANY, ONE, TWO, THREE, QUORUM, ALL
• QUORUM = floor((replication factor / 2) + 1)
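The QUORUM formula can be worked out for a few replication factors. A small sketch; `quorum` and `tolerated_failures` are names chosen for this illustration only.

```python
def quorum(replication_factor):
    """floor((replication factor / 2) + 1), using integer division."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    """Replicas that may be unavailable while QUORUM can still be reached."""
    return replication_factor - quorum(replication_factor)

for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
```

With a replication factor of 3, QUORUM needs 2 replicas and survives the loss of 1; with a replication factor of 2, QUORUM needs both replicas.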
Read consistency
• Choose from ONE, TWO, THREE, QUORUM, ALL
• Most recent copy is returned
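"Most recent copy is returned" can be illustrated with the write timestamps that are stored with the data. A minimal sketch, assuming each replica answers with a (value, timestamp) pair; the function name and the response format are hypothetical.

```python
def most_recent(replica_responses):
    """Pick the value with the highest write timestamp.

    Each response is a (value, write timestamp) pair.
    """
    return max(replica_responses, key=lambda response: response[1])[0]

responses = [('old title', 100), ('new title', 200), ('old title', 100)]
print(most_recent(responses))
```

The stale replicas in this example are exactly the ones a repair would have to update afterwards.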
Read repair
• Compares data with 2 other replicas in the background
• Fixes inconsistent and missing data
• Runs on 10% of all reads
Node repair
• Gradually compares all data on a node with its replicas
• Required in conjunction with read repair to fix ‘forgotten deletes’
ACID properties
• Atomic: completes successfully or is entirely rolled back
• Consistent: transactions never invalidate the database state
• Isolated: transactions are processed as if they ran sequentially
• Durable: completed actions are persistent
CAP theorem
Impossible to achieve all three:
• Consistency
• Availability
• Partition tolerance
Eventual consistency
Not guaranteed to be consistent, but becomes consistent later
• Best effort
• Consistency is not always more important than speed and scalability (doesn’t require locking)
• Configurable consistency level, but no transaction support
Surrogate keys
Say bye to sequences
• Not consistent across the cluster
• Counters are for counting
• Native support for UUIDs: f47ac10b-58cc-4372-a567-0e02b2c3d479
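A UUID can be generated in the application without asking the cluster for the next sequence number. A short sketch using Python's standard `uuid` module; the variable names are illustrative only.

```python
import uuid

# Random UUID: no coordination between nodes or clients is needed.
post_id = uuid.uuid4()

# Time-based UUID: embeds a timestamp, similar in spirit to a time-ordered key.
time_id = uuid.uuid1()

print(post_id)
print(time_id)
```

Both values have the same 36-character textual form as the example on the slide.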
Cassandra 1.2
• No longer schemaless
• Introduced CQL3
• No wide tables anymore
Collections
• Lists
• Maps
• Sets
Lists
• user['mau']['posts'] = 'uuid';
• CREATE TABLE user (username text PRIMARY KEY, posts list<uuid>);
• Append: UPDATE user SET posts = posts + [f47ac10b-58cc-4372-a567-0e02b2c3d479] WHERE username = 'mau';
• Prepend: UPDATE user SET posts = [f47ac10b-58cc-4372-a567-0e02b2c3d479] + posts WHERE username = 'mau';
Sets
• CREATE TABLE user (username text PRIMARY KEY, emails set<text>);
• UPDATE user SET emails = emails + {'[email protected]'} WHERE username = 'mau';
Maps
• CREATE TABLE user (username text PRIMARY KEY, attending map<timestamp, text>);
• UPDATE user SET attending['2013-11-12'] = 'PHPMeetup' WHERE username = 'mau';
• DELETE attending['2013-12-05'] FROM user WHERE username = 'mau';
Limits on collections
• At most 64K items per collection
• The whole collection is loaded in memory when reading / writing
• Not an alternative to wide tables!
No size check in CQL: SET list = list + ['...']
Wide tables in CQL3
• CREATE TABLE tweets (tweet_id uuid PRIMARY KEY, author varchar, body varchar);
• CREATE TABLE timeline (user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id));
Row user_id = mau: uuid:author = anne, uuid:body = Tweet from Anne
Row user_id = mike: uuid:author = david, uuid:body = Tweet from David
For schemaless lovers:
CREATE TABLE name (rowkey varchar, columnname varchar, value blob, PRIMARY KEY (rowkey, columnname));
Secondary index
• CREATE INDEX name ON table (column);
• High memory usage when used on high-cardinality columns
Iteration
• SELECT * FROM users
• SELECT * FROM users LIMIT 10 OFFSET 100 (unpredictable performance)
• SELECT token(username), username, country, age FROM users WHERE token(username) > 23947239 LIMIT 10
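Iterating by token means remembering the last token seen and asking for the next page after it. The sketch below simulates this on a plain list instead of a real cluster: `token` is only a stand-in for CQL's token() (real token values depend on the partitioner), and `fetch_page` and `iterate_all` are hypothetical helpers.

```python
import hashlib

def token(username):
    """Stand-in for CQL's token(); real tokens depend on the partitioner."""
    return int(hashlib.md5(username.encode('utf-8')).hexdigest(), 16)

def fetch_page(users, after_token, limit):
    """Mimics: SELECT ... FROM users WHERE token(username) > after_token LIMIT limit."""
    rows = sorted((token(u), u) for u in users)
    if after_token is not None:
        rows = [row for row in rows if row[0] > after_token]
    return rows[:limit]

def iterate_all(users, page_size):
    """Yield every username once, one page at a time."""
    last_token = None
    while True:
        page = fetch_page(users, last_token, page_size)
        if not page:
            break
        for _, username in page:
            yield username
        last_token = page[-1][0]

print(list(iterate_all(['mau', 'mike', 'anne', 'david', 'eve'], 2)))
```

Each page costs about the same regardless of how far the iteration has progressed, which is the point of avoiding OFFSET.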
Queries are always controlled by one node
Even if data from 100 nodes is involved
MapReduce
Or just ‘MapRed’
• array_map
• array_reduce
map()
• Processes a subset of the data
• array_map(function($v) { return strtoupper($v); }, array('a', 'b'))
reduce()
• Merges the results of the mapping function
• array_reduce(array(1, 2, 3), function($a, $b) { return $a + $b; });
MapReduce
[Diagram: many map() calls run in parallel on subsets of the data; their outputs are reduced step by step into a single result]
Wordcount
$data = array('red green blue', 'orange blue', 'purple green');
// Map: count the words within each line.
$data = array_map(function($v) {
  $words = array();
  foreach (explode(' ', $v) as $word) {
    $words[$word] = isset($words[$word]) ? $words[$word] + 1 : 1;
  }
  return $words;
}, $data);
// Reduce: merge the per-line counts into one total.
$data = array_reduce($data, function($a, $b) {
  foreach ($a as $word => $count) {
    $b[$word] = isset($b[$word]) ? $b[$word] + $count : $count;
  }
  return $b;
}, array());
// Result (key order may differ):
// array('red' => 1, 'green' => 2, 'blue' => 2, 'orange' => 1, 'purple' => 1)
ORDER BY value LIMIT 5
$data = array(array(4,5,2), array(62,35,1), array(74,56,2,34));
// Map: sort each subset and keep its 5 smallest values.
$data = array_map(function($v) {
  sort($v);
  return array_slice($v, 0, 5);
}, $data);
// Reduce: merge two partial results and keep the 5 smallest values again.
$data = array_reduce($data, function($a, $b) {
  $v = array_merge($a, $b);
  sort($v);
  return array_slice($v, 0, 5);
}, array());
// Result: array(1, 2, 2, 4, 5)
Remember
• Getting information is a bumpy road in big data
• Use MapRed to transform data into information
MapReduce
• No native support in Cassandra
• MapReduce possible with Hadoop (requires Java programming)
Pig
input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
filtered_words = FILTER words BY word MATCHES '\\w+';
word_groups = GROUP filtered_words BY word;
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
Hive
SELECT v['ip'], COUNT(1) AS cnt FROM www_access GROUP BY v['ip'] ORDER BY cnt DESC LIMIT 30
Pig and Hive
• Both use MapReduce
• Not very predictable performance
• Good for analysis
Hack your own
• Not too difficult
• Data can be split into subsets by filtering on tokens
• Application must run on all MapRed nodes
• Probably better performance than Pig / Hive
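Splitting the data into subsets by token could be sketched like this: divide the ring into contiguous ranges, let each worker map over its own range, then reduce the partial results. This is a sketch under assumptions, not a recipe from the talk: the ring size of 2^127, the function names and the sample counts are all hypothetical.

```python
RING_SIZE = 2 ** 127

def token_ranges(worker_count):
    """Split [0, RING_SIZE) into contiguous (start, end) ranges, end exclusive."""
    bounds = [i * RING_SIZE // worker_count for i in range(worker_count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

def reduce_counts(partial_counts):
    """Reduce step: add up the per-worker row counts."""
    return sum(partial_counts)

ranges = token_ranges(4)
# Each worker would query its own range, e.g. with a filter like
# token(key) >= start AND token(key) < end, and return a partial result.
partial_counts = [10, 12, 9, 11]  # made-up per-worker results
print(ranges[0], reduce_counts(partial_counts))
```

Because the ranges do not overlap and together cover the whole ring, every row is processed by exactly one worker.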
Interfaces / protocols
• Thrift
• Binary protocol (1.2+)
• Gossip (internode communication)
Thrift
• Something like SOAP in a binary format
• Tool which generates libraries based on definition files
• Supports many languages (incl. PHP, JavaScript, Node.js, C, Java, Python, Ruby, ...)
• Also used by HyperTable, HBase, Accumulo and ElasticSearch
• Sole interface before 1.2
• No support for collections
Binary protocol
• Recommended protocol for Cassandra 1.2
• Few client libraries available
• No binary connectors were available for PHP
https://github.com/mauritsl/php-cassandra
php-cassandra
require('lib/cassandra/Cassandra.php');
use Cassandra\Connection as Cassandra;

$connection = new Cassandra('localhost', 'keyspace');
$rows = $connection->query('SELECT * FROM user');
foreach ($rows as $row) {
  print $row->firstname;
  print $row->listfield[0];
}
$rows->count();
$rows->getColumns();
Scaling applications
Rule 1: Don’t ask for NoSQL drivers for a CMS
Cassandra does not fit all (same story for every NoSQL solution)
Every page (or API call) should require only a few queries, if not just one
Static versus Dynamic data
• Static: information that doesn’t change very often
  • E.g. translations
  • May go in an RDBMS or local storage (files?)
• Dynamic: many changes
  • Changes must be visible on all nodes
  • Use Cassandra
Local versus Global data
• Logging
  • Separate logs per node
• Cache
  • Sometimes no need to share cache between nodes
• Statistics
  • Can be kept local for a limited time
• Sessions
  • Dependent on session stickiness
Caching
• Memcache is recommended for local cache
• Cassandra can be used for global cache
• Has a TTL feature: INSERT INTO ... (...) VALUES (...) USING TTL 86400
What about files?
• Use Hadoop Distributed File System (HDFS) or GlusterFS
• Or use Cassandra
• Split files into chunks to avoid hotspots and save the heap
• Not uncommon to have files in Cassandra
• github.com/Netflix/astyanax
• GBs are OK, but do not store TBs
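Splitting a file into chunks could look roughly like this. The layout is an assumption made for the illustration (one row per chunk, keyed by file id plus chunk number, so that the chunks spread over the cluster); the function names, the key format and the tiny chunk size are hypothetical.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_rows(file_id, data, chunk_size):
    """Hypothetical layout: one (row key, value) pair per chunk."""
    chunks = split_into_chunks(data, chunk_size)
    return [('%s:%06d' % (file_id, n), chunk) for n, chunk in enumerate(chunks)]

def reassemble(rows):
    """Join the chunks again, ordered by row key."""
    return b''.join(chunk for _, chunk in sorted(rows))

rows = chunk_rows('report.pdf', b'abcdefghij', 4)
print(rows)
print(reassemble(rows))
```

A real chunk size would be far larger than 4 bytes; it is kept tiny here so the output stays readable.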
Maximum size of cluster?
• No satisfactory answer
• Probably more dependent on network equipment
• Rack awareness helps here
• Facebook: 150 node cluster, 50TB data (2010)
• Easou: 400 node cluster, 300TB data (300 million images)
Minimum size of a cluster?
• Can run on a single node
• 4GB RAM recommended
• Runs fine on 1GB RAM
• “hot data” should fit in RAM
Installing Cassandra
• Install a JDK (Oracle Java recommended, but OpenJDK works OK)
• Add the Cassandra repository
• apt-get install cassandra
• Set the listen and seed addresses (IP address of the node and of the seed)
• (Re)start Cassandra
Last words...
• Data versus information: data structure is naturally responsive for information; predictable performance
• History and usage: Jeff Hammerbacher
• How to use it: schema design, CQL3 and limits
• Developments: CQL3 and binary protocol
Thank you!
Questions?