
Cassandra

Integrating Cassandra into your project

Tuesday 12 November 2013

Maurits Lawende

• Work at Dutch Open Projects (DOP) since 2007

• Development and technical design for challenging Drupal sites

• Development of SaaS solutions in PHP & NodeJS


ToDoToDay

• Data versus information

• History and usage of Cassandra

• How to use Cassandra

• Developments


Data versus information

Celko, J. (1999). Data and databases


SQL is designed for information

DBMS knows how to use your data


SQL is designed for flexibility

Not even a single line on scalability


SQL

Nearly 40 years of experience


SQL

Never designed for scalability


Alexa top 10

• Google (BigTable)

• Facebook (MySQL)

• YouTube (MySQL)

• Yahoo

• Baidu (HyperTable)

• Wikipedia (MySQL)

• QQ.com

• LinkedIn (Voldemort)

• Live.com

• Twitter (MySQL)


Cassandra users

• Facebook (+ Redis & HBase & MySQL)

• Twitter (+ MySQL)

• Reddit (+ Postgres)

• Digg (+ Redis)

• Bit.ly (+ MongoDB)

• Netflix





Jeff Hammerbacher

Left Facebook in 2008


Back to basics

Don’t think SQL


Key/value store

Evolved towards tables


Just data

• No joins

• Limited sorting capabilities

• No aggregation, grouping, subqueries whatsoever


Schemaless

• Fixed column families (rather than tables), but;

• Dynamic column names


Operations in Cassandra 1.0

• CREATE KEYSPACE name

• USE name

• CREATE COLUMN FAMILY name

• DROP KEYSPACE name

• DROP COLUMN FAMILY name



• SET columnfamily['row']['column'] = 'value';

• GET columnfamily['row']

• LIST columnfamily

• DEL columnfamily['row']

• DEL columnfamily['row']['column']



• post['uuid']['title'] = 'First post!';

• user['mau']['firstname'] = 'Maurits';

• user['mau']['lastname'] = 'Lawende';






post

  uuid: title = First post!

user

  mau: firstname = Maurits, lastname = Lawende






sorted by rowkey, columnname (all ascending)
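The storage model above can be pictured as nested, sorted arrays. The following is a minimal sketch only: `cf_set()` and `cf_get()` are hypothetical helper names that imitate the SET and GET operations in plain PHP, and nothing here talks to Cassandra.

```php
<?php
// Hypothetical in-memory stand-in for a keyspace:
// column family => row key => column name => value.
$keyspace = array();

// Imitates SET columnfamily['row']['column'] = 'value';
// and keeps rows and columns sorted ascending, as on the slide.
function cf_set(array &$keyspace, $cf, $row, $column, $value) {
  $keyspace[$cf][$row][$column] = $value;
  ksort($keyspace[$cf][$row]); // columns ascending by name
  ksort($keyspace[$cf]);       // rows ascending by key
}

// Imitates GET columnfamily['row']: returns all columns of one row.
function cf_get(array $keyspace, $cf, $row) {
  return isset($keyspace[$cf][$row]) ? $keyspace[$cf][$row] : array();
}

cf_set($keyspace, 'user', 'mau', 'lastname', 'Lawende');
cf_set($keyspace, 'user', 'mau', 'firstname', 'Maurits');
cf_set($keyspace, 'post', 'uuid', 'title', 'First post!');

// Columns come back sorted by name, regardless of insertion order.
print_r(cf_get($keyspace, 'user', 'mau'));
```

Because the columns of a row are kept in sorted order, reading "the first N columns" of a row is cheap, which is what the wide-row designs later in the talk rely on.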




• post['uuid']['user'] = 'mau';

How to get a list of blog posts by “mau”?

WHERE user = 'mau'

Bad Request: No indexed columns present in by-columns clause with Equal operator










Sequential scans are rejected










Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY

Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN.








ORDER BY date DESC LIMIT 10

Only possible when user and date are in the primary key


Predictable performance

No performance degradation after data growth






• user['mau']['post001'] = 'uuid';









any order and limit




• post['uuid']['user'] = 'uuid';

join

no uuid IN (...) or ORs





• user['mau']['post001:uuid'] = 'First post!';

• user['mau']['post002:uuid'] = 'Second post!';

Only one query required to get a user profile with latest posts
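To make the "one query" idea concrete, here is a small sketch of how an application might split such a wide row into profile fields and the latest posts. The row is a plain PHP array standing in for a single fetched row; `split_profile()` and the `uuid-1` / `uuid-2` values are made up for illustration.

```php
<?php
// One wide row for user 'mau': profile fields plus 'postNNN:uuid' columns.
// The uuid parts are placeholder values.
$row = array(
  'firstname' => 'Maurits',
  'lastname' => 'Lawende',
  'post001:uuid-1' => 'First post!',
  'post002:uuid-2' => 'Second post!',
);

// Splits a row into profile fields and posts; posts come back newest first.
function split_profile(array $row, $limit) {
  ksort($row); // columns are stored sorted by name
  $profile = array();
  $posts = array();
  foreach ($row as $column => $value) {
    if (strpos($column, 'post') === 0 && strpos($column, ':') !== false) {
      // Column name is '<sequence>:<uuid>', the value holds the title.
      list($sequence, $uuid) = explode(':', $column, 2);
      $posts[] = array('uuid' => $uuid, 'title' => $value);
    }
    else {
      $profile[$column] = $value;
    }
  }
  return array($profile, array_slice(array_reverse($posts), 0, $limit));
}

list($profile, $latest) = split_profile($row, 10);
print $profile['firstname'] . ': ' . $latest[0]['title'] . "\n";
```

The post title is duplicated into the user row on purpose: the read needs no second lookup, at the cost of writing the title in two places.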







Limits: row key 64 KB, column name 64 KB, column value 2 GB

2 billion cells per row


Beauty?

• Dirty in the SQL world, but;

• It’s a best practice in Big Data

• Don’t think of it as a relational database

• No strict rules on how to use it, just push it to the limits



Each row is a snapshot of data meant to satisfy a given query, sort of like a materialized view.


Storage in a cluster


Cluster structures


Master-slave


Master-master


Sharding


HDFS / GlusterFS


HyperTable


Dynamo


No master or single point of failureEvery node is (nearly) identical


Distribution and replication

Token ring: 0 to 2^127










Client can connect to any node
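How a row key ends up on particular nodes can be sketched roughly as follows. This is a simplified illustration, not Cassandra's implementation: the ring is shrunk to 2^28 tokens so that tokens fit in a PHP integer, the node names and token assignments are invented, and replicas are simply the next nodes clockwise on the ring.

```php
<?php
// Simplified ring: tokens run from 0 to 2^28 instead of 0 to 2^127.
// Node names and their tokens are made up for this sketch.
$ring = array(
  'node-a' => 0,
  'node-b' => 1 << 26,
  'node-c' => 1 << 27,
  'node-d' => 3 << 26,
);

// Hashes a row key to a token: first 28 bits of the MD5 digest.
function key_token($key) {
  return hexdec(substr(md5($key), 0, 7));
}

// Returns the nodes holding a key: the first node whose token is
// greater than or equal to the key token, then the next nodes clockwise.
function replicas(array $ring, $key, $rf) {
  asort($ring);
  $nodes = array_keys($ring);
  $tokens = array_values($ring);
  $keyToken = key_token($key);
  $start = 0; // wrap around to the lowest token by default
  foreach ($tokens as $i => $nodeToken) {
    if ($keyToken <= $nodeToken) {
      $start = $i;
      break;
    }
  }
  $result = array();
  for ($i = 0; $i < min($rf, count($nodes)); $i++) {
    $result[] = $nodes[($start + $i) % count($nodes)];
  }
  return $result;
}

print implode(', ', replicas($ring, 'mau', 3)) . "\n";
```

Every node can run the same calculation, which is why a client can send a request to any node and have it forwarded to the right replicas.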


Seed nodes

• Required for bootstrapping nodes

• Define 2 or 3 seed nodes per cluster


Extending the ring

• Assign a token for new node

• Configure seed node host

• Start Cassandra on new node



Consistency


Writing data

• Hinted handoff

• Write to commit log

• Write in memory

• Write to disk (together with timestamp)


Write consistency

• Choose from ANY, ONE, TWO, THREE, QUORUM, ALL

• QUORUM = floor((replication factor / 2) + 1)


Read consistency

• Choose from ONE, TWO, THREE, QUORUM, ALL

• Most recent copy is returned
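The quorum formula can be worked through in a few lines. The sketch below also checks the common rule of thumb that a read is guaranteed to see the latest write when the number of replicas written plus the number read exceeds the replication factor; the function names are hypothetical.

```php
<?php
// QUORUM = floor(replication factor / 2) + 1
function quorum($replicationFactor) {
  return intdiv($replicationFactor, 2) + 1;
}

// A read overlaps with the latest write on at least one replica
// when (replicas written) + (replicas read) > replication factor.
function read_sees_latest_write($written, $read, $replicationFactor) {
  return $written + $read > $replicationFactor;
}

foreach (array(1, 2, 3, 5) as $rf) {
  printf("RF %d: QUORUM = %d\n", $rf, quorum($rf));
}

// With RF 3: QUORUM writes + QUORUM reads overlap, ONE + ONE do not.
var_dump(read_sees_latest_write(quorum(3), quorum(3), 3)); // bool(true)
var_dump(read_sees_latest_write(1, 1, 3));                 // bool(false)
```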


Read repair

• Compares data with 2 other replicas in the background

• Fixes inconsistent and missing data

• At 10% of all reads
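The idea behind "most recent copy wins" and read repair can be sketched with plain arrays. This is an illustration under simplifying assumptions: the replicas are in-memory arrays with invented node names, the repair happens inline instead of in the background, and it runs on every read rather than on a fraction of reads.

```php
<?php
// Three replicas of the same column; node-b is stale, node-c missed the write.
$replicas = array(
  'node-a' => array('mau:firstname' => array('value' => 'Maurits', 'timestamp' => 200)),
  'node-b' => array('mau:firstname' => array('value' => 'Maurtis', 'timestamp' => 100)),
  'node-c' => array(),
);

// Returns the value with the highest timestamp and copies it to
// every replica that is missing it or holds an older version.
function read_with_repair(array &$replicas, $column) {
  $latest = null;
  foreach ($replicas as $data) {
    if (isset($data[$column]) &&
        ($latest === null || $data[$column]['timestamp'] > $latest['timestamp'])) {
      $latest = $data[$column];
    }
  }
  if ($latest === null) {
    return null;
  }
  foreach ($replicas as $node => $data) {
    if (!isset($data[$column]) || $data[$column]['timestamp'] < $latest['timestamp']) {
      $replicas[$node][$column] = $latest;
    }
  }
  return $latest['value'];
}

print read_with_repair($replicas, 'mau:firstname') . "\n";
```

After the call all three replicas hold the version with timestamp 200, which is why the timestamp is stored together with every write.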


Node repair

• Gradually compares all data in nodes with replicas

• Required in conjunction with read repair to fix ‘forgotten deletes’


ACID properties

• Atomic; completed successfully or entirely rolled back

• Consistent; transactions never invalidate the database state

• Isolated; transactions are processed as if they ran sequentially

• Durable; completed actions are persistent


CAP theorem

Impossible to achieve all three:

• Consistency

• Availability

• Partition tolerance


Eventual consistency

Not guaranteed to be consistent, but becomes consistent later


Eventual consistency

• Best effort

• Consistency is not always more important than speed and scalability (doesn’t require locking)

• Configurable consistency level, but no transaction support


Surrogate keys

Say bye to sequences

Not consistent across cluster

Counters are for counting

Native support for UUIDs

f47ac10b-58cc-4372-a567-0e02b2c3d479
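Since sequences are not available, keys are typically generated in the application. The following is a minimal sketch of a random (version 4) UUID generator in PHP 7+; `uuid_v4()` is a hypothetical helper name, not part of any Cassandra client.

```php
<?php
// Generates a random (version 4) UUID, e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.
function uuid_v4() {
  $bytes = random_bytes(16);
  // Set the version field to 4 (random UUID).
  $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
  // Set the variant field to the RFC 4122 variant (binary 10xx).
  $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
  // 32 hex digits split into 8 groups of 4, formatted as 8-4-4-4-12.
  return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

print uuid_v4() . "\n";
```

No coordination between nodes is needed to generate such a key, which is what makes it a practical replacement for an auto-increment column.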


Cassandra 1.2

• No longer schemaless

• Introduced CQL3

• No wide tables anymore


Collections

• Lists

• Maps

• Sets


Lists

• user['mau']['posts'] = 'uuid';

• CREATE TABLE user ( username text PRIMARY KEY, posts list<uuid>);

• UPDATE user SET posts = posts + ['uuid'] WHERE username = 'mau'

• UPDATE user SET posts = ['uuid'] + posts WHERE username = 'mau'


Sets

• CREATE TABLE user ( username text PRIMARY KEY, emails set<text>);

• UPDATE user SET emails = emails + {'[email protected]'} WHERE username = 'mau'


Maps

• CREATE TABLE user ( username text PRIMARY KEY, attending map<timestamp,text>);

• UPDATE user SET attending['2013-11-12'] = 'PHPMeetup' WHERE username = 'mau'

• DELETE attending['2013-12-05'] FROM user WHERE username = 'mau'


Limits on collections

• 64K

• Whole collection loaded in memory when reading / writing

• Not an alternative to wide tables!


No size check in CQL: SET list = list + ['...']


Wide tables in CQL3

• CREATE TABLE tweets ( tweet_id uuid PRIMARY KEY, author varchar, body varchar);

• CREATE TABLE timeline ( user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id))


Wide tables in CQL3

user_id: mau

  uuid:author = anne

  uuid:body = Tweet from Anne

user_id: mike

  uuid:author = david

  uuid:body = Tweet from David

For schemaless lovers:

CREATE TABLE name ( rowkey varchar, columnname varchar, value blob, PRIMARY KEY (rowkey, columnname));


Secondary index

• CREATE INDEX name ON table (column);

• High memory usage when used with high cardinality


Iteration

• SELECT * FROM users


• SELECT * FROM users LIMIT 10 OFFSET 100

Unpredictable performance

• SELECT token(username), username, country, age FROM user

• SELECT token(username), username, country, age FROM user WHERE token(username) > 23947239 LIMIT 10
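The token-based iteration can be simulated without a cluster. In this sketch `fetch_page()` is a hypothetical stand-in for the `WHERE token(username) > ... LIMIT ...` query, and `crc32()` is used as a stand-in for the partitioner's token function; both are assumptions made only to keep the example runnable.

```php
<?php
// Stand-in for the partitioner's token() function (assumption: crc32).
function token($username) {
  return crc32($username);
}

// Simulates:
// SELECT token(username), username FROM user
//   WHERE token(username) > $after LIMIT $limit
// Rows come back in token order, as they would from the ring.
function fetch_page(array $usernames, $after, $limit) {
  $rows = array();
  foreach ($usernames as $username) {
    $t = token($username);
    if ($after === null || $t > $after) {
      $rows[$t] = $username;
    }
  }
  ksort($rows);
  return array_slice($rows, 0, $limit, true);
}

$users = array('mau', 'mike', 'anne', 'david', 'eve');
$after = null;
$seen = array();
do {
  $page = fetch_page($users, $after, 2);
  foreach ($page as $t => $username) {
    $seen[] = $username;
    $after = $t; // remember the last token and continue from there
  }
} while (count($page) > 0);

print implode(', ', $seen) . "\n";
```

Each page starts right after the last token seen, so the cost of a page does not grow with the page number, unlike an OFFSET-style query.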


Queries are always controlled by one node

Even if data from 100 nodes is involved


MapReduce

Or just ‘MapRed’


MapReduce

• array_map

• array_reduce


map()

• Processes a subset of the data

• array_map(function($v) { return strtoupper($v); }, array('a', 'b'))


reduce()

• Merge results from the mapping function

• array_reduce(array(1, 2, 3), function($a, $b) { return $a + $b; });


MapReduce

[Diagram: many parallel map() calls, reduced to a single result]


Wordcount

$data = array('red green blue', 'orange blue', 'purple green');

$data = array_map(function($v) {
  $words = array();
  foreach (explode(' ', $v) as $word)
    $words[$word] = isset($words[$word]) ? $words[$word] + 1 : 1;
  return $words;
}, $data);

$data = array_reduce($data, function($a, $b) {
  foreach ($a as $word => $count)
    $b[$word] = isset($b[$word]) ? $b[$word] + $count : $count;
  return $b;
}, array());

array('red' => 1, 'green' => 2, 'blue' => 2, 'orange' => 1, 'purple' => 1)


ORDER BY value LIMIT 5

$data = array(array(4,5,2), array(62,35,1), array(74,56,2,34));

$data = array_map(function($v) {
  sort($v);
  return array_slice($v, 0, 5);
}, $data);

$data = array_reduce($data, function($a, $b) {
  $v = array_merge($a, $b);
  sort($v);
  return array_slice($v, 0, 5);
}, array());

array(1, 2, 2, 4, 5)


Remember

• Getting information is a bumpy road in big data

• Use MapRed to transform data into information


MapReduce

• No native support in Cassandra

• MapReduce possible with Hadoop (requires Java programming)


Pig

input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
filtered_words = FILTER words BY word MATCHES '\\w+';
word_groups = GROUP filtered_words BY word;
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;

STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';


Hive

SELECT v['ip'], COUNT(1) AS cnt FROM www_access GROUP BY v['ip'] ORDER BY cnt DESC LIMIT 30


Pig and Hive

• Using MapReduce

• No(t very) predictable performance

• Good for analysis


Hack your own

• Not too difficult

• Data can be split into subsets by filtering on tokens

• Application must run on all MapRed nodes

• Probably better performance than Pig / Hive
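A home-grown variant along these lines might look roughly like the sketch below. It assumes a 64-bit PHP build, uses `crc32()` as a stand-in for the real token function, and runs the "workers" sequentially in one process; in a real setup each subset would be processed by a separate application instance. All function names and row keys are hypothetical.

```php
<?php
// Assigns a row key to one of $workers token ranges.
// crc32() yields 0 .. 2^32 - 1 on 64-bit PHP; the ring is cut into equal slices.
function subset_of($rowkey, $workers) {
  return intdiv(crc32($rowkey) * $workers, 1 << 32);
}

// map(): counts the words in one subset of the data.
function count_words(array $texts) {
  $counts = array();
  foreach ($texts as $text) {
    foreach (explode(' ', $text) as $word) {
      $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
    }
  }
  return $counts;
}

// reduce(): merges one partial result into the running total.
function merge_counts($carry, $partial) {
  foreach ($partial as $word => $count) {
    $carry[$word] = isset($carry[$word]) ? $carry[$word] + $count : $count;
  }
  return $carry;
}

$rows = array(
  'post-1' => 'red green blue',
  'post-2' => 'orange blue',
  'post-3' => 'purple green',
);

// Split the data into subsets by token range.
$workers = 3;
$subsets = array_fill(0, $workers, array());
foreach ($rows as $key => $text) {
  $subsets[subset_of($key, $workers)][] = $text;
}

$partials = array_map('count_words', $subsets);
$total = array_reduce($partials, 'merge_counts', array());
ksort($total);
print_r($total);
```

The final counts do not depend on how the rows happen to be distributed over the subsets, which is the property that makes the work safe to parallelise.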



Interfaces / protocols

• Thrift

• Binary protocol (1.2+)

• Gossip (internode communication)


Thrift

• Something like SOAP in a binary format

• Tool which generates libraries based on definition files

• Supports many languages (incl. PHP, JS, NodeJS, c, java, python, ruby.....)

• Also used by HyperTable, HBase, Accumulo and ElasticSearch

• Sole interface before 1.2


• No support for collections


Binary protocol

• Recommended protocol for Cassandra 1.2

• Few client libraries available

• No binary connectors were available for PHP

https://github.com/mauritsl/php-cassandra

php-cassandra

require('lib/cassandra/Cassandra.php');
use Cassandra\Connection as Cassandra;

$connection = new Cassandra('localhost', 'keyspace');

$rows = $connection->query('SELECT * FROM user');
foreach ($rows as $row) {
  print $row->firstname;
  print $row->listfield[0];
}

$rows->count();
$rows->getColumns();


Scaling applications


Rule 1: Don’t ask for NoSQL drivers for a CMS


Cassandra does not fit all (same story for every NoSQL solution)


Every page (or API call) should require only a few queries, ideally just one


Static versus Dynamic data

• Static: information that doesn’t change very often

• E.g. translations

• May go in an RDBMS or local storage (files?)

• Dynamic: many changes

• Changes must be visible on all nodes

• Use Cassandra


Local versus Global data

• Logging

• Separate logs per node

• Cache

• Sometimes no need to share cache between nodes

• Statistics

• Can be kept local for a limited time


• Sessions

• Dependent on session stickiness


Caching

• Memcache is recommended for local cache

• Cassandra can be used for global cache

• Has a TTL feature

INSERT INTO ... (...) VALUES (...) USING TTL 86400


What about files?

• Use Hadoop Distributed File System (HDFS) or GlusterFS

• Or use Cassandra

• Split files in chunks to avoid hotspots and save the heap

• Not uncommon to have files in Cassandra

• github.com/Netflix/astyanax

• GBs are OK, but do not store TBs
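Chunking a file could be sketched as follows. The key layout (`fileid:index`) and the helper names are assumptions made for this illustration; they are not taken from Astyanax or any other library.

```php
<?php
// Splits file contents into chunk rows keyed '<fileId>:<zero-padded index>'.
// Each chunk gets its own row key, so chunks spread over the ring.
function chunk_rows($fileId, $contents, $chunkSize) {
  $rows = array();
  foreach (str_split($contents, $chunkSize) as $index => $chunk) {
    $rows[sprintf('%s:%06d', $fileId, $index)] = $chunk;
  }
  return $rows;
}

// Reassembles the file from its chunk rows, in index order.
function join_chunks(array $rows) {
  ksort($rows);
  return implode('', $rows);
}

$contents = str_repeat('abc', 1000); // 3000 bytes of sample data
$rows = chunk_rows('file-1', $contents, 1024);

print count($rows) . " chunks\n";
var_dump(join_chunks($rows) === $contents);
```

Small chunks keep every single read and write modest in size, so no node has to hold a whole file in memory at once.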


Maximum size of cluster?

• No satisfactory answer

• Probably more dependent on network equipment

• Rack awareness helps here

• Facebook: 150 node cluster, 50TB data (2010)

• Easou: 400 node cluster, 300TB data (300 million images)


Minimum size of a cluster?

• Can run on a single node

• 4GB RAM recommended

• Runs fine on 1GB RAM


“Hot data” should fit in RAM


Installing Cassandra

• Install JDK (Oracle Java recommended, but OpenJDK works OK)

• Add Cassandra repository

• apt-get install cassandra

• Set listen and seed address (IP address of node and seed)

• (Re)start Cassandra


Last words...


Data versus information

Data structure is naturally responsive for information

Predictable performance


History and usage

Jeff Hammerbacher


How to use it

Schema design, CQL3 and limits


Developments

CQL3 and binary protocol


Thank you!


Questions?