
DATASTAX C*OLLEGE CREDIT:

DATA MODELLING FOR APACHE CASSANDRA

Aaron Morton

Apache Cassandra Committer, DataStax MVP for Apache Cassandra

@aaronmorton

www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

General Guidelines

API Choice

Example

Cassandra is good at

reading data from a row in the order it is stored.

Typically an efficient data model will

denormalize data and use the storage engine order.

To create a good data model

understand the queries your application requires.

General Guidelines

API Choice

Example

Multiple APIs?

Initially there was only a Thrift / RPC API, used by language-specific clients.

Multiple APIs...

Cassandra Query Language (CQL) started as a higher

level, declarative alternative.

Multiple APIs...

CQL 3 brings many changes. It is currently in beta in Cassandra v1.1.

CQL 3 uses

a table-oriented, schema-driven data model.

(I said it had many changes.)

General Guidelines

API Choice

Example

Twitter Clone

Previously done with Thrift at WDCNZ

“Hello @World #Cassandra - Apache Cassandra in action”

http://vimeo.com/49762233

Twitter clone...

using CQL 3 via the cqlsh tool.

bin/cqlsh -3

Queries?

* Post Tweet to Followers
* Get Tweet by ID
* List Tweets by User
* List Tweets in User Timeline
* List Followers

Keyspace is

a namespace container.

Our Keyspace

CREATE KEYSPACE cass_college
    WITH strategy_class = 'NetworkTopologyStrategy'
    AND strategy_options:datacenter1 = 1;

Table is

a sparse collection of well known, ordered columns.

First Table

CREATE TABLE User (
    user_name text,
    password text,
    real_name text,
    PRIMARY KEY (user_name)
);

Some users...

cqlsh:cass_college> INSERT INTO User
                ... (user_name, password, real_name)
                ... VALUES
                ... ('fred', 'sekr8t', 'Mr Foo');

cqlsh:cass_college> select * from User;
 user_name | password | real_name
-----------+----------+-----------
      fred |   sekr8t |    Mr Foo

Some users...

cqlsh:cass_college> INSERT INTO User
                ... (user_name, password)
                ... VALUES
                ... ('bob', 'pwd');

cqlsh:cass_college> select * from User where user_name = 'bob';
 user_name | password | real_name
-----------+----------+-----------
       bob |      pwd |      null
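The null in real_name for 'bob' follows from the table being sparse: a row only holds the columns that were written. A minimal Python sketch of that idea, as a toy in-memory model rather than Cassandra's actual storage; the names users, insert_user and select_user are made up for illustration.

```python
# Toy model of a sparse table. Illustration only, not Cassandra internals.
USER_COLUMNS = ("user_name", "password", "real_name")  # the well known columns

users = {}  # primary key (user_name) -> dict holding only the columns written


def insert_user(**columns):
    """Write the supplied columns for a row, leaving other columns untouched."""
    unknown = set(columns) - set(USER_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    users.setdefault(columns["user_name"], {}).update(columns)


def select_user(user_name):
    """Return every declared column; columns never written read as None."""
    row = users[user_name]
    return {name: row.get(name) for name in USER_COLUMNS}


insert_user(user_name="fred", password="sekr8t", real_name="Mr Foo")
insert_user(user_name="bob", password="pwd")
print(select_user("bob"))
```

The row for 'bob' stores two columns only; the third is filled in as None at read time, which is the analogue of the null shown by cqlsh.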

Data Model (so far)

 CF / Value | User
------------+-------------
 user_name  | Primary Key

Tweet Table

CREATE TABLE Tweet (
    tweet_id bigint,
    body text,
    user_name text,
    timestamp timestamp,
    PRIMARY KEY (tweet_id)
);

Tweet Table...

cqlsh:cass_college> INSERT INTO Tweet
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (1, 'The Tweet', 'fred', 1352150816917);

cqlsh:cass_college> select * from Tweet where tweet_id = 1;
 tweet_id | body      | timestamp                | user_name
----------+-----------+--------------------------+-----------
        1 | The Tweet | 2012-11-06 10:26:56+1300 |      fred

Data Model (so far)

 CF / Value | User        | Tweet
------------+-------------+-------------
 user_name  | Primary Key | Field
 tweet_id   |             | Primary Key

UserTweets Table

CREATE TABLE UserTweets (
    tweet_id bigint,
    user_name text,
    body text,
    timestamp timestamp,
    PRIMARY KEY (user_name, tweet_id)
);

UserTweets Table...

cqlsh:cass_college> INSERT INTO UserTweets
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (1, 'The Tweet', 'fred', 1352150816917);

cqlsh:cass_college> select * from UserTweets where user_name='fred';
 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

cqlsh:cass_college> select * from UserTweets where user_name='fred' and tweet_id=1;
 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

cqlsh:cass_college> INSERT INTO UserTweets
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (2, 'Second Tweet', 'fred', 1352150816918);

cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
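The UserTweets queries work because the first part of the primary key (user_name) groups rows together and the second part (tweet_id) keeps them in order, so listing a user's tweets reads data in the order it is stored. A rough Python sketch of that behaviour, assuming a plain in-memory structure; insert_tweet and select_tweets are hypothetical names, and this is a model of the idea, not of the storage engine.

```python
import bisect
from collections import defaultdict

# user_name -> list of (tweet_id, body), always kept sorted by tweet_id.
# Illustrative stand-in for PRIMARY KEY (user_name, tweet_id).
user_tweets = defaultdict(list)


def insert_tweet(user_name, tweet_id, body):
    """Place the row at its sorted position; the same key overwrites."""
    rows = user_tweets[user_name]
    ids = [row[0] for row in rows]
    pos = bisect.bisect_left(ids, tweet_id)
    if pos < len(rows) and rows[pos][0] == tweet_id:
        rows[pos] = (tweet_id, body)
    else:
        rows.insert(pos, (tweet_id, body))


def select_tweets(user_name, descending=False):
    """Read one user's rows in stored order, or reversed for 'order by ... desc'."""
    rows = user_tweets.get(user_name, [])
    return list(reversed(rows)) if descending else list(rows)


# Insert out of order on purpose: reads still come back sorted by tweet_id.
insert_tweet("fred", 2, "Second Tweet")
insert_tweet("fred", 1, "The Tweet")
print(select_tweets("fred"))
print(select_tweets("fred", descending=True))
```

No sorting happens at read time in this sketch; the order is paid for once on write, which is the point of designing the key around the query.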

UserTimeline

CREATE TABLE UserTimeline (
    tweet_id bigint,
    user_name text,
    body text,
    timestamp timestamp,
    PRIMARY KEY (user_name, tweet_id)
);

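Data Model (so far)

 CF / Value | User        | Tweet       | User Tweets           | User Timeline
------------+-------------+-------------+-----------------------+-----------------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component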

UserMetrics Table

CREATE TABLE UserMetrics (
    user_name text,
    tweets counter,
    followers counter,
    following counter,
    PRIMARY KEY (user_name)
);

UserMetrics Table...

cqlsh:cass_college> UPDATE
                ... UserMetrics
                ... SET
                ... tweets = tweets + 1
                ... WHERE
                ... user_name = 'fred';

cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
 user_name | followers | following | tweets
-----------+-----------+-----------+--------
      fred |      null |      null |      1
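The counter update above increments a value without reading it first, and counters that were never touched come back as null. A small Python sketch of those two observable behaviours, using an assumed in-memory dict; increment and select_metrics are illustrative names only and say nothing about how counters are implemented in Cassandra.

```python
from collections import defaultdict

COUNTER_COLUMNS = ("followers", "following", "tweets")

# user_name -> {counter column: current value}; untouched counters are absent.
user_metrics = defaultdict(dict)


def increment(user_name, column, delta=1):
    """Apply a delta to one counter, starting from zero on first use."""
    if column not in COUNTER_COLUMNS:
        raise ValueError(f"unknown counter: {column}")
    row = user_metrics[user_name]
    row[column] = row.get(column, 0) + delta


def select_metrics(user_name):
    """Return all counters; ones never incremented read as None."""
    row = user_metrics.get(user_name, {})
    return {column: row.get(column) for column in COUNTER_COLUMNS}


increment("fred", "tweets")
print(select_metrics("fred"))
```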

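Data Model (so far)

 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
------------+-------------+-------------+-----------------------+-----------------------+--------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |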

Relationships

CREATE TABLE Followers (
    user_name text,
    follower text,
    timestamp timestamp,
    PRIMARY KEY (user_name, follower)
);

CREATE TABLE Following (
    user_name text,
    following text,
    timestamp timestamp,
    PRIMARY KEY (user_name, following)
);

Relationships

INSERT INTO Following (user_name, following, timestamp)
VALUES ('bob', 'fred', 1352247749161);

INSERT INTO Followers (user_name, follower, timestamp)
VALUES ('fred', 'bob', 1352247749161);

Relationships

cqlsh:cass_college> select * from Following;
 user_name | following | timestamp
-----------+-----------+--------------------------
       bob |      fred | 2012-11-07 13:22:29+1300

cqlsh:cass_college> select * from Followers;
 user_name | follower | timestamp
-----------+----------+--------------------------
      fred |      bob | 2012-11-07 13:22:29+1300
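The slides define the tables but do not show the write path for "Post Tweet to Followers". One plausible reading, sketched here in Python with in-memory stand-ins for the tables, is that the application writes each tweet several times: once by ID, once under its author, and once into the timeline of every follower. This is an assumption about how the denormalized tables are meant to be used; follow, post_tweet and list_timeline are hypothetical helpers, and whether authors also see their own tweets in their timeline is left out.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the tables defined above.
followers = defaultdict(set)       # Followers: user_name -> follower names
tweets = {}                        # Tweet: tweet_id -> row
user_tweets = defaultdict(dict)    # UserTweets: user_name -> {tweet_id: row}
user_timeline = defaultdict(dict)  # UserTimeline: user_name -> {tweet_id: row}


def follow(follower, followed):
    """Record that 'follower' follows 'followed'."""
    followers[followed].add(follower)


def post_tweet(tweet_id, user_name, body, timestamp):
    """Write the same tweet once per query it has to serve."""
    row = {"tweet_id": tweet_id, "user_name": user_name,
           "body": body, "timestamp": timestamp}
    tweets[tweet_id] = row                       # Get Tweet by ID
    user_tweets[user_name][tweet_id] = row       # List Tweets by User
    for follower in followers.get(user_name, ()):
        user_timeline[follower][tweet_id] = row  # List Tweets in User Timeline


def list_timeline(user_name):
    """Return one user's timeline rows ordered by tweet_id."""
    timeline = user_timeline.get(user_name, {})
    return [timeline[tweet_id] for tweet_id in sorted(timeline)]


follow("bob", "fred")
post_tweet(1, "fred", "The Tweet", 1352150816917)
print([row["body"] for row in list_timeline("bob")])
```

The cost of this design is more writes per tweet; the benefit is that every read in the query list touches a single table keyed for that read.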

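Data Model

 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics | Follows     | Followers
------------+-------------+-------------+-----------------------+-----------------------+--------------+-------------+-----------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key  | Primary Key | Field
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |              |             |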

Thanks.

Aaron Morton

@aaronmorton

www.thelastpickle.com


