DATASTAX C*OLLEGE CREDIT:
DATA MODELLING FOR APACHE CASSANDRA
Aaron Morton
Apache Cassandra Committer, DataStax MVP for Apache Cassandra
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
Multiple APIs...
Cassandra Query Language (CQL) started as a higher-level, declarative alternative to the Thrift API.
Twitter Clone
Previously done with Thrift at WDCNZ:
“Hello @World #Cassandra - Apache Cassandra in action”
http://vimeo.com/49762233
Queries?
* Post Tweet to Followers
* Get Tweet by ID
* List Tweets by User
* List Tweets in User Timeline
* List Followers
Our Keyspace
CREATE KEYSPACE cass_college
  WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:datacenter1 = 1;
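The strategy_class / strategy_options form above is the syntax used by early CQL 3 releases. As a hedged sketch, assuming a later Cassandra release in which replication settings are passed as a single map, the same keyspace could be declared as follows. The keyspace name and the datacenter1 name are taken from the slide; everything else is an assumption about the target version.

```cql
-- Same keyspace, still one replica in datacenter1,
-- written with the map-style replication option.
CREATE KEYSPACE cass_college
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1};

-- Switch to the keyspace so later statements can use unqualified table names.
USE cass_college;
```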
First Table
CREATE TABLE User (
  user_name text,
  password text,
  real_name text,
  PRIMARY KEY (user_name)
);
Some users...
cqlsh:cass_college> INSERT INTO User
                ... (user_name, password, real_name)
                ... VALUES
                ... ('fred', 'sekr8t', 'Mr Foo');
cqlsh:cass_college> select * from User;
 user_name | password | real_name
-----------+----------+-----------
      fred |   sekr8t |    Mr Foo
Some users...
cqlsh:cass_college> INSERT INTO User
                ... (user_name, password)
                ... VALUES
                ... ('bob', 'pwd');
cqlsh:cass_college> select * from User where user_name = 'bob';
 user_name | password | real_name
-----------+----------+-----------
       bob |      pwd |      null
Tweet Table
CREATE TABLE Tweet (
  tweet_id bigint,
  body text,
  user_name text,
  timestamp timestamp,
  PRIMARY KEY (tweet_id)
);
Tweet Table...
cqlsh:cass_college> INSERT INTO Tweet
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_college> select * from Tweet where tweet_id = 1;
 tweet_id |      body |                timestamp | user_name
----------+-----------+--------------------------+-----------
        1 | The Tweet | 2012-11-06 10:26:56+1300 |      fred
UserTweets Table
CREATE TABLE UserTweets (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
);
UserTweets Table...
cqlsh:cass_college> INSERT INTO UserTweets
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
 user_name | tweet_id |      body |                timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> select * from UserTweets where user_name = 'fred' and tweet_id = 1;
 user_name | tweet_id |      body |                timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> INSERT INTO UserTweets
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (2, 'Second Tweet', 'fred', 1352150816918);
cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
 user_name | tweet_id |         body |                timestamp
-----------+----------+--------------+--------------------------
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
 user_name | tweet_id |         body |                timestamp
-----------+----------+--------------+--------------------------
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
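Because tweet_id is the second component of the primary key, the rows for one user_name are kept sorted by tweet_id, which is what makes ORDER BY tweet_id DESC possible here. If newest-first is the usual read, one option is to declare that order on the table itself. A minimal sketch, assuming a hypothetical table name UserTweetsNewestFirst that is not part of the original model:

```cql
-- Hypothetical variant of UserTweets that stores each user's tweets newest first.
CREATE TABLE UserTweetsNewestFirst (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);

-- With the order declared on the table, a plain read returns the highest tweet_id first.
SELECT * FROM UserTweetsNewestFirst WHERE user_name = 'fred' LIMIT 20;
```

The trade-off is that the order is fixed when the table is created, so the variant suits a model where one read direction clearly dominates.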
UserTimeline
CREATE TABLE UserTimeline (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
);
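The slides do not show UserTimeline in use. As a sketch, assuming that user_name in this table is the owner of the timeline (the reader, not the author) and that bob follows fred, delivering fred's first tweet to bob and then reading bob's timeline might look like this:

```cql
-- Deliver fred's tweet 1 into bob's timeline (bob is assumed to follow fred).
INSERT INTO UserTimeline (user_name, tweet_id, body, timestamp)
VALUES ('bob', 1, 'The Tweet', 1352150816917);

-- "List Tweets in User Timeline": newest tweets first, capped at an arbitrary page size.
SELECT tweet_id, body, timestamp
FROM UserTimeline
WHERE user_name = 'bob'
ORDER BY tweet_id DESC
LIMIT 20;
```

Note that the table as defined has no column for the author of the tweet; the client would need to look that up through the Tweet table using tweet_id.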
Data Model (so far)
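 CF / Value | User        | Tweet       | User Tweets           | User Timeline
------------+-------------+-------------+-----------------------+-----------------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component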
UserMetrics Table
CREATE TABLE UserMetrics (
  user_name text,
  tweets counter,
  followers counter,
  following counter,
  PRIMARY KEY (user_name)
);
UserMetrics Table...
cqlsh:cass_college> UPDATE
                ... UserMetrics
                ... SET
                ... tweets = tweets + 1
                ... WHERE
                ... user_name = 'fred';
cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
 user_name | followers | following | tweets
-----------+-----------+-----------+--------
      fred |      null |      null |      1
Data Model (so far)
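 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
------------+-------------+-------------+-----------------------+-----------------------+--------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |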
Relationships
CREATE TABLE Followers (
  user_name text,
  follower text,
  timestamp timestamp,
  PRIMARY KEY (user_name, follower)
);
CREATE TABLE Following (
  user_name text,
  following text,
  timestamp timestamp,
  PRIMARY KEY (user_name, following)
);
Relationships
INSERT INTO Following (user_name, following, timestamp)
VALUES ('bob', 'fred', 1352247749161);
INSERT INTO Followers (user_name, follower, timestamp)
VALUES ('fred', 'bob', 1352247749161);
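The two INSERTs describe one logical event (bob follows fred), so they are natural candidates for a batch, and the UserMetrics counters can be updated alongside them. A sketch of one way to do that; the counter updates are issued outside the batch because counter and non-counter statements cannot be mixed in a regular batch:

```cql
-- Record both directions of the relationship together.
BEGIN BATCH
  INSERT INTO Following (user_name, following, timestamp)
  VALUES ('bob', 'fred', 1352247749161);
  INSERT INTO Followers (user_name, follower, timestamp)
  VALUES ('fred', 'bob', 1352247749161);
APPLY BATCH;

-- Counters are updated separately: bob follows one more user, fred has one more follower.
UPDATE UserMetrics SET following = following + 1 WHERE user_name = 'bob';
UPDATE UserMetrics SET followers = followers + 1 WHERE user_name = 'fred';
```

Counter increments are not idempotent, so a client that retries after a timeout may over-count; whether that matters depends on how exact the metrics need to be.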
Relationships
cqlsh:cass_college> select * from Following;
 user_name | following |                timestamp
-----------+-----------+--------------------------
       bob |      fred | 2012-11-07 13:22:29+1300
cqlsh:cass_college> select * from Followers;
 user_name | follower |                timestamp
-----------+----------+--------------------------
      fred |      bob | 2012-11-07 13:22:29+1300
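Of the queries listed at the start, "Post Tweet to Followers" is not shown on the slides. A sketch of that write path using the tables defined so far, assuming a hypothetical new tweet from fred with tweet_id 3 and an illustrative timestamp, and assuming the client application reads the follower list and issues one UserTimeline insert per follower (CQL has no loop construct of its own):

```cql
-- Step 1: the client looks up who should receive the tweet.
SELECT follower FROM Followers WHERE user_name = 'fred';

-- Step 2: write the tweet, the author's own list, and one timeline row per follower.
-- Only bob follows fred in this example; tweet_id 3 and the timestamp are made up.
BEGIN BATCH
  INSERT INTO Tweet (tweet_id, body, user_name, timestamp)
  VALUES (3, 'Third Tweet', 'fred', 1352247800000);
  INSERT INTO UserTweets (tweet_id, body, user_name, timestamp)
  VALUES (3, 'Third Tweet', 'fred', 1352247800000);
  INSERT INTO UserTimeline (tweet_id, body, user_name, timestamp)
  VALUES (3, 'Third Tweet', 'bob', 1352247800000);
APPLY BATCH;

-- Step 3: bump the author's tweet counter (kept outside the regular batch).
UPDATE UserMetrics SET tweets = tweets + 1 WHERE user_name = 'fred';
```

The body is copied into every table it is read from, which trades extra writes for reads that each touch a single table.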
Data Model
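 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics | Follows     | Followers
------------+-------------+-------------+-----------------------+-----------------------+--------------+-------------+-----------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key  | Primary Key | Field
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |              |             |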