DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA

Aaron Morton
Apache Cassandra Committer, DataStax MVP for Apache Cassandra

@aaronmorton
www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

C*ollege Credit: Data Modeling for Apache Cassandra


DESCRIPTION

Cassandra stores data differently than traditional RDBMSs. These differences allow for improvements in performance, availability and scalability. Aaron Morton, DataStax MVP for Apache Cassandra, will present the basics of the data model and outline the differences clearly. This webinar is 101 level and is suitable for people who are coming from a relational background and are just starting to get into Apache Cassandra.



General Guidelines / API Choice / Example

Cassandra is good at reading data from a row in the order it is stored.

Typically an efficient data model will denormalize data and use the storage engine order.

To create a good data model, understand the queries your application requires.

General Guidelines / API Choice / Example

Multiple APIs?

Initially only a Thrift / RPC API, used by language-specific clients.

Multiple APIs...

Cassandra Query Language (CQL) started as a higher-level, declarative alternative.

Multiple APIs...

CQL 3 brings many changes. Currently in beta in Cassandra v1.1.

CQL 3 uses a Table Orientated, Schema Driven, Data Model.

(I said it had many changes.)

General Guidelines / API Choice / Example

Twitter Clone

Previously done with Thrift at WDCNZ:

“Hello @World #Cassandra - Apache Cassandra in action”
http://vimeo.com/49762233

Twitter clone...

Using CQL 3 via the cqlsh tool.

bin/cqlsh -3

Queries?

* Post Tweet to Followers
* Get Tweet by ID
* List Tweets by User
* List Tweets in User Timeline
* List Followers

Keyspace is a namespace container.

Our Keyspace

CREATE KEYSPACE cass_college
  WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:datacenter1 = 1;

Table is a sparse collection of well known, ordered columns.

First Table

CREATE TABLE User (
  user_name text,
  password text,
  real_name text,
  PRIMARY KEY (user_name)
);

Some users...

cqlsh:cass_college> INSERT INTO User
                ... (user_name, password, real_name)
                ... VALUES
                ... ('fred', 'sekr8t', 'Mr Foo');

cqlsh:cass_college> select * from User;
 user_name | password | real_name
-----------+----------+-----------
      fred |   sekr8t |    Mr Foo

Some users...

cqlsh:cass_college> INSERT INTO User
                ... (user_name, password)
                ... VALUES
                ... ('bob', 'pwd');

cqlsh:cass_college> select * from User where user_name = 'bob';
 user_name | password | real_name
-----------+----------+-----------
       bob |      pwd |      null
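The null for bob's real_name is what "sparse" means here: the row stores only the columns that were written. The sketch below is a toy in-memory model in Python, not a Cassandra client. The names `USER_COLUMNS`, `insert_user` and `select_user` are made up for illustration.

```python
# Toy in-memory stand-in for the User table; not a Cassandra client.
USER_COLUMNS = ("user_name", "password", "real_name")

users = {}  # primary key (user_name) -> only the columns that were written


def insert_user(user_name, **columns):
    """Store only the supplied columns, as a sparse row would."""
    unknown = set(columns) - set(USER_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    users.setdefault(user_name, {}).update(columns)


def select_user(user_name):
    """Return every well-known column, with None for unset ones."""
    row = users.get(user_name)
    if row is None:
        return None
    result = {name: row.get(name) for name in USER_COLUMNS}
    result["user_name"] = user_name
    return result


insert_user("fred", password="sekr8t", real_name="Mr Foo")
insert_user("bob", password="pwd")
print(select_user("bob"))
# {'user_name': 'bob', 'password': 'pwd', 'real_name': None}
```

Nothing is stored for bob's real_name. The None appears only when the row is read back against the known column list.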

Data Model (so far)

 CF / Value | User
------------+-------------
 user_name  | Primary Key

Tweet Table

CREATE TABLE Tweet (
  tweet_id bigint,
  body text,
  user_name text,
  timestamp timestamp,
  PRIMARY KEY (tweet_id)
);

Tweet Table...

cqlsh:cass_college> INSERT INTO Tweet
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (1, 'The Tweet', 'fred', 1352150816917);

cqlsh:cass_college> select * from Tweet where tweet_id = 1;
 tweet_id | body      | timestamp                | user_name
----------+-----------+--------------------------+-----------
        1 | The Tweet | 2012-11-06 10:26:56+1300 |      fred

Data Model (so far)

 CF / Value | User        | Tweet
------------+-------------+-------------
 user_name  | Primary Key | Field
 tweet_id   |             | Primary Key

UserTweets Table

CREATE TABLE UserTweets (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
);

UserTweets Table...

cqlsh:cass_college> INSERT INTO UserTweets
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (1, 'The Tweet', 'fred', 1352150816917);

cqlsh:cass_college> select * from UserTweets where user_name='fred';
 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

cqlsh:cass_college> select * from UserTweets where user_name='fred' and tweet_id=1;
 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

cqlsh:cass_college> INSERT INTO UserTweets
                ... (tweet_id, body, user_name, timestamp)
                ... VALUES
                ... (2, 'Second Tweet', 'fred', 1352150816918);

cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
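With PRIMARY KEY (user_name, tweet_id), all rows for one user_name are stored together, ordered by tweet_id. That is why the queries above can list a user's tweets in either direction without a separate sort step. The Python sketch below is a toy in-memory model of that layout, not how Cassandra is implemented. The names `user_tweets`, `insert_tweet` and `select_tweets` are made up for illustration.

```python
import bisect
from collections import defaultdict

# partition key (user_name) -> list of (tweet_id, row) kept sorted by tweet_id
user_tweets = defaultdict(list)


def insert_tweet(user_name, tweet_id, body, timestamp):
    """Upsert one row, keeping the partition ordered by tweet_id."""
    partition = user_tweets[user_name]
    ids = [tid for tid, _ in partition]
    pos = bisect.bisect_left(ids, tweet_id)
    row = {"body": body, "timestamp": timestamp}
    if pos < len(partition) and partition[pos][0] == tweet_id:
        partition[pos] = (tweet_id, row)  # same primary key: overwrite
    else:
        partition.insert(pos, (tweet_id, row))


def select_tweets(user_name, tweet_id=None, descending=False):
    """Read one partition in stored order, optionally one row or reversed."""
    partition = user_tweets.get(user_name, [])
    if tweet_id is not None:
        partition = [item for item in partition if item[0] == tweet_id]
    rows = [(user_name, tid, row["body"]) for tid, row in partition]
    return rows[::-1] if descending else rows


# Insert out of order on purpose; reads still come back ordered by tweet_id.
insert_tweet("fred", 2, "Second Tweet", 1352150816918)
insert_tweet("fred", 1, "The Tweet", 1352150816917)
print(select_tweets("fred"))
# [('fred', 1, 'The Tweet'), ('fred', 2, 'Second Tweet')]
print(select_tweets("fred", descending=True))
# [('fred', 2, 'Second Tweet'), ('fred', 1, 'The Tweet')]
```

The ordering work happens on write, so a read is a walk over data that is already in order.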

UserTimeline

CREATE TABLE UserTimeline (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
);

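Data Model (so far)

 CF / Value | User        | Tweet       | User Tweets           | User Timeline
------------+-------------+-------------+-----------------------+-----------------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component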

UserMetrics Table

CREATE TABLE UserMetrics (
  user_name text,
  tweets counter,
  followers counter,
  following counter,
  PRIMARY KEY (user_name)
);

UserMetrics Table...

cqlsh:cass_college> UPDATE
                ... UserMetrics
                ... SET
                ... tweets = tweets + 1
                ... WHERE
                ... user_name = 'fred';

cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
 user_name | followers | following | tweets
-----------+-----------+-----------+--------
      fred |      null |      null |      1

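Data Model (so far)

 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
------------+-------------+-------------+-----------------------+-----------------------+--------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |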

Relationships

CREATE TABLE Followers (
  user_name text,
  follower text,
  timestamp timestamp,
  PRIMARY KEY (user_name, follower)
);

CREATE TABLE Following (
  user_name text,
  following text,
  timestamp timestamp,
  PRIMARY KEY (user_name, following)
);

Relationships

INSERT INTO Following (user_name, following, timestamp)
VALUES ('bob', 'fred', 1352247749161);

INSERT INTO Followers (user_name, follower, timestamp)
VALUES ('fred', 'bob', 1352247749161);

Relationships

cqlsh:cass_college> select * from Following;
 user_name | following | timestamp
-----------+-----------+--------------------------
       bob |      fred | 2012-11-07 13:22:29+1300

cqlsh:cass_college> select * from Followers;
 user_name | follower | timestamp
-----------+----------+--------------------------
      fred |      bob | 2012-11-07 13:22:29+1300
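The slides define the tables but do not show the "Post Tweet to Followers" write path. One plausible reading, sketched below in Python with toy in-memory dictionaries, is a fan-out on write: the tweet is copied into every table that a query reads from. Everything here is an assumption for illustration: the function names, the counter updates inside `follow`, and the choice to leave the author's own tweets out of their timeline.

```python
from collections import defaultdict

# Toy in-memory stand-ins for the tables on the slides.
tweets = {}                        # tweet_id -> row
user_tweets = defaultdict(dict)    # user_name -> {tweet_id: row}
user_timeline = defaultdict(dict)  # user_name -> {tweet_id: row}
followers = defaultdict(dict)      # user_name -> {follower: timestamp}
following = defaultdict(dict)      # user_name -> {following: timestamp}
user_metrics = defaultdict(lambda: defaultdict(int))  # counter columns


def follow(follower, followee, timestamp):
    """Record the relationship from both sides, as the two INSERTs do."""
    following[follower][followee] = timestamp
    followers[followee][follower] = timestamp
    # Assumption: the slides do not show these counter updates.
    user_metrics[follower]["following"] += 1
    user_metrics[followee]["followers"] += 1


def post_tweet(tweet_id, user_name, body, timestamp):
    """Write one copy of the tweet per query that needs to read it."""
    row = {"user_name": user_name, "body": body, "timestamp": timestamp}
    tweets[tweet_id] = dict(row)                  # Get Tweet by ID
    user_tweets[user_name][tweet_id] = dict(row)  # List Tweets by User
    for follower in followers[user_name]:         # List Tweets in User Timeline
        user_timeline[follower][tweet_id] = dict(row)
    user_metrics[user_name]["tweets"] += 1


def list_timeline(user_name):
    """Newest first, assuming tweet_id grows over time."""
    timeline = user_timeline.get(user_name, {})
    return [timeline[tid]["body"] for tid in sorted(timeline, reverse=True)]


follow("bob", "fred", 1352247749161)
post_tweet(1, "fred", "The Tweet", 1352150816917)
print(list_timeline("bob"))
# ['The Tweet']
```

The cost of this approach is extra writes per tweet, one for each follower. In exchange, each read in the query list touches a single table and a single primary key.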

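Data Model

 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics | Follows / Followers
------------+-------------+-------------+-----------------------+-----------------------+--------------+---------------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key  | Primary Key / Field
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |              |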

Thanks.

Aaron Morton
@aaronmorton

www.thelastpickle.com
