Iasi Java – Data Modelling in Cassandra

© 2015 Ness SES. All Rights Reserved

Ness Software Engineering Services

Data Modelling in Cassandra

Moshe Kranc
Aug 26, 2015

Who Am I?

• Moshe Kranc (pronounced Krantz)
• In numbers:
  – 37 years in High Tech
  – 3 exits, 7 patents, 1 Emmy Award
  – 1 wife, 5 sons, 1 daughter, 30 years of marriage
  – 1 book: The Hasidic Masters’ Guide to Management
  – 1 arrest (1980 in Kishinev)
• I invented the Internet, so I’ll do whatever I want there
• Expert in Big Data
  – Hadoop since 2007
  – Cassandra since 2010
• Chief Technology Officer at Ness


Column-family oriented DBs

• Descendants of Google’s BigTable
• A column is a key-value pair
  – Supports unstructured data
• Store groups of columns together
  – A given row ID can appear multiple times, once for each column family
  – Different from columnar format, where each column is stored separately
• Rows can have any number of any column
• Schema-less
• Leading examples: Cassandra, HBase
• Benefits:
  – Can write millions of rows per minute
  – Flexible column format
  – Lightning-fast key-value retrieval
• Limitations:
  – Transactions
  – JOINs
  – Ad hoc queries over multiple columns
• Lacks a mature query compiler


Step 1: Make Sure C* is the right choice

• Pros:
  – Very fast writes
  – Fast random reads
  – Very scalable
• Cons:
  – No transactions
  – No JOINs
  – No nested queries
  – No aggregation framework
  – Terrible for range scans on primary key
  – Terrible for ad hoc queries
  – Terrible for queues


The Conceptual Model

• Don’t think: relational table. Instead, think: nested sorted map
  – A row key maps to a row; the column keys in a row are sorted and can be nested
• A map gives efficient key lookup, and the sorted column keys give efficient range scans
• Valueless columns: The column key can itself hold a value
  – That’s how you sort column values
• Wide rows: The number of column keys is (virtually) unbounded
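
The nested sorted map can be sketched in a few lines of Python. This only illustrates the conceptual model, not how Cassandra stores data on disk; the `Row` class, its `put` and `slice` methods, and the sample keys are hypothetical names chosen for this sketch.

```python
import bisect

class Row:
    """One row: column keys kept in sorted order, each mapped to a value."""

    def __init__(self):
        self._keys = []     # column keys, always sorted
        self._values = {}   # column key -> column value

    def put(self, column_key, value=None):
        # A valueless column simply stores None as its value.
        if column_key not in self._values:
            bisect.insort(self._keys, column_key)
        self._values[column_key] = value

    def slice(self, start, end):
        """Range scan: columns with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

# Outer map: row key -> row (efficient key lookup).
table = {}
row = table.setdefault("sensor-1", Row())
row.put("2015-08-26T10:05", "21.5")
row.put("2015-08-26T09:30", "20.9")
row.put("2015-08-27T08:00", "19.8")

# Inner sorted map: all columns for 26 August, returned in key order.
day = table["sensor-1"].slice("2015-08-26", "2015-08-27")
```

The outer dictionary gives the key lookup and the sorted key list gives the range scan, which are the two access paths the rest of the modelling advice relies on.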


Data Modelling in C*

• Model column families around query patterns
  – Secondary indexes and complex SQL (JOIN, ORDER BY, GROUP BY) will not work as they do in your favorite RDBMS
  – Identify the most frequent query patterns and isolate the less frequent
  – Consider which queries are sensitive to latency and which are not
• But, start your design with entities and relationships
  – It is important to understand and start with entities / relationships, then evolve the model around query patterns, by de-normalizing and duplicating
• Remember the model of the nested sorted map, and think how you can organize data into that map in order to satisfy your query requirements, e.g., fast look-up, ordering, grouping, filtering
  – When you hold a hammer, everything looks like a nail


De-normalization

• De-normalize and duplicate for read performance
• But don’t de-normalize if you don’t need to. It’s about finding the right balance
• Remember:
  – Speed is important
  – Disk is cheap
• The best analogy for this process is the Oracle materialized view. Instead of just storing pointers to data, a materialized view makes a true copy of the data. You should expect to have multiple copies of (part of) your data


Row and Column Keys

• Make sure row key and column key are unique
  – Otherwise, data could get accidentally overwritten
    • There is no in-place update in C*: it is always an upsert
    • If you accidentally insert data with an existing row or column key, the previous value will be silently overwritten without any error. The change will not be versioned, and the old data will be gone
  – (A secret: the data is still there until the next compaction – you just can’t read it)
• Keep the column name short
  – Because it is stored repeatedly
  – Max column key (and row key) size = 64KB
• Storing values in column names is perfectly OK
  – Motivation: column names are stored in sort order, but column values are not
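
The silent overwrite is easy to reproduce with a plain Python dictionary standing in for one row. This is a sketch of the behaviour only; the `email` column name and the example.com addresses are made up, and `uuid.uuid1()` is used here simply as a convenient way to make each column key unique.

```python
import uuid

row = {}  # stands in for one row: column key -> value

# Writing the same column key twice is an upsert: no error, old value gone.
row["email"] = "first@example.com"
row["email"] = "second@example.com"
overwritten = row["email"]

# Making the column key unique (here with a time-based UUID) keeps both writes.
for address in ("first@example.com", "second@example.com"):
    row["email|" + str(uuid.uuid1())] = address
kept = sorted(v for k, v in row.items() if k.startswith("email|"))
```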


Wide Rows

• Leverage wide rows for ordering, grouping and filtering
  – When actual data is stored in column names (to sort the data), we inevitably end up with wide rows
• Benefits of wide rows:
  – Ordering: Enables efficient range scans on column values, efficient search for a specific column value
  – Grouping: If data is queried together, you can group that data in a single wide row that can be read efficiently as part of a single query.
    • Example: for tracking time series data, group data by date/hour/machine/event in a single wide row, with each column containing granular data or roll-ups
  – Filtering: Wide row column families are heavily used (with composite columns) to build custom indexes
• But not too wide, as a row is never split across nodes
  – All of the traffic related to one row is handled by only one node (actually the set of nodes that hold the row’s replicas)
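
A rough sketch of the time series grouping, assuming timestamps of the form yyyymmddhhmmss and a row key built from hour, machine and event type. The `record` helper, the machine name and the sample values are hypothetical.

```python
from collections import defaultdict

# row key "yyyymmddhh|machine|event" -> {column key (timestamp) -> value}
rows = defaultdict(dict)

def record(ts, machine, event, value):
    """Store one sample in the wide row for its hour, machine and event type."""
    row_key = f"{ts[:10]}|{machine}|{event}"   # ts is "yyyymmddhhmmss"
    rows[row_key][ts] = value

record("20150826100501", "web01", "cpu", 0.71)
record("20150826100001", "web01", "cpu", 0.64)
record("20150826110000", "web01", "cpu", 0.90)

# One row read returns the whole hour; sorted() stands in for the stored column order.
hour = sorted(rows["2015082610|web01|cpu"].items())
```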


Row Keys and Hot Spots

• Choose the proper row key – it’s your “shard key”
  – Otherwise you’ll end up with hot spots, even using RandomPartitioner
• Example: storing time series data which is retrieved based on hour
• Idea 1: Row key = the hour: a terrible idea
  – All the writes in a given hour will go to a single node holding the row for the current hour – a hotspot
• Idea 2: Row key = the minute: no improvement
  – Only one node will be writing during whatever duration you pick!
  – As time progresses the hot spot moves, but it never goes away
• This is a recurring dilemma in C* data modelling, between taking advantage of column ordering and avoiding hotspots


Row Keys and Hot Spots

• Idea 3: Add something else to the row key, e.g., event type, machine id
  – Whatever is appropriate for your use case
  – But what if you have nothing else to add, or you absolutely need time period as the only row key?
• Idea 4: Manually split row keys based on number of nodes in the C* cluster
  – E.g., “yyyymmddhh|1”, “yyyymmddhh|2”, …
  – For an hour window, each node will now evenly handle the writes in round-robin
  – Issue: Reading data for a given hour will require multi-get from all the physical nodes followed by a merge in the client application
  – Issue: What if the number of nodes in the cluster changes?
    • Proposed solution: store the cluster size over time in a C* table
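
Idea 4 can be sketched as follows, under the assumption that the bucket count is fixed and equal to the number of nodes. `NUM_BUCKETS`, `write` and `read_hour` are hypothetical names, and the in-process counter stands in for whatever round-robin scheme the real writers would share.

```python
import bisect
import heapq
from collections import defaultdict
from itertools import count

NUM_BUCKETS = 3            # assumed to equal the number of nodes in the cluster
rows = defaultdict(list)   # row key -> sorted list of (timestamp, event)
_next = count()            # round-robin counter

def write(hour, timestamp, event):
    """Spread the writes of one hour over NUM_BUCKETS row keys."""
    bucket = next(_next) % NUM_BUCKETS + 1
    bisect.insort(rows[f"{hour}|{bucket}"], (timestamp, event))

def read_hour(hour):
    """Multi-get every bucket for the hour, then merge in the client."""
    buckets = [rows.get(f"{hour}|{b}", []) for b in range(1, NUM_BUCKETS + 1)]
    return list(heapq.merge(*buckets))

for i in range(7):
    write("2015082610", f"20150826100{i}00", f"event-{i}")

events = read_hour("2015082610")
```

The read path shows the price of this scheme: every read of an hour touches all buckets, and the bucket count in force when the data was written has to be known at read time.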


Idempotency

• Idempotency = the ability to repeatedly apply the same operation without affecting data consistency
• Usually better to keep a record of what happened when, rather than storing the current value
  – Prevents race conditions
• Counter-example: Status value stored as a scalar
  – E.g., status = ready
  – Sensitive to: order, replay
• Better: Store status value together with timestamp
  – status|20150428043206 = ready
  – status|20150428043207 = pending
  – Better yet: concatenate with UUID
• Design the data model so that operations are idempotent
  – Or, make sure your use case can live with inaccuracies, or that inaccuracies can be corrected eventually
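
A small sketch of why the timestamped form tolerates reordering and replay while the scalar form does not. The two helper functions are hypothetical; the status values and timestamps are the ones from the slide.

```python
def apply_scalar(row, updates):
    """Scalar model: each update overwrites the single status column."""
    for _, status in updates:
        row["status"] = status
    return row["status"]

def apply_log(row, updates):
    """Log model: each update writes its own column, keyed by timestamp."""
    for ts, status in updates:
        row[f"status|{ts}"] = status      # replaying the same update changes nothing
    latest_key = max(k for k in row if k.startswith("status|"))
    return row[latest_key]

updates = [("20150428043206", "ready"), ("20150428043207", "pending")]
# Same updates delivered out of order, with the first one replayed at the end.
delivered = [updates[1], updates[0], updates[0]]

scalar_result = apply_scalar({}, delivered)   # depends on delivery order
log_result = apply_log({}, delivered)         # depends only on the timestamps
```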


Column Names and Values: Examples

• Store multiple values for the same column
  – E.g., email addresses
  – Store the column value as a sub-column name
    • mail|[email protected] = null
    • mail|[email protected] = null
• Maintain all versions of a column’s values (e.g., for idempotency)
  – E.g., status
  – Use timeuuid as a sub-column name
    • status|20150428043206 = ready
    • status|20150428043207 = pending
• Store multiple fields for an object
  – E.g., for a given genre, store all the books (title, author, publisher), sorted by title
  – Row key = <genre>, e.g., “SciFi”
  – Column key = <title>|author
  – Column key = <title>|publisher
  – Example: “Fiction”: “Gone With the Wind”|author = “Margaret Mitchell”, “Gone With the Wind”|publisher = “Pocket Books”
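
The books-by-genre layout might look like this in a Python sketch. One dictionary stands in for the “Fiction” row; the second book, its author and its publisher are invented placeholders, and `add_book` and `books_sorted_by_title` are hypothetical helpers.

```python
# One row (row key = genre); column key = "<title>|<field>"; value = field value.
fiction_row = {}

def add_book(title, author, publisher):
    fiction_row[f"{title}|author"] = author
    fiction_row[f"{title}|publisher"] = publisher

add_book("Gone With the Wind", "Margaret Mitchell", "Pocket Books")
add_book("A Hypothetical Novel", "Jane Doe", "Example Press")  # placeholder data

def books_sorted_by_title(row):
    """Walk the columns in key order and regroup them into one record per title."""
    books = {}
    for column_key in sorted(row):            # stands in for the stored sort order
        title, field = column_key.rsplit("|", 1)
        books.setdefault(title, {})[field] = row[column_key]
    return list(books.items())

fiction = books_sorted_by_title(fiction_row)
```

Because the title leads the column key, all fields of one book sit next to each other and the books come back already ordered by title.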


Example 1: Users “Like” Items

• Support queries:
  – Get user by userId
  – Get item by itemId
  – Get all the items (id, title) that a particular user likes
  – Get all the users (id, name) who like a particular item


Option 1: Exact Replica of Relational Model

• Identical to relational model
• No easy way to query all the items that a particular user likes, or all the users who like a particular item
  – Inefficient due to lack of JOINs
  – Requires a full table scan of User_Item_Like, followed by (for each User_Item_Like row) reading one User row and one Item row


Option 2: Normalized entities with custom indexes

• We can easily query all the items that a particular user likes, using Item_By_User, and all the users who like a particular item, using User_By_Item

• But, we always want to get the item title in addition to the item id when we query items liked by a particular user.
  – In this model, we first need to query Item_By_User to get all the item ids that a given user likes, and then, for each item id, we need to query Item to get the title


Option 3: Normalized entities with de-normalization into custom indexes

• Title and username are de-normalized in User_By_Item and Item_By_User respectively

• This allows us to efficiently query (by reading a single row) all the item titles liked by a given user, and all the user names who like a given item

• Plus one read to get full information about the User/Item
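
Option 3 can be sketched with four dictionaries standing in for the User, Item, Item_By_User and User_By_Item column families. The ids, names and titles are made-up sample data and `like` is a hypothetical helper.

```python
# Entity tables: one row per user / item (hypothetical sample data).
users = {"u1": {"name": "alice"}, "u2": {"name": "bob"}}
items = {"i1": {"title": "Item One"}, "i2": {"title": "Item Two"}}

# Index tables: the liked item's title and the user's name are copied in.
item_by_user = {}   # user id -> {item id -> item title}
user_by_item = {}   # item id -> {user id -> user name}

def like(user_id, item_id):
    """Record one like in both index tables."""
    item_by_user.setdefault(user_id, {})[item_id] = items[item_id]["title"]
    user_by_item.setdefault(item_id, {})[user_id] = users[user_id]["name"]

like("u1", "i1")
like("u1", "i2")
like("u2", "i1")

# Each query is a single-row read of an index table.
liked_by_u1 = item_by_user["u1"]
fans_of_i1 = user_by_item["i1"]
```

The cost of the copies shows up on update: changing a title or user name means rewriting it in every index row that holds it.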


Option 4: partially de-normalized entities

• More efficient: one read instead of two to get full information about a User/Item

• But, much more repeated data: much wider rows, much harder to maintain

• Probably a step too far in de-normalization – a lot more data just to save one read per query

• Preference: Option 3


Example 2: Book Rating Site

• Users can add books (bookid, author, title, rank (0.0 – 10.0)), comments (id, user, text, time), and tag them

• The application needs to support the following user operations:
  – Add books
  – Add comments for books
  – Add tags for books
  – List books (bookid, title, author) sorted by rank
  – List all books (bookid, title, author) tagged with a given tag
  – List the comments (text, user) for a given bookid, sorted by date


Strategy: Books

• Store each book in a distinct row in the Books table
  – So we read all info about a book in one shot
  – Row key = bookid
  – Columns (key, value) for: author, title, rank


Strategy: Comments

• To retrieve all comments for a book: Store comments as columns in the book’s row, sorted by timestamp
  – In Books table
  – Column key = “comment|<timeuuid>”
    • Better: “cmt|<timeuuid>”
  – Column value = the comment text
• Alternative: a separate Comments table, where row key =
  – Comment text: but, unlike tags, no one searches or sorts based on comment text
  – bookid: might as well be part of the Books table


Strategy: Tags

• To retrieve all tags for a book: Store tags as columns in a book’s row, sorted by timestamp
  – Column key = “tag|<timeuuid>”
  – Column value = the tag
  – (Alternative: a separate Tags table, where row key = book ID – might as well be part of the Books table)
• To retrieve all books for a given tag: A separate TagBooks table, with one row for each tag
  – Row key = tag
  – Column key = “<bookID>|title”, column value = the title
  – Column key = “<bookID>|author”, column value = the author
  – (Alternative: just store book ID in the column, use it to look up the rest of the book info in Books – inefficient)
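
Putting the Books and TagBooks strategies together, a sketch under some simplifying assumptions: dictionaries stand in for the two tables, a plain sortable timestamp string stands in for the timeuuid, and book ids contain no “|”. All function names and sample data are hypothetical.

```python
books = {}      # book id -> {column key -> value}
tag_books = {}  # tag -> {column key -> value}

def add_book(book_id, title, author, rank):
    books[book_id] = {"title": title, "author": author, "rank": rank}

def add_comment(book_id, when, text):
    books[book_id][f"cmt|{when}"] = text

def add_tag(book_id, when, tag):
    book = books[book_id]
    book[f"tag|{when}"] = tag
    # De-normalize title and author into the tag's row.
    row = tag_books.setdefault(tag, {})
    row[f"{book_id}|title"] = book["title"]
    row[f"{book_id}|author"] = book["author"]

def comments_for(book_id):
    """Comments of one book in timestamp order (sorted() mimics column order)."""
    row = books[book_id]
    return [row[k] for k in sorted(row) if k.startswith("cmt|")]

def books_for_tag(tag):
    """(bookid, title, author) for every book carrying the tag: one row read."""
    row = tag_books.get(tag, {})
    ids = sorted({k.split("|", 1)[0] for k in row})
    return [(i, row[f"{i}|title"], row[f"{i}|author"]) for i in ids]

add_book("b1", "Title One", "Author A", 8.5)
add_book("b2", "Title Two", "Author B", 6.0)
add_comment("b1", "20150826100500", "second comment")
add_comment("b1", "20150826100000", "first comment")
add_tag("b1", "20150826100100", "scifi")
add_tag("b2", "20150826100200", "scifi")
```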


… And then along came CQL

• Cassandra Query Language: An SQL-like language to query Cassandra
  – E.g., CREATE / DROP / ALTER TABLE / SELECT / INSERT
• Motivation: C* code is hard to write, easy to fall into performance traps
  – CQL is a best-practices Cassandra interface and hides the messy details
  – Limited predicates: attempts to prevent bad queries
    • But, you can still get into trouble!
• Motivation: C* makes it very hard to understand the data model without reading code
  – CQL is a reintroduction of schema so that you don’t have to read code to understand the data model
  – CQL creates a common language so that details of the data model can be easily communicated


Composite Primary Key

• The Primary Key
  – The key uniquely identifies a row.
  – A composite primary key consists of:
    • A partition key
    • One or more clustering columns
  – E.g. PRIMARY KEY (partition key, cluster columns, ...)
• The partition key determines on which node the partition resides
• Data is ordered in cluster column order within the partition


Composite Primary Key

CREATE TABLE sporty_league (
    team_name varchar,
    player_name varchar,
    jersey int,
    PRIMARY KEY (team_name, player_name)
);
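
One way to relate this table to the nested sorted map: the partition key (team_name) plays the role of the row key, and the clustering column value (player_name) becomes the leading part of each column key inside that row. The Python below only illustrates that idea with made-up sample rows; it is not a description of Cassandra's actual storage format, and `insert` and `select` are hypothetical helpers.

```python
from collections import defaultdict

# partition key (team_name) -> {(player_name, column name) -> value}
storage = defaultdict(dict)

def insert(team_name, player_name, jersey):
    """Map one CQL-style row of sporty_league onto the nested sorted map."""
    storage[team_name][(player_name, "jersey")] = jersey

def select(team_name):
    """All rows of one partition, ordered by the clustering column."""
    partition = storage[team_name]
    return [(team_name, player, partition[(player, column)])
            for player, column in sorted(partition)]

insert("Mighty Mutts", "Lucky", 7)
insert("Mighty Mutts", "Buddy", 19)
insert("Phoenix", "Rex", 3)

mutts = select("Mighty Mutts")
```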


Composite Partition Key

CREATE TABLE cities (
    city_name varchar,
    state varchar,
    PRIMARY KEY ((city_name, state))
);

• Each city gets its own partition


CQL/Cassandra Mapping: Simple Keys


CQL/Cassandra Mapping: Compound Keys


CQL/Cassandra Mapping: Maps


The Bottom Line

• C* has a surprisingly small bag of tricks…
• … that can be used to solve a surprisingly large set of problems
• … (so long as you can think sideways)
• Don’t let the CQL façade fool you


More examples

• Real time analytics: http://blog.markedup.com/2013/03/Cassandra-real-time-analytics/
• Financial time series: http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013
• Music service: http://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_music_service_c.html
• Following-followers: https://blog.safaribooksonline.com/2012/12/11/modeling-data-in-cassandra/
• Tweets: http://www.slideshare.net/jericevans/cassandra-by-example-data-modelling-with-cql3
• http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling
• http://clojurecassandra.info/articles/data_modelling.html
• http://www.slideshare.net/planetcassandra/data-modeling-with-travis-price
• http://htraining.s3.amazonaws.com/cassandra-training.pptx

Thank you
