© 2015 Ness SES. All Rights Reserved
Ness Software Engineering Services
Data Modelling in Cassandra
Moshe Kranc
Aug 26, 2015
Who Am I?
• Moshe Kranc (pronounced Krantz)
• In numbers:
  – 37 years in High Tech
  – 3 exits, 7 patents, 1 Emmy Award
  – 1 wife, 5 sons, 1 daughter, 30 years of marriage
  – 1 book: The Hasidic Masters’ Guide to Management
  – 1 arrest (1980 in Kishinev)
• I invented the Internet, so I’ll do whatever I want there
• Expert in Big Data
  – Hadoop since 2007
  – Cassandra since 2010
• Chief Technology Officer at Ness
Column-family oriented DBs
• Descendants of Google’s BigTable
• A column is a key-value pair
  – Supports unstructured data
• Store groups of columns together
  – A given row ID can appear multiple times, once for each column family
  – Different from columnar format, where each column is stored separately
• Rows can have any number of any column
• Schema-less
• Leading examples: Cassandra, HBase
• Benefits:
  – Can write millions of rows per minute
  – Flexible column format
  – Lightning-fast key-value retrieval
• Limitations:
  – Transactions
  – JOINs
  – Ad hoc queries over multiple columns
    • Lacks a mature query compiler
Step 1: Make Sure C* is the right choice
• Pros:
  – Very fast writes
  – Fast random reads
  – Very scalable
• Cons:
  – No transactions
  – No JOINs
  – No nested queries
  – No aggregation framework
  – Terrible for range scans on primary key
  – Terrible for ad hoc queries
  – Terrible for queues
The Conceptual Model
• Don’t think: relational table. Instead, think: nested sorted map
  – A row key maps to a row; the column keys in a row are sorted and can be nested
• A map gives efficient key lookup, and the sorted column keys give efficient range scans
• Valueless columns: The column key can itself hold a value
  – That’s how you sort column values
• Wide rows: The number of column keys is (virtually) unbounded
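The nested sorted map can be sketched in a few lines of Python. This is only a mental model, not how Cassandra is implemented; the class names `Row` and `ColumnFamily`, the `slice` method and the sensor data are all made up for illustration. The outer map gives key lookup by row key, and the sorted column keys inside a row give range scans.

```python
from bisect import bisect_left, bisect_right, insort


class Row:
    """One row: column keys kept in sorted order, each mapped to a value."""

    def __init__(self):
        self._keys = []    # sorted list of column keys
        self._values = {}  # column key -> value

    def put(self, column_key, value=None):
        # A valueless column stores None; the key itself carries the data.
        if column_key not in self._values:
            insort(self._keys, column_key)
        self._values[column_key] = value

    def get(self, column_key):
        return self._values[column_key]

    def slice(self, start, end):
        """Range scan: all (key, value) pairs with start <= key <= end."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


class ColumnFamily:
    """Outer map: row key -> Row (key lookup only, no ordering across rows)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column_key, value=None):
        self._rows.setdefault(row_key, Row()).put(column_key, value)

    def row(self, row_key):
        return self._rows[row_key]


cf = ColumnFamily()
cf.put("sensor-1", "2015-08-26T10:05", 21.5)
cf.put("sensor-1", "2015-08-26T10:01", 21.0)
cf.put("sensor-1", "2015-08-26T11:00", 22.3)

# Columns come back in key order, whatever the insertion order was.
ten_oclock_hour = cf.row("sensor-1").slice("2015-08-26T10:00", "2015-08-26T10:59")
```

Note what the model cannot do cheaply: there is no ordering across row keys, so a range scan over rows has no sorted structure to lean on.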
Data Modelling in C*
• Model column families around query patterns
  – Secondary indexes and complex SQL (JOIN, ORDER BY, GROUP BY) will not work as they do in your favorite RDBMS
  – Identify the most frequent query patterns and isolate the less frequent
  – Consider which queries are sensitive to latency and which are not
• But, start your design with entities and relationships
  – It is important to understand and start with entities / relationships, then evolve the model around query patterns, by de-normalizing and duplicating
• Remember the model of the nested sorted map, and think how you can organize data into that map in order to satisfy your query requirements, e.g., fast look-up, ordering, grouping, filtering
  – When you hold a hammer, everything looks like a nail
De-normalization
• De-normalize and duplicate for read performance
• But don’t de-normalize if you don’t need to. It’s about finding the right balance
• Remember:
  – Speed is important
  – Disk is cheap
• The best analogy for this process is the Oracle materialized view. Instead of just storing pointers to data, a materialized view makes a true copy of the data. You should expect to have multiple copies of (part of) your data
Row and Column Keys
• Make sure row key and column key are unique
  – Otherwise, data could get accidentally overwritten
    • There is no in-place update in C* - it is always an upsert
    • If you accidentally insert data with an existing row or column key, the previous value will be silently overwritten without any error. The change will not be versioned, and the old data will be gone
      – (A secret: the data is still there until the next compaction – you just can’t read it)
• Keep the column name short
  – Because it is stored repeatedly
  – Max column key (and row key) size = 64KB
• Storing values in column names is perfectly OK
  – Motivation: column names are stored in sort order, but column values are not
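The upsert behaviour can be mimicked with an ordinary Python dict. This is an illustrative sketch only: the `upsert` helper and the sample addresses are hypothetical, not a Cassandra API. It contrasts a non-unique column key, where the second write silently wins, with values folded into the column name, where both writes survive and come back sorted.

```python
def upsert(row, column_key, value):
    """Write a column; an existing key is replaced with no error and no history."""
    row[column_key] = value


# Non-unique column key: the second write silently replaces the first.
clobbered = {}
upsert(clobbered, "email", "alice@example.com")
upsert(clobbered, "email", "alice@work.example.com")

# Unique column keys: the value is folded into the key, so both survive,
# and reading the keys in sorted order returns the values in sorted order.
kept = {}
upsert(kept, "email|alice@work.example.com", None)
upsert(kept, "email|alice@example.com", None)
sorted_emails = [key.split("|", 1)[1] for key in sorted(kept)]
```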
Wide Rows
• Leverage wide rows for ordering, grouping and filtering
  – When actual data is stored in column names (to sort the data), we inevitably end up with wide rows
• Benefits of wide rows:
  – Ordering: Enables efficient range scans on column values, efficient search for a specific column value
  – Grouping: If data is queried together, you can group that data in a single wide row that can be read efficiently as part of a single query.
    • Example: for tracking time series data, group data by date/hour/machine/event in a single wide row, with each column containing granular data or roll-ups
  – Filtering: Wide row column families are heavily used (with composite columns) to build custom indexes
• But not too wide, as a row is never split across nodes
  – All of the traffic related to one row is handled by only one node (actually the set of nodes that hold the row’s replicas)
Row Keys and Hot Spots
• Choose the proper row key – it’s your “shard key”
  – Otherwise you’ll end up with hot spots, even using RandomPartitioner
• Example: storing time series data which is retrieved based on hour
• Idea 1: Row key = the hour: a terrible idea
  – All the writes in a given hour will go to a single node holding the row for the current hour – a hotspot
• Idea 2: Row key = the minute: no improvement
  – Only one node will be writing during whatever duration you pick!
  – As time progresses the hot spot moves, but it never goes away
• This is a recurring dilemma in C* data modelling, between taking advantage of column ordering and avoiding hotspots
Row Keys and Hot Spots
• Idea 3: Add something else to the row key, e.g., event type, machine id
  – Whatever is appropriate for your use case
  – But what if you have nothing else to add, or you absolutely need time period as the only row key?
• Idea 4: Manually split row keys based on number of nodes in the C* cluster
  – E.g., “yyyymmddhh|1”, “yyyymmddhh|2”, …
  – For an hour window, each node will now evenly handle the writes in round-robin
  – Issue: Reading data for a given hour will require multi-get from all the physical nodes followed by a merge in the client application
  – Issue: What if the number of nodes in the cluster changes?
    • Proposed solution: store the cluster size over time in a C* table
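Idea 4 can be sketched as follows, using a plain dict as a stand-in for the column family. Everything here is an assumption made for illustration: the `BucketedWriter` class, the `read_hour` helper, a fixed bucket count of 3, and minute strings as column keys. Writes rotate round-robin across the bucketed row keys; a read fetches every bucket for the hour and merges the sorted columns in the client.

```python
import heapq

NUM_BUCKETS = 3  # assumed to match the number of nodes in the cluster


def make_row_key(hour, bucket):
    """Build a row key of the form 'yyyymmddhh|n'."""
    return f"{hour}|{bucket}"


class BucketedWriter:
    """Spreads writes for one hour across num_buckets row keys, round-robin."""

    def __init__(self, store, num_buckets):
        self.store = store  # row key -> {column key: value}
        self.num_buckets = num_buckets
        self._next = 0

    def write(self, hour, column_key, value):
        bucket = self._next % self.num_buckets + 1
        self._next += 1
        row = self.store.setdefault(make_row_key(hour, bucket), {})
        row[column_key] = value


def read_hour(store, hour, num_buckets):
    """Multi-get every bucket row for the hour, then merge by column key."""
    rows = [
        sorted(store.get(make_row_key(hour, bucket), {}).items())
        for bucket in range(1, num_buckets + 1)
    ]
    return list(heapq.merge(*rows, key=lambda kv: kv[0]))


store = {}
writer = BucketedWriter(store, NUM_BUCKETS)
for minute in ("00", "01", "02", "03"):
    writer.write("2015082610", minute, f"event-{minute}")

merged = read_hour(store, "2015082610", NUM_BUCKETS)
```

The reader must know how many buckets were in use when the hour was written, which is exactly why the cluster size over time has to be recorded somewhere.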
Idempotency
• Idempotency = the ability to repeatedly apply the same operation without affecting data consistency
• Usually better to keep a record of what happened when, rather than storing the current value
  – Prevents race conditions
• Counter-example: Status value stored as a scalar
  – E.g., status = ready
  – Sensitive to: order, replay
• Better: Store status value together with timestamp
  – status|20150428043206 = ready
  – status|20150428043207 = pending
  – Better yet: concatenate with UUID
• Design the data model so that operations are idempotent
  – Or, make sure your use case can live with inaccuracies or that inaccuracies can be corrected eventually
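A small sketch of why the timestamped form is idempotent, again with dicts standing in for rows. The helper names and the delivery order are invented for the example. The same two events are delivered out of order, with one of them replayed: the scalar model ends up with a stale status, while the timestamped model converges to the same row regardless of order or repetition.

```python
events = [
    {"ts": "20150428043206", "status": "ready"},
    {"ts": "20150428043207", "status": "pending"},
]
# Out-of-order delivery, with the older event delivered twice (a replay).
delivered = [events[1], events[0], events[0]]


def apply_scalar(row, event):
    """Counter-example: last write wins, whatever its timestamp."""
    row["status"] = event["status"]


def apply_timestamped(row, event):
    """Timestamp is part of the column key, so a replay rewrites the same cell."""
    row[f"status|{event['ts']}"] = event["status"]


def current_status(row):
    """The latest status is the value under the greatest column key."""
    return row[max(key for key in row if key.startswith("status|"))]


scalar_row, history_row = {}, {}
for event in delivered:
    apply_scalar(scalar_row, event)
    apply_timestamped(history_row, event)
```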
Column Names and Values: Examples
• Store multiple values for the same column
  – E.g., email addresses
  – Store the column value as a sub-column name
    • mail|[email protected] = null
    • mail|[email protected] = null
• Maintain all versions of a column’s values (e.g., for idempotency)
  – E.g., status
  – Use timeuuid as a sub-column name
    • status|20150428043206 = ready
    • status|20150428043207 = pending
• Store multiple fields for an object
  – E.g., for a given genre, store all the books (title, author, publisher), sorted by title
  – Row key = <genre>, e.g., “SciFi”
  – Column key = <title>|author
  – Column key = <title>|publisher
  – Example: “Fiction”: “Gone With the Wind”|author = “Margaret Mitchell”, “Gone With the Wind”|publisher = “Pocket Books”
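The last pattern, several fields per object under composite column keys, might look like this in the dict model. Tuples stand in for the `<title>|<field>` composite keys, and the second book is added purely for illustration; `books_sorted_by_title` is a hypothetical helper. Because the title is the first key component, reading the columns in key order yields the books sorted by title, with each book's fields adjacent.

```python
from itertools import groupby

# One wide row, row key "Fiction"; column key = (title, field).
fiction_row = {
    ("Gone With the Wind", "publisher"): "Pocket Books",
    ("Gone With the Wind", "author"): "Margaret Mitchell",
    ("Dracula", "author"): "Bram Stoker",  # extra book, for illustration only
}


def books_sorted_by_title(row):
    """Walk the columns in key order and regroup them into one dict per book."""
    books = []
    for title, columns in groupby(sorted(row.items()), key=lambda kv: kv[0][0]):
        book = {"title": title}
        for (_, field), value in columns:
            book[field] = value
        books.append(book)
    return books


books = books_sorted_by_title(fiction_row)
```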
Example 1: Users “Like” Items
• Support queries:
  – Get user by userId
  – Get item by itemID
  – Get all the items (id, title) that a particular user likes
  – Get all the users (id, name) who like a particular item
Option 1: Exact Replica of Relational Model
• Identical to relational model
• No easy way to query all the items that a particular user likes, or all the users who like a particular item
  – Inefficient due to lack of JOINs
  – Requires a full table scan of User_Item_Like, followed by (for each User_Item_Like row) reading one User row and one Item row
Option 2: Normalized entities with custom indexes
• We can easily query all the items that a particular user likes, using Item_By_User, and all the users who like a particular item, using User_By_Item
• But, we always want to get the item title in addition to the item id when we query items liked by a particular user.
  – In this model, we first need to query Item_By_User to get all the item ids that a given user likes, and then, for each item id, we need to query Item to get the title
Option 3: Normalized entities with de-normalization into custom indexes
• Title and username are de-normalized in Item_By_User and User_By_Item respectively
• This allows us to efficiently query (by reading a single row) all the item titles liked by a given user, and all the user names who like a given item
• Plus one read to get full information about the User/Item
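Option 3 can be sketched with four dicts standing in for the four column families. The field names (`name`, `email`, `title`, `desc`), the `like` helper and the sample data are assumptions, since the slide's diagram is not reproduced here. A "like" writes a de-normalized copy into both index rows, so each list query is a single row read, and the entity tables are touched only when full details are needed.

```python
users = {}         # User: user_id -> {"name": ..., "email": ...}
items = {}         # Item: item_id -> {"title": ..., "desc": ...}
item_by_user = {}  # Item_By_User: user_id -> {item_id: title}
user_by_item = {}  # User_By_Item: item_id -> {user_id: name}


def like(user_id, item_id):
    """Record a like by writing the de-normalized copy into both indexes."""
    item_by_user.setdefault(user_id, {})[item_id] = items[item_id]["title"]
    user_by_item.setdefault(item_id, {})[user_id] = users[user_id]["name"]


users["u1"] = {"name": "Ann", "email": "ann@example.com"}
users["u2"] = {"name": "Bob", "email": "bob@example.com"}
items["i1"] = {"title": "Red kettle", "desc": "1.5 litre"}
items["i2"] = {"title": "Blue mug", "desc": "ceramic"}

like("u1", "i1")
like("u1", "i2")
like("u2", "i1")

# One row read each: ids plus the de-normalized title / name.
titles_liked_by_u1 = sorted(item_by_user["u1"].items())
names_who_like_i1 = sorted(user_by_item["i1"].items())
# One extra read only when the full entity is needed.
full_item = items["i1"]
```

The cost of this design is maintenance: if a title or user name changes, every index row holding a copy has to be rewritten.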
Option 4: partially de-normalized entities
• More efficient: one read instead of two to get full information about a User/Item
• But, much more repeated data: much wider rows, much harder to maintain
• Probably a step too far in de-normalization – a lot more data just to save one read per query
• Preference: Option 3
Example 2: Book Rating Site
• Users can add books (bookid, author, title, rank (0.0 – 10.0)), comments (id, user, text, time), and tag them
• The application needs to support the following user operations:
  – Add books
  – Add comments for books
  – Add tags for books
  – List books (bookid, title, author) sorted by rank
  – List all books (bookid, title, author) tagged with a given tag
  – List the comments (text, user) for a given bookid, sorted by date
Strategy: Books
• Store each book in a distinct row in the Books table
  – So we read all info about a book in one shot
  – Row key = bookid
  – Columns (key, value) for: author, title, rank
Strategy: Comments
• To retrieve all comments for a book: Store comments as columns in the book’s row, sorted by timestamp
  – In Books table
  – Column key = “comment|<timeuuid>”
    • Better: “cmt|<timeuuid>”
  – Column value = the comment text
• Alternative: a separate Comments table, where row key =
  – Comment text: but, unlike tags, no one searches or sorts based on comment text
  – bookid: might as well be part of the Books table
Strategy: Tags
• To retrieve all tags for a book: Store tags as columns in a book’s row, sorted by timestamp
  – Column key = “tag|<timeuuid>”
  – Column value = the tag
  – (Alternative: a separate Tags table, where row key = book ID – might as well be part of the Books table)
• To retrieve all books for a given tag: A separate TagBooks table, with one row for each tag
  – Row key = tag
  – Column key = “<bookID>|title”, column value = the title
  – Column key = “<bookID>|author”, column value = the author
  – (Alternative: just store book ID in the column, use it to look up the rest of the book info in Books – inefficient)
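Putting the Books and TagBooks strategies together, a rough sketch: tuples stand in for the `prefix|<timeuuid>` and `<bookID>|field` composite keys, plain integers stand in for timeuuids, and all helper names and sample values are invented for the example. Comments and tags share the book's row and are separated by key prefix; tagging also writes the de-normalized title and author into the tag's row.

```python
books = {}      # Books: bookid -> {column key: value}
tag_books = {}  # TagBooks: tag -> {(bookid, field): value}


def add_book(book_id, title, author, rank):
    books[book_id] = {("author",): author, ("rank",): rank, ("title",): title}


def add_comment(book_id, time_id, text):
    # time_id is a stand-in for a timeuuid: larger means later.
    books[book_id][("cmt", time_id)] = text


def add_tag(book_id, time_id, tag):
    book = books[book_id]
    book[("tag", time_id)] = tag
    # De-normalize title and author into the tag's row.
    tag_row = tag_books.setdefault(tag, {})
    tag_row[(book_id, "title")] = book[("title",)]
    tag_row[(book_id, "author")] = book[("author",)]


def columns_with_prefix(row, prefix):
    """Values of all columns whose key starts with prefix, in key order."""
    return [value for key, value in sorted(row.items()) if key[0] == prefix]


add_book("b1", "Dune", "Frank Herbert", 9.0)
add_comment("b1", 2, "Loved the ending")
add_comment("b1", 1, "Slow start")
add_tag("b1", 3, "scifi")

comments_by_time = columns_with_prefix(books["b1"], "cmt")
tags_for_book = columns_with_prefix(books["b1"], "tag")
books_tagged_scifi = sorted(tag_books["scifi"].items())
```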
… And then along came CQL
• Cassandra Query Language: An SQL-like language to query Cassandra
  – E.g., CREATE / DROP / ALTER TABLE / SELECT / INSERT
• Motivation: C* code is hard to write, easy to fall into performance traps
  – CQL is a best-practices Cassandra interface and hides the messy details
  – Limited predicates - Attempts to prevent bad queries
    • But, you can still get into trouble!
• Motivation: C* makes it very hard to understand the data model without reading code
  – CQL is a reintroduction of schema so that you don’t have to read code to understand the data model
  – CQL creates a common language so that details of the data model can be easily communicated
Composite Primary Key
• The Primary Key
  – The key uniquely identifies a row.
  – A composite primary key consists of:
    • A partition key
    • One or more clustering columns
  – E.g. PRIMARY KEY (partition key, cluster columns, ...)
• The partition key determines on which node the partition resides
• Data is ordered in cluster column order within the partition
Composite Primary Key
CREATE TABLE sporty_league (
    team_name varchar,
    player_name varchar,
    jersey int,
    PRIMARY KEY (team_name, player_name)
);
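One way to picture how such a table relates to the nested sorted map is the Python sketch below. It is a simplified mental model, not the actual storage engine; `to_storage` and the player names are hypothetical. Rows that share a partition key land in one partition, and inside it each non-key column becomes a cell keyed by the clustering value plus the column name, kept in clustering order.

```python
def to_storage(rows, partition_cols, clustering_cols):
    """Group CQL-style rows into partitions of sorted (clustering..., column) cells."""
    key_cols = set(partition_cols) | set(clustering_cols)
    partitions = {}
    for row in rows:
        partition_key = tuple(row[c] for c in partition_cols)
        clustering_key = tuple(row[c] for c in clustering_cols)
        cells = partitions.setdefault(partition_key, {})
        for name, value in row.items():
            if name not in key_cols:
                cells[clustering_key + (name,)] = value
    # Cells inside a partition are kept in clustering order.
    return {pk: dict(sorted(cells.items())) for pk, cells in partitions.items()}


rows = [
    {"team_name": "Mighty Mutts", "player_name": "Lucky", "jersey": 7},
    {"team_name": "Mighty Mutts", "player_name": "Buddy", "jersey": 9},
    {"team_name": "Cool Cats", "player_name": "Felix", "jersey": 3},
]

# PRIMARY KEY (team_name, player_name): team_name partitions, player_name clusters.
storage = to_storage(rows, ["team_name"], ["player_name"])
mutts_cells = list(storage[("Mighty Mutts",)].items())
```

Passing two partition columns and no clustering columns to the same function models a composite partition key, where every distinct combination gets its own partition.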
Composite Partition Key
CREATE TABLE cities (
    city_name varchar,
    state varchar,
    PRIMARY KEY ((city_name, state))
);
• Each (city_name, state) combination gets its own partition
CQL/Cassandra Mapping: Simple Keys
CQL/Cassandra Mapping: Compound Keys
CQL/Cassandra Mapping: Maps
The Bottom Line
• C* has a surprisingly small bag of tricks…
• … that can be used to solve a surprisingly large set of problems
• … (so long as you can think sideways)
• Don’t let the CQL façade fool you
More examples
• Real time analytics: http://blog.markedup.com/2013/03/Cassandra-real-time-analytics/
• Financial time series: http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013
• Music service: http://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_music_service_c.html
• Following-followers: https://blog.safaribooksonline.com/2012/12/11/modeling-data-in-cassandra/
• Tweets: http://www.slideshare.net/jericevans/cassandra-by-example-data-modelling-with-cql3
• http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling
• http://clojurecassandra.info/articles/data_modelling.html
• http://www.slideshare.net/planetcassandra/data-modeling-with-travis-price
• http://htraining.s3.amazonaws.com/cassandra-training.pptx
Thank you