Transcript
Page 1: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

CASSANDRA @ ULTRAVISUAL

Cassandra Day New York 2014

Skye Book

Lead Systems Architect

Page 2: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

ULTRAVISUAL

A visual network for inspiration, expression,

and collaboration

Page 3: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

The Feed

• A user’s first taste of UV

• More than just posts

• Constantly being tweaked and re-thought

Page 4: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

SELECT DISTINCT _post.*
FROM _post
JOIN _collection_post cp ON _post.uuid = cp.post_uuid
JOIN _collection_follow cf ON cp.c_uuid = cf.collection_uuid
WHERE cf.user_id = ?
ORDER BY _post.created_at DESC
LIMIT 20 OFFSET 0

The Old Way

Started Simple

“Show me recent posts in collections I follow”

Page 5: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

SELECT a.*
FROM _user_follow a, _user_follow b
WHERE b.follower = 12345
AND a.follower = b.followed
ORDER BY a.followed_at DESC
LIMIT 20 OFFSET 0

The Old Way

Added Complexity

“Show me people recently followed by my connections”

Page 6: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

The Old Way

Every new feature needs another query

Feed requests generate a disproportionate amount of load compared to normal CRUD ops

Page 7: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Reframing the Problem

From This:

A place for posts, new collections, social activity, and anything else interesting

nitro404.com/computers/knex.php

Page 8: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Reframing the Problem

To This:

A list of items interesting to the user

Page 9: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

The New Way

Model First

• With an SQL background, this can be misleading.

• Essential Question: “How do I need to access this data?”

Page 10: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

The New Way

“Try to model data as a log of user intent”

– Rick Branson, Instagram, Cassandra Summit 2013

Page 11: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

The New Way

user  status  created_at  story                         json
2     0       61b97280    user_follow:3:5               {"foo":"bar"}
2     1       5daa04c0    post:bfbd0a39                 {"foo":"bar"}
2     1       565752e0    collection_follow:5:d70961c1  {"foo":"bar"}
2     1       4a8189e0    user_follow:3:5               {"foo":"bar"}

Slide labels: Primary Key (the key columns), Cached story JSON (the json column)

Model for user feeds

• Fast to fetch user stories

• Cached JSON means almost zero SQL requests
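The feed model above can be sketched in memory. This is only an illustration, assuming one partition per user, newest-first ordering by created_at, and a payload that is rendered once at write time. The names FeedRow, append_story, and fetch_page are hypothetical, integer timestamps stand in for the time-based UUIDs on the slide, and the status column is left out for brevity.

```python
import json
from collections import defaultdict
from typing import NamedTuple


class FeedRow(NamedTuple):
    created_at: int   # stand-in for a timeuuid (assumption)
    story: str        # e.g. "post:bfbd0a39"
    cached_json: str  # pre-rendered story payload


# One list per user, mirroring "user" as the leading key column.
feeds = defaultdict(list)


def append_story(user, created_at, story, payload):
    """Append a feed entry; the payload is serialized once, at write time."""
    feeds[user].append(FeedRow(created_at, story, json.dumps(payload)))


def fetch_page(user, limit=20):
    """Return the newest `limit` cached payloads without consulting any other store."""
    newest_first = sorted(feeds[user], key=lambda r: r.created_at, reverse=True)
    return [json.loads(r.cached_json) for r in newest_first[:limit]]


append_story(2, 100, "user_follow:3:5", {"foo": "bar"})
append_story(2, 200, "post:bfbd0a39", {"kind": "post"})
page = fetch_page(2, limit=1)
```

Because the payload travels with the feed row, rendering a page needs no lookups against the relational store, which is the point of the "almost zero SQL requests" bullet.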

Page 12: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Fast.

Response times cut from hundreds of ms to the 30 ms range

Page 13: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Launch Week

Featured by Apple!

Cluster Disk Usage

26%

74%

Page 14: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Don’t be too cute

cqlsh:ultravisual> ALTER TABLE latest_feed DROP json;

Page 15: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Handling Deletions

• Data is only appended, never deleted from user feeds

• Adapted Instagram’s ‘Anti-Column’ solution

• Avoids missed deletions for nodes down longer than GCGraceSeconds

• Avoids race condition where deletion arrives before write.

Sam follows Sandy

user  created_at  status  story
2     4a8189e0    1       user_follow:3:5

Sam unfollows Sandy

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5
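One plausible reading of the append-only scheme above, sketched in memory: an unfollow is written as a second entry with status 0, and the reader hides any story whose newest entry is a negation. The names Entry and live_stories are hypothetical, and integer timestamps stand in for the time-based UUIDs; this is not taken from the UltraVisual or Instagram code.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    user: int
    created_at: int  # stand-in for a timeuuid (assumption)
    status: int      # 1 = story added, 0 = story negated
    story: str


def live_stories(entries):
    """Return stories whose most recent entry is live, newest first.

    Entries are never removed; a later status-0 entry cancels the story.
    The result depends only on created_at, not on arrival order.
    """
    latest = {}
    for e in entries:
        seen = latest.get(e.story)
        if seen is None or e.created_at > seen.created_at:
            latest[e.story] = e
    live = [e for e in latest.values() if e.status == 1]
    return sorted(live, key=lambda e: e.created_at, reverse=True)


follow = Entry(user=2, created_at=100, status=1, story="user_follow:3:5")
unfollow = Entry(user=2, created_at=200, status=0, story="user_follow:3:5")
post = Entry(user=2, created_at=150, status=1, story="post:bfbd0a39")

in_order = live_stories([follow, post, unfollow])
# The negation arriving before the original write gives the same answer.
out_of_order = live_stories([unfollow, post, follow])
```

Since nothing is deleted, there is no tombstone to miss on a node that was down, and the order in which the two writes land does not change what the reader sees.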

Page 16: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Negated Entries

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5

Keeps all entries in a single time series

First page can usually be populated by a single read

user  status  created_at  story
2     0       61b97280    user_follow:3:5
2     1       4a8189e0    user_follow:3:5

Splits user’s row into two lists, live and undo

Will always require at least two reads
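The read-count difference between the two key orders can be illustrated with a toy store that counts slice reads. This is a sketch under assumptions: CountingStore, page_single_series, and page_split_lists are hypothetical names, rows are (created_at, status, story) tuples, and integer timestamps replace the time-based UUIDs.

```python
class CountingStore:
    """Toy partition that counts how many slice reads are issued."""

    def __init__(self, rows):
        self.rows = rows  # list of (created_at, status, story)
        self.reads = 0

    def slice(self, status=None, limit=20):
        """One contiguous read, newest first, optionally limited to one status."""
        self.reads += 1
        picked = [r for r in self.rows if status is None or r[1] == status]
        return sorted(picked, key=lambda r: r[0], reverse=True)[:limit]


def page_single_series(store, limit=20):
    """created_at before status: live and negated rows come back interleaved."""
    decided = {}
    for _, status, story in store.slice(limit=limit):
        decided.setdefault(story, status)  # first (newest) entry wins
    return [story for story, status in decided.items() if status == 1]


def page_split_lists(store, limit=20):
    """status before created_at: the live and undo lists are read separately."""
    live = store.slice(status=1, limit=limit)
    undone_at = {}
    for created_at, _, story in store.slice(status=0, limit=limit):
        undone_at.setdefault(story, created_at)  # newest undo per story
    return [story for created_at, _, story in live
            if created_at > undone_at.get(story, float("-inf"))]


rows = [(100, 1, "user_follow:3:5"), (150, 1, "post:bfbd0a39"), (200, 0, "user_follow:3:5")]
single, split = CountingStore(rows), CountingStore(rows)
single_page = page_single_series(single)
split_page = page_split_lists(split)
```

Both functions return the same stories here, but the single-series layout issues one read and the split layout issues two. The single read is only "usually" enough: a page dominated by negated entries would need a follow-up read to fill the remaining slots.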

Page 17: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Further Uses

• User Notifications

• User Onboarding

• Reshare Statistics

• User & Content Reports

• API Statistics

Page 18: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

User Onboarding

user  created_at  sequence       step  content
2     61b97280    onboarding_v2  1     rec_collections_1
3     5daa04c0    onboarding_v2  2     rec_collections_2
5     565752e0    onboarding_v3  1     find_friends
6     4a8189e0    onboarding_v3  1     find_friends

Sequenced feed entries for users on signup

Page 19: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Production Experiences

Drivers

• Java: Started with Astyanax, moved to DataStax v2

• Node.js: node-cassandra-cql

Page 20: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Cryptic message with large batch updates in pre-release versions of 2.0 driver

DS Driver Issue 229

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected protocol error occured. This is a bug in this library, please report: Unknown code 256 for a consistency level

As of 2.0, batches with more than 64k statements throw a better exception:

java.lang.IllegalStateException: Batch statement cannot contain more than 65536 statements.
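One way to stay clear of the batch ceiling is to split large updates into smaller batches before handing them to the driver. The sketch below shows only the chunking step and does not call any driver API; chunk_statements and MAX_BATCH_STATEMENTS are hypothetical names, and the value 1000 is an arbitrary assumption well under the limit quoted above.

```python
from itertools import islice

# Assumed ceiling, far below the limit in the exception message; tune as needed.
MAX_BATCH_STATEMENTS = 1000  # hypothetical value


def chunk_statements(statements, max_size=MAX_BATCH_STATEMENTS):
    """Yield lists of at most max_size statements, preserving order."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    iterator = iter(statements)
    while True:
        chunk = list(islice(iterator, max_size))
        if not chunk:
            return
        yield chunk


# Placeholder strings stand in for real bound statements.
statements = [f"statement-{i}" for i in range(2500)]
batches = list(chunk_statements(statements))
```

Each yielded list would then be submitted as its own batch, so a single oversized update never reaches the driver.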

Page 21: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Just use LZ4

Compression

Page 22: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Cassandra-4851

Unfortunate truth in Cassandra 2.0.5

cqlsh:test> SELECT * FROM user_feed WHERE user = 2 AND created_at > :some_uuid AND status=0;
cqlsh:test> Bad Request: PRIMARY KEY part status cannot be restricted (preceding part created_at is either not restricted or by a non-EQ relation)

Page 23: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Cassandra-4851

Adds CQL3 support for vector comparison syntax

cqlsh:test> SELECT * FROM timeline WHERE day = '21 Jun 2014' AND (hour,min) >= (3,50) AND (hour,min,sec) <= (4,37,30);
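The vector comparison in the query above orders the columns lexicographically, the same way Python compares tuples, so its semantics can be checked with a short sketch. The function names in_window and naive_window are hypothetical; the bounds are the ones from the slide.

```python
def in_window(hour, minute, sec):
    """Mirror (hour,min) >= (3,50) AND (hour,min,sec) <= (4,37,30)."""
    return (hour, minute) >= (3, 50) and (hour, minute, sec) <= (4, 37, 30)


def naive_window(hour, minute, sec):
    """Restricting each column on its own is NOT equivalent to the tuple form."""
    return 3 <= hour <= 4 and 50 <= minute <= 37 and sec <= 30


inside = in_window(4, 10, 45)          # 04:10:45 lies between 03:50 and 04:37:30
too_early = in_window(3, 49, 59)       # 03:49:59 is before the lower bound
too_late = in_window(4, 37, 31)        # 04:37:31 is just past the upper bound
naive_inside = naive_window(4, 10, 45)  # the per-column version rejects a valid row
```

The per-column version can never match, because no minute is both at least 50 and at most 37. That is why a range over several clustering columns needs the tuple syntax rather than separate restrictions.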

Available in 2.0.6

Page 24: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Production Experiences

Upgrades

• Manual package installs (dsc20 from DataStax)

• One node at a time

• Upgrade, wait for healthy status & operations, move on

• OpsCenter provides good overview

Page 25: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Production Experiences

Speaking of OpsCenter…

• Don’t be alarmed if nodes appear but agent data does not

• opscenterd often needs a restart after cluster upgrade to see agents again

Page 26: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Production Experiences

Service Discovery

• Running on AWS using Ec2MultiRegionSnitch

• Using OpsWorks (Amazon’s Chef service) for seed config

Page 27: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Chef Cookbook

github.com/skyebook/cassandra-opsworks-chef-cookbook

• Forked from Michael Klishin’s awesome C* cookbook

• Added integration with OpsWorks’ stack.json

    # Add this node as the first seed
    # If using the multi-region snitch, we must use the public IP address
    if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
      seed_array << node["opsworks"]["instance"]["ip"]
    else
      seed_array << node["opsworks"]["instance"]["private_ip"]
    end

    node["opsworks"]["layers"]["cassandra"]["instances"].each do |instance_name, values|
      if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
        seed_array << values["ip"]
      else
        seed_array << values["private_ip"]
      end
    end

    set[:cassandra][:seeds] = seed_array

Page 28: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Questions

