Cassandra meetup slides - Oct 15 Santa Monica Coloft

SHIFT.com: Migrating from MongoDB to Cassandra

by: Blake Eggleston & Jon Haddad

What is SHIFT.com?

Shift is a platform that enables marketers to communicate across organizations and departments in one single place.

It’s also an open application platform with a set of applications built on top of it that can communicate with one another.

Initial Stack

● Python
  ○ Flask
  ○ Celery
● MongoDB
  ○ mongoengine
● Neo4j / Titan
  ○ Bulbs
  ○ thunderdome
● Redis
● AWS
  ○ m1.xlarge for mongo

Current Stack

● Python
  ○ still flask
  ○ still celery
  ○ gevent (it rocks)
● Cassandra
  ○ 1.2.6
  ○ cqlengine
● ElasticSearch
● Redis
  ○ jondis
● AWS
  ○ m1.xlarge

Why did we move to Cassandra?

● Operational Benefits
  ○ Adding and removing nodes is much easier, compared to Mongo’s shards
● Control over our Data on Disk (LSMT)
● Love CQL3
● Long term scalability
  ○ Scales Linearly
  ○ Multi DC Support Baked in

Migration Goals

● Zero downtime
  ○ We wanted to roll out Cassandra without any service interruptions
● No loss of performance
  ○ By carefully structuring our schema we were able to match MongoDB’s performance.

Migration Strategy

Benefits of CQL3

● Easy to understand if you’re coming from RDBMS
● Collections
  ○ sets, lists, maps
● Batch Queries
● Clustering Keys
  ○ Handles ordering of logical rows
  ○ Saved us from a column name management scheme and allowed us to focus on our data

Physical vs Logical Row

Single Row

Clustered Row
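
One rough way to picture the difference is the toy Python model below. It is only a sketch, not Cassandra code: it assumes the Cassandra 1.2-era layout in which all logical rows sharing a partition key are folded into one physical row, with each cell name prefixed by the clustering value. The `to_physical` helper and the `sender` column are hypothetical names used for illustration.

```python
def to_physical(partition_key, logical_rows, clustering_col):
    """Fold logical rows that share a partition key into one physical row.

    Each cell is named (clustering value, column name), and cells are kept
    sorted by that name, which is what gives logical rows their order.
    """
    cells = {}
    for row in logical_rows:
        prefix = row[clustering_col]
        for name, value in row.items():
            if name != clustering_col:
                cells[(prefix, name)] = value
    return partition_key, sorted(cells.items())


# Two logical rows (as CQL3 presents them), inserted out of order.
logical = [
    {"message_id": 2, "sender": "bob", "message": "second"},
    {"message_id": 1, "sender": "amy", "message": "first"},
]

# One physical row for the partition, cells sorted by clustering value.
key, cells = to_physical("user-a", logical, "message_id")
for name, value in cells:
    print(key, name, value)
```

A single row is the degenerate case: with no clustering column, the physical row holds exactly one logical row.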

Data Modelling Patterns

● considerations: working with Mongo’s dbrefs and optimizing layout on disk

● structured tables as materialized views of the queries we planned on using

● moving multiple documents into a single physical row

● creating supporting index tables for looking up logical rows
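
The last two patterns can be sketched in plain Python, with dictionaries standing in for tables. This is a minimal illustration under assumed names (`comments_by_post`, `comment_index`, and the helper functions are all hypothetical), not the schema used at SHIFT.

```python
from collections import defaultdict

# Stand-in for a table that stores many documents in one physical row:
# post_id is the partition key, comment_id the clustering key.
comments_by_post = defaultdict(dict)

# Stand-in for a supporting index table: comment_id -> post_id.
comment_index = {}


def add_comment(post_id, comment_id, body):
    # Write the data row and its index entry together. In Cassandra the two
    # writes could be grouped in a batch query.
    comments_by_post[post_id][comment_id] = body
    comment_index[comment_id] = post_id


def get_comment(comment_id):
    # First lookup finds the partition, second reads the logical row.
    post_id = comment_index[comment_id]
    return comments_by_post[post_id][comment_id]


def comments_for(post_id):
    # The query the table was designed for: one partition, rows in order.
    return [body for _, body in sorted(comments_by_post[post_id].items())]
```

The cost of the index table is a second read on lookup by `comment_id`, in exchange for keeping the main table laid out for its primary query.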

Time Series: Message Stream

● Users have tens of thousands of messages
● Each user’s message stream is specific to them, like a Twitter feed
● This is Cassandra’s strength - Time Series
● Considered Redis - but poor for multi-dc

create table news_feed (
    user_id uuid,
    message_id timeuuid,
    message text,  -- type missing on the slide; text is assumed
    primary key (user_id, message_id));
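
To show why this layout suits a message stream, here is a small Python sketch of how the news_feed table behaves. It is an in-memory stand-in, not driver code: integers replace timeuuids for brevity, and the `post` and `latest` helpers are hypothetical names.

```python
import bisect
from collections import defaultdict

# user_id -> list of (message_id, message), kept sorted by message_id.
# This mirrors the clustering key ordering logical rows inside a partition.
news_feed = defaultdict(list)


def post(user_id, message_id, message):
    # Insert in sorted position, whatever order writes arrive in.
    bisect.insort(news_feed[user_id], (message_id, message))


def latest(user_id, limit):
    """Return up to `limit` messages, newest first, from one partition."""
    if limit <= 0:
        return []
    newest = news_feed[user_id][-limit:]
    return [message for _, message in reversed(newest)]
```

Reading a feed touches only that user's partition and walks it in clustering order. The rough CQL3 equivalent would be a select on `user_id` with `ORDER BY message_id DESC` and a `LIMIT`.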

cqlengine

● cqlengine.org
● the Python CQL3 object-row mapper
● exposes CQL3 tables as Python classes
● maps columns to properties
● builds CQL queries

# model definition
from cqlengine import columns
from cqlengine.models import Model

class ExampleModel(Model):
    example_id = columns.UUID(primary_key=True)
    example_type = columns.Integer(index=True)  # secondary index
    created_at = columns.DateTime()
    description = columns.Text(required=False)

# example query: filter on the indexed column
ExampleModel.objects(example_type=1)

Improvements from moving to C*

● Operationally we’ve had zero problems
● Outstanding Performance
● Easy to build new features
● Community has been amazing (mailing list and #cassandra)

misc tips

● leveled compaction - good for read heavy workloads

● use secondary indexes sparingly, understand how they work and when to use them

● to reiterate, think about how you’re going to query your data

● use ElasticSearch / Solr for ad hoc queries

Contact Info

Jon Haddad @[email protected]

Blake Eggleston @[email protected]

….we’re hiring!

mailto:[email protected]



