Slides from our presentation at the Santa Monica Coloft on our migration from MongoDB to Cassandra.
SHIFT.com: Migrating from MongoDB to Cassandra
by: Blake Eggleston & Jon Haddad
What is SHIFT.com?
Shift is a platform that enables marketers to communicate across organizations and departments in a single place.
It’s also an open application platform with a set of applications built on top of it that can communicate with one another.
Initial Stack
● Python
  ○ Flask
  ○ Celery
● MongoDB
  ○ mongoengine
● Neo4j / Titan
  ○ Bulbs
  ○ thunderdome
● Redis
● AWS
  ○ m1.xlarge for Mongo
Current Stack
● Python
  ○ still Flask
  ○ still Celery
  ○ gevent (it rocks)
● Cassandra
  ○ 1.2.6
  ○ cqlengine
● ElasticSearch
● Redis
  ○ jondis
● AWS
  ○ m1.xlarge
Why did we move to Cassandra?
● Operational Benefits
  ○ Adding and removing nodes is much easier than managing Mongo's shards
● Control over our Data on Disk (LSMT: log-structured merge tree)
● Love CQL3
● Long-term scalability
  ○ Scales Linearly
  ○ Multi-DC Support Baked In
Migration Goals
● Zero downtime
  ○ We wanted to roll out Cassandra without any service interruptions
● No loss of performance
  ○ By carefully structuring our schema, we were able to match MongoDB's performance.
Migration Strategy
Benefits of CQL3
● Easy to understand if you're coming from an RDBMS
● Collections
  ○ sets, lists, maps
● Batch Queries
● Clustering Keys
  ○ Handle ordering of logical rows
  ○ Saved us from a column-name management scheme and allowed us to focus on our data
Physical vs Logical Row
(Diagram: Single Row vs. Clustered Row)
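The diagram itself did not survive. As a rough illustration of the idea, here is a toy Python model. It is an assumption made for illustration and does not reflect how Cassandra actually stores data. Each partition key maps to one physical row, and each clustering key inside that partition is a separate logical CQL row, kept in clustering order.

```python
from collections import defaultdict


class Table:
    """Toy model of a CQL table with PRIMARY KEY (partition_key, clustering_key).

    One partition key = one physical row; each clustering key inside it
    is one logical (CQL) row.
    """

    def __init__(self):
        # partition key -> {clustering key -> {column name: value}}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, **columns):
        # Writing an existing primary key again updates those columns (upsert).
        self.partitions[partition_key].setdefault(clustering_key, {}).update(columns)

    def physical_row(self, partition_key):
        # All logical rows of one partition, ordered by clustering key.
        rows = self.partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows)]

    def physical_row_count(self):
        return len(self.partitions)

    def logical_row_count(self):
        return sum(len(rows) for rows in self.partitions.values())


t = Table()
t.insert("user-1", 3, body="third")
t.insert("user-1", 1, body="first")
t.insert("user-2", 1, body="hello")
print(t.physical_row_count(), t.logical_row_count())  # 2 3
print(t.physical_row("user-1"))  # [(1, {'body': 'first'}), (3, {'body': 'third'})]
```

Three logical rows live in two physical rows. The rows for "user-1" come back sorted by clustering key even though they were written out of order.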
Data Modelling Patterns
● considerations: working with Mongo’s dbrefs and optimizing layout on disk
● structured tables as materialized views of the queries we planned on using
● moving multiple documents into a single physical row
● creating supporting index tables for looking up logical rows
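Here is a minimal sketch of the "supporting index table" pattern, using plain Python dicts in place of tables. The table and function names (messages, messages_by_author, save_message, messages_for_author) are hypothetical. The index table is keyed by the value you want to look up, and every write updates both tables.

```python
import uuid
from collections import defaultdict

# Hypothetical tables, modelled as dicts:
#   messages           (message_id PRIMARY KEY, author_id, body)
#   messages_by_author (PRIMARY KEY (author_id, message_id))  <- supporting index table
messages = {}
messages_by_author = defaultdict(set)


def save_message(author_id, body):
    """Write the base row and its index-table row together.

    Against Cassandra these two writes would typically be grouped
    (for example in a logged batch); here they are two dict updates.
    """
    message_id = uuid.uuid4()
    messages[message_id] = {"author_id": author_id, "body": body}
    messages_by_author[author_id].add(message_id)
    return message_id


def messages_for_author(author_id):
    # 1) One read from a single partition of the index table gives the keys.
    ids = messages_by_author.get(author_id, set())
    # 2) Each logical row is then fetched from the base table by primary key.
    return [messages[message_id] for message_id in ids]


save_message("alice", "hi")
save_message("alice", "again")
save_message("bob", "hello")
print(sorted(m["body"] for m in messages_for_author("alice")))  # ['again', 'hi']
```

The trade-off is extra writes and duplicated keys. In return, every lookup becomes a primary-key read instead of a scan.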
Time Series: Message Stream
● Users have tens of thousands of messages
● Each user's message stream is specific to them, like a Twitter feed
● This is Cassandra's strength: time series
● Considered Redis, but it is poor for multi-DC
create table news_feed (
user_id uuid,
message_id timeuuid,
message text,
primary key (user_id, message_id));
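Below is a toy Python stand-in for the news_feed table above, assuming an in-memory dict in place of Cassandra. user_id picks the partition, and message_id is a version-1 (time-based) UUID, which is what a timeuuid is. Reading a page of a user's stream is then a slice of one partition ordered by time. The post and latest helpers are hypothetical.

```python
import uuid
from collections import defaultdict

# Toy stand-in for the news_feed table:
#   user_id    -> partition key (one physical row per user)
#   message_id -> clustering key, a time-based (version 1) UUID like a timeuuid
news_feed = defaultdict(dict)  # user_id -> {message_id: message}


def post(user_id, message):
    message_id = uuid.uuid1()  # embeds a 60-bit timestamp, as a timeuuid does
    news_feed[user_id][message_id] = message
    return message_id


def latest(user_id, limit=20):
    """Up to `limit` (message_id, message) pairs from one partition, newest first."""
    rows = news_feed.get(user_id, {})
    # Order by the UUID's embedded timestamp; the raw bytes only break ties.
    ordered = sorted(rows.items(), key=lambda kv: (kv[0].time, kv[0].bytes), reverse=True)
    return ordered[:limit]


alice = uuid.uuid4()
for i in range(5):
    post(alice, f"message {i}")
for message_id, message in latest(alice, limit=3):
    print(message_id.time, message)
```

Because every message for a user shares one partition key, a page of the feed never touches more than one physical row.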
cqlengine
● cqlengine.org
● the Python CQL3 object-row mapper
● exposes CQL3 tables as Python classes
● maps columns to properties
● builds CQL queries
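# model definition (cqlengine-era import paths; the DataStax Python
# driver now ships this as cassandra.cqlengine)
from cqlengine import columns
from cqlengine.models import Model

class ExampleModel(Model):
    example_id = columns.UUID(primary_key=True)
    example_type = columns.Integer(index=True)  # index=True adds a secondary index
    created_at = columns.DateTime()
    description = columns.Text(required=False)

# example query
ExampleModel.objects(example_type=1)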
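The next sketch illustrates what "maps columns to properties, builds CQL queries" means, using a deliberately tiny, hypothetical mapper. MiniColumn, MiniModel, select_cql, create_cql and index_cql are invented for this example. They are not cqlengine's API or implementation, and the %s placeholder style is an assumption. The sketch only shows how class attributes can be turned into CQL strings.

```python
class MiniColumn:
    """Hypothetical column descriptor: a CQL type plus key/index flags."""

    def __init__(self, cql_type, primary_key=False, index=False):
        self.cql_type = cql_type
        self.primary_key = primary_key
        self.index = index
        self.name = None

    def __set_name__(self, owner, name):
        # Called at class creation, so each column learns its attribute name.
        self.name = name


class MiniModel:
    __table_name__ = None  # hypothetical: set explicitly by each subclass

    @classmethod
    def columns(cls):
        return [v for v in vars(cls).values() if isinstance(v, MiniColumn)]

    @classmethod
    def create_cql(cls):
        cols = ", ".join(f"{c.name} {c.cql_type}" for c in cls.columns())
        pk = ", ".join(c.name for c in cls.columns() if c.primary_key)
        return f"CREATE TABLE {cls.__table_name__} ({cols}, PRIMARY KEY ({pk}))"

    @classmethod
    def index_cql(cls):
        return [f"CREATE INDEX ON {cls.__table_name__} ({c.name})"
                for c in cls.columns() if c.index]

    @classmethod
    def select_cql(cls, **filters):
        known = {c.name for c in cls.columns()}
        unknown = sorted(set(filters) - known)
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        cql = f"SELECT * FROM {cls.__table_name__}"
        if filters:
            cql += " WHERE " + " AND ".join(f"{name} = %s" for name in filters)
        return cql, list(filters.values())


class ExampleModel(MiniModel):
    __table_name__ = "example_model"
    example_id = MiniColumn("uuid", primary_key=True)
    example_type = MiniColumn("int", index=True)
    created_at = MiniColumn("timestamp")
    description = MiniColumn("text")


print(ExampleModel.select_cql(example_type=1))
# ('SELECT * FROM example_model WHERE example_type = %s', [1])
```

Using __set_name__ (Python 3.6+) lets each column learn its attribute name without a metaclass. A real mapper does far more: type conversion, validation, and actually executing the query.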
Improvements from moving to C*
● Operationally, we've had zero problems
● Outstanding Performance
● Easy to build new features
● Community has been amazing (mailing list and #cassandra)
Misc Tips
● leveled compaction: good for read-heavy workloads
● use secondary indexes sparingly, understand how they work and when to use them
● to reiterate, think about how you’re going to query your data
● use ElasticSearch / Solr for ad hoc queries
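In Cassandra, a native secondary index is local: each node indexes only the data it stores. A query that filters only on an indexed column therefore generally has to consult many nodes, while a partition-key read goes straight to the node that owns the key. The sketch below models this under loudly simplified assumptions: six hypothetical nodes, replication factor 1, and CRC32 as a stand-in partitioner. Real clusters (replication, paging, token ranges) are more nuanced.

```python
import zlib

NUM_NODES = 6                     # assumed cluster size
INDEXED_COLUMN = "example_type"   # column with a (hypothetical) secondary index


def node_for(partition_key):
    # Stand-in partitioner: hash the partition key onto exactly one node
    # (replication factor 1, to keep the model small).
    return zlib.crc32(str(partition_key).encode()) % NUM_NODES


class Node:
    def __init__(self):
        self.rows = {}         # partition key -> row
        self.local_index = {}  # indexed value -> partition keys stored on THIS node


cluster = [Node() for _ in range(NUM_NODES)]


def write(partition_key, row):
    node = cluster[node_for(partition_key)]
    old = node.rows.get(partition_key)
    if old is not None:
        # Drop the stale index entry before re-indexing the new value.
        node.local_index.get(old[INDEXED_COLUMN], set()).discard(partition_key)
    node.rows[partition_key] = row
    node.local_index.setdefault(row[INDEXED_COLUMN], set()).add(partition_key)


def read_by_key(partition_key):
    """Returns (row, nodes_contacted): the owner is known from the key."""
    return cluster[node_for(partition_key)].rows.get(partition_key), 1


def read_by_index(value):
    """Returns (rows, nodes_contacted): each node only indexes its own rows."""
    rows = []
    for node in cluster:
        rows.extend(node.rows[pk] for pk in node.local_index.get(value, ()))
    return rows, len(cluster)


for i in range(100):
    write(i, {INDEXED_COLUMN: i % 4, "description": f"row {i}"})
print(read_by_key(42)[1], len(read_by_index(1)[0]), read_by_index(1)[1])  # 1 25 6
```

A key lookup touches one node, but the indexed lookup touches all of them. That is why the tips suggest using secondary indexes sparingly and pushing ad hoc queries to a search engine.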