We care about: Availability, Scalability, Operational Ease,
Performance, (Bonus) Multi-region
So we decided to try Cassandra!
Cassandra [database XYZ]
Albert Einstein: "But if you judge a fish by its ability to climb a
tree, it will live its whole life believing that it is stupid."
Time to deploy Cassandra! sudo apt-get install dse-full
A good deployment Machine-level Cluster-level
Picking a machine Disk IOPS IOPS IOPS Latency Author:
D-Kuru/Wikimedia Commons Licence: CC-BY-SA-3.0-AT
Picking a machine CPU Author: Mark Sze Licence: CC BY-NC-ND
2.0
Picking a machine Memory Save some for page cache! Author:
brutalSoCal Licence: CC BY-NC-ND 2.0
On AWS: Ephemeral disks. Please don't use EBS. Really. IOPS are
usually the problem. Instance sizes: spinning disk: m1.large,
m1.xlarge, m2.4xlarge; SSD: m3.xlarge, c3.2xlarge, i2.*
Set up the machine Lots of documentation / talks about this
Recommended reading: Datastax guide [1] [1]
http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
Cluster configuration (diagram: nodes A, B, C)
Priam: care and feeding of Cassandra on AWS
https://github.com/Netflix/Priam
Cluster Topology We use RF=3 Ring balanced within datacenter
Nodes alternate racks (or AZs)
Cluster Topology (Priam): Token assignments stored in a database.
A replacement node can take over the token of a failed node.
Cluster Topology (Priam): Priam assigns tokens evenly per region.
Alternates AZs within region (ring diagram: az1, az2, az3 repeating)
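A minimal sketch of the two ideas on this slide: tokens spaced evenly around the ring, and zones assigned in rotation. It is not Priam's code. It assumes the Murmur3Partitioner token range (-2^63 to 2^63 - 1), which the talk does not state, and the function names are made up for illustration.

```python
# Illustrative only: this is not Priam's actual token logic.
MIN_TOKEN = -2**63   # lowest token, assuming Murmur3Partitioner
RING_SIZE = 2**64    # number of distinct tokens in that range


def even_tokens(node_count):
    """Return node_count tokens spaced evenly around the ring."""
    step = RING_SIZE // node_count
    return [MIN_TOKEN + i * step for i in range(node_count)]


def assign_ring(node_count, zones):
    """Pair each token with a zone, cycling through the zones in order."""
    tokens = even_tokens(node_count)
    return [(token, zones[i % len(zones)]) for i, token in enumerate(tokens)]


ring = assign_ring(6, ["az1", "az2", "az3"])
for token, zone in ring:
    print(zone, token)
```

With three zones in rotation, any three neighbouring nodes on the ring sit in three different AZs, which is what lets RF=3 with rack-aware replication keep one replica per AZ.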
Autoscaling groups: Recover from lost instance. We don't use it
for scaling with traffic
Important: Need one ASG per AZ
One ASG, size: 9, balanced: 3 x east-1a, 3 x east-1b, 3 x east-1c
One ASG, size: 9, after a lost east-1c node is replaced in east-1b:
3 x east-1a, 4 x east-1b, 2 x east-1c
One ASG per AZ (ASG-1a size: 3, ASG-1b size: 3, ASG-1c size: 3):
3 x east-1a, 3 x east-1b, 3 x east-1c
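The difference can be sketched with a simplified model (not the AWS API; the function names are hypothetical). A single group of size 9 only knows how many instances are missing, while one group per AZ also knows where the replacement has to go.

```python
from collections import Counter


def missing_total(desired_total, running):
    """What a single ASG sees: only the number of missing instances."""
    return desired_total - len(running)


def missing_per_zone(desired, running):
    """What per-AZ ASGs see: how many instances each zone is short."""
    have = Counter(running)
    return {zone: want - have[zone]
            for zone, want in desired.items() if want > have[zone]}


# One east-1c node has been lost.
running = ["east-1a"] * 3 + ["east-1b"] * 3 + ["east-1c"] * 2

print(missing_total(9, running))
print(missing_per_zone({"east-1a": 3, "east-1b": 3, "east-1c": 3}, running))
```

The first call reports 1 with no zone attached; the second reports that east-1c is one short, so the replacement cannot land in the wrong AZ.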
Backups Data on ephemeral disks Guard against application
errors SSTables immutable -> ship to S3 Priam does this
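Because SSTables are immutable, a backup only has to ship files it has not shipped before. A rough sketch of that idea follows; it is not how Priam is implemented. The `upload` callable is a placeholder for whatever writes to S3, and matching only `*.db` files is a simplification.

```python
from pathlib import Path


def pending_files(data_dir, already_uploaded):
    """Return relative paths of *.db files that have not been backed up yet."""
    root = Path(data_dir)
    found = (p.relative_to(root).as_posix()
             for p in root.rglob("*.db") if p.is_file())
    return sorted(name for name in found if name not in already_uploaded)


def backup(data_dir, already_uploaded, upload):
    """Ship each pending file via upload(path, key) and remember it.

    Immutable files never need re-uploading, so a set of keys that
    were already shipped is enough state to make the backup incremental.
    """
    root = Path(data_dir)
    for key in pending_files(data_dir, already_uploaded):
        upload(root / key, key)
        already_uploaded.add(key)
```

Calling `backup` repeatedly with the same `already_uploaded` set uploads each file once; only SSTables written since the previous run are shipped.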
Restore: Have to be able to use your backup. Also useful for QA /
test. Priam handles this rather nicely
Deployed! Time to chill?
https://www.flickr.com/photos/spunkinator/2394514059 Creative
Commons
Monitoring: "working / not working" doesn't count.
We have our own custom reporter agent for Datadog. There's
pluggable reporter support in 2.0.2 now.
JVM GC woes
JVM GC woes All happy now
SSTables Read Histogram
Questions? before we carry on
Transition takes time, mindset shift, expertise, (some) risk
Our experience: Pick one feature first; Mindset shift; Data modeling
consulting; Libraries / Patterns / Data-as-a-service
Pick one feature: Don't go all in with Cassandra on something
important right away. Work closely with that team.
You probably will make mistakes. Oops!
Mindset shift Everyone knows SQL Not everyone knows Cassandra /
NoSQL Need to know queries beforehand
Enrollment Example: Learners enroll into a course.
learner <-> course (many-to-many). Need to keep track of this membership
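MySQL Model
CREATE TABLE `courses_learners` (
  `id` INT(11) NOT NULL auto_increment,
  `course_id` INT(11) NOT NULL,
  `learner_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c_l` (`learner_id`, `course_id`),
  -- The slide omits the REFERENCES targets; the table names
  -- `courses` and `learners` below are assumptions.
  CONSTRAINT `ref1` FOREIGN KEY (`course_id`) REFERENCES `courses` (`id`),
  CONSTRAINT `ref2` FOREIGN KEY (`learner_id`) REFERENCES `learners` (`id`)
);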
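The MySQL model above can answer both "who is in this course?" and "which courses is this learner in?" from one table. "Need to know queries beforehand" means that in Cassandra each of those questions usually gets its own table, written together. The sketch below imitates that with plain in-memory dictionaries; it is not the speakers' schema and does not use a Cassandra driver, and all names are hypothetical.

```python
from collections import defaultdict


class EnrollmentStore:
    """In-memory stand-in for two query-specific tables."""

    def __init__(self):
        # One "table" per query, keyed by what the query looks up.
        self.learners_by_course = defaultdict(set)
        self.courses_by_learner = defaultdict(set)

    def enroll(self, learner_id, course_id):
        # Every write goes to both tables so each read is a single lookup.
        self.learners_by_course[course_id].add(learner_id)
        self.courses_by_learner[learner_id].add(course_id)

    def learners_in(self, course_id):
        return sorted(self.learners_by_course.get(course_id, ()))

    def courses_of(self, learner_id):
        return sorted(self.courses_by_learner.get(learner_id, ()))


store = EnrollmentStore()
store.enroll(learner_id=7, course_id=101)
store.enroll(learner_id=7, course_id=102)
store.enroll(learner_id=8, course_id=101)
print(store.learners_in(101))
print(store.courses_of(7))
```

The trade-off is extra writes and duplicated data in exchange for reads that never need a join, which is why the set of queries has to be settled before the tables are designed.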
Data modeling consulting: Build core team proficient at C* data
modeling. Available to consult for trickier use cases
Libraries / Patterns: Abstract away simple (but common)
use-cases: Key-value storage, Simple time series. Maybe not every
developer will need deep C* knowledge? More radical: data as a
service (e.g. STAASH) STAASH: https://github.com/Netflix/staash
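One way to read "abstract away simple use-cases" is a small key-value interface that application code depends on, with the Cassandra details kept behind it. The sketch below shows only the shape of such a library. The interface, the in-memory backend, and `save_profile` are hypothetical and unrelated to STAASH's actual API.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """The only surface application developers need to learn."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key, replacing any previous value."""

    @abstractmethod
    def get(self, key, default=None):
        """Return the value stored under key, or default if absent."""


class InMemoryStore(KeyValueStore):
    """Test backend; a Cassandra-backed class would implement the same methods."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def save_profile(store, learner_id, profile):
    # Application code sees only put/get, never tables or consistency levels.
    store.put(f"profile:{learner_id}", profile)


kv = InMemoryStore()
save_profile(kv, 7, {"name": "Ada"})
print(kv.get("profile:7"))
```

Swapping the backend then becomes the concern of the core team that owns the library, not of every feature team.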
It's a long road but we'll get there. Author: Carissa Rogers
License: CC BY 2.0
Conclusion Know Cassandra Know what makes a good deployment
Know that new skills have to be acquired