Coursera’s Adoption of Cassandra
Biography
Daniel Chia @DanielJHChia
Software Engineer, Infrastructure Team
© 2015. All Rights Reserved.
1 Introduction
2 What We Want From Our Database
3 MySQL Limitations
4 Cassandra - What and Why
5 Looking Back
Coursera
Web iOS Android
Database Wants
Consistently Fast Latencies
Availability
Scalability
Other Niceties
• Operational ease
• Multi-region capability
Coursera Tech Stack
• 100% AWS
• MySQL + Cassandra
• Service-oriented
RDS Challenges
• Normalized data model ⇒ Unpredictable query performance
• Scaling by sharding not ideal
• Single master limitation
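The slide does not say why sharding is "not ideal". One common reason, offered here as an assumption rather than the speaker's stated argument, is that naive modulo sharding reshuffles most keys whenever a shard is added. A minimal sketch, with a hypothetical `shard_for` helper and integer keys standing in for real IDs:

```python
def shard_for(key: int, shard_count: int) -> int:
    """Naive application-level sharding: pick a shard by key modulo shard count."""
    return key % shard_count


def moved_keys(keys, old_count: int, new_count: int) -> int:
    """Count keys whose shard changes when the shard count changes."""
    return sum(
        1 for k in keys
        if shard_for(k, old_count) != shard_for(k, new_count)
    )


keys = range(1000)  # illustrative key space, not real data
moved = moved_keys(keys, old_count=4, new_count=5)
print(f"{moved} of {len(keys)} keys change shard")  # 800 of 1000
```

Growing from 4 to 5 shards relocates 80% of the keys in this toy setup, which is the kind of operational cost a horizontally scalable store is meant to avoid.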
C*
• Wide-column model
• Tunable consistency
• Fast
• Horizontally scalable
• Great community
Looking Back
Cassandra - Initial Pain Points
• Can’t execute arbitrary queries
  • Filtering, sorting, etc.
• Can’t be abused as an OLAP database
• Worries about ‘eventual’ consistency
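The worry about "eventual" consistency can be made concrete with a little arithmetic: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below only illustrates that rule; the replication factor of 3 is an assumption, not a figure from the talk.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3  # assumed replication factor
q = quorum(rf)
print(q)                   # 2
print(overlaps(q, q, rf))  # True: QUORUM reads + QUORUM writes overlap
print(overlaps(1, 1, rf))  # False: ONE + ONE may read stale data
```

With these numbers, QUORUM reads and writes overlap on at least one replica, while reading and writing at ONE trades that guarantee for lower latency.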
SQL ⇒ NoSQL Mindset Shift
• Build in-house Cassandra expertise
• Data modeling still important
• Know your queries
Cassandra ≠ [database XYZ]
“But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.”
- commonly attributed to Albert Einstein
Enrollment Example
• Learners enroll into a course
  • learner (many-to-many) course
• Need to track this membership
MySQL
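CREATE TABLE `courses_learners` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `course_id` INT(11) NOT NULL,
  `learner_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c_l` (`learner_id`, `course_id`),
  -- The slide omits the REFERENCES targets; `courses` and `learners` are assumed table names.
  CONSTRAINT `ref1` FOREIGN KEY (`course_id`) REFERENCES `courses` (`id`),
  CONSTRAINT `ref2` FOREIGN KEY (`learner_id`) REFERENCES `learners` (`id`)
);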
Cassandra
CREATE TABLE courses_by_learner (
  learner_id uuid,
  course_id uuid,
  PRIMARY KEY (learner_id, course_id)
);
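To see which queries this primary key makes cheap, here is a toy in-memory stand-in for the table. It is a sketch for intuition only, not a Cassandra client: the class `CoursesByLearner` and its methods are hypothetical names. `learner_id` acts as the partition key and `course_id` as the clustering column.

```python
from bisect import insort
from collections import defaultdict
from uuid import uuid4


class CoursesByLearner:
    """Toy model of courses_by_learner: one sorted partition per learner."""

    def __init__(self) -> None:
        # partition key (learner_id) -> sorted clustering values (course_id)
        self._partitions = defaultdict(list)

    def enroll(self, learner_id, course_id) -> None:
        rows = self._partitions[learner_id]
        # Writing the same primary key twice leaves a single row (an upsert).
        if course_id not in rows:
            insort(rows, course_id)

    def courses_for(self, learner_id):
        """The query the table is modeled for: reads exactly one partition."""
        return list(self._partitions.get(learner_id, []))

    def learners_in(self, course_id):
        """Reverse lookup: has to visit every partition (a full scan)."""
        return [l for l, courses in self._partitions.items() if course_id in courses]


alice, bob = uuid4(), uuid4()
ml, algo = uuid4(), uuid4()

table = CoursesByLearner()
table.enroll(alice, ml)
table.enroll(alice, algo)
table.enroll(alice, ml)  # duplicate enrollment, no new row
table.enroll(bob, ml)

print(len(table.courses_for(alice)))  # 2
print(len(table.learners_in(ml)))     # 2, but only after scanning all learners
```

Listing a learner's courses touches one partition, while listing a course's learners scans everything. In practice that second query would usually get its own table keyed the other way (for example a hypothetical `learners_by_course`), written alongside this one.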
Helpful Things
• Data modeling consulting
• Monitoring
• Data access layer for common use cases
Gotchas
• Running lots of truly ad-hoc queries is hard
  • Don’t use C* directly to explore your data. (Spark?)
• Sorting and filtering can be hard
  • Consider Solr / Elasticsearch
  • Or even MySQL, depending on load / importance
Thank you