Coursera’s Adoption of Cassandra

Biography

Daniel Chia @DanielJHChia

Software Engineer, Infrastructure Team

© 2015. All Rights Reserved.

1 Introduction

2 What We Want From Our Database

3 MySQL Limitations

4 Cassandra - What and Why

5 Looking Back

Coursera

Web iOS Android

Database Wants

Consistently Fast Latencies

Availability

Scalability

Other Niceties

• Operational ease

• Multi-region capability

Coursera Tech Stack

• 100% AWS

• MySQL + Cassandra

• Service-oriented

RDS Challenges

• Normalized data model ⇒ Unpredictable query performance

• Scaling by sharding not ideal

• Single master limitation

C*

• Wide-column data model

• Tunable consistency

• Fast

• Horizontally scalable

• Great community
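
A minimal sketch of what "tunable consistency" means in practice, assuming a replication factor RF: a read is guaranteed to overlap the latest acknowledged write when the replicas read plus the replicas written exceed RF. The function names are hypothetical and not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas, as used by the QUORUM consistency level."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
# QUORUM writes + QUORUM reads: 2 + 2 > 3, so a read always touches a fresh replica.
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))
# ONE write + ONE read: 1 + 1 <= 3, so a read may land on a stale replica.
print(reads_see_latest_write(rf, 1, 1))
```

Lower levels trade this overlap guarantee for latency and availability, which is the knob the bullet refers to.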

Looking Back

Cassandra - Initial Pain Points

• Can’t execute arbitrary queries

  • Filtering, sorting, etc.

• Can’t be abused as an OLAP database

• Worries about ‘eventual’ consistency

SQL ⇒ NoSQL Mindset Shift

• Build in-house Cassandra expertise

• Data modeling still important

• Know your queries

Cassandra ≠ [database XYZ]

“But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.”

-Albert Einstein

Enrollment Example

• Learners enroll in a course

  • learner (many-to-many) course

• Need to track this membership

MySQL

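CREATE TABLE `courses_learners` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `course_id` INT(11) NOT NULL,
  `learner_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c_l` (`learner_id`, `course_id`),
  -- The slide omits the REFERENCES targets; `courses` and `learners` are assumed names.
  CONSTRAINT `ref1` FOREIGN KEY (`course_id`) REFERENCES `courses` (`id`),
  CONSTRAINT `ref2` FOREIGN KEY (`learner_id`) REFERENCES `learners` (`id`)
);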
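
To make the normalized model concrete, here is a small sketch that uses Python's built-in SQLite as a stand-in for MySQL. The `courses` and `learners` tables, their `name` columns, and the sample rows are assumptions for illustration only. It shows that answering "which courses is this learner in?" takes a join, and that the unique key and foreign keys are enforced by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE courses  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE learners (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses_learners (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    course_id  INTEGER NOT NULL REFERENCES courses (id),
    learner_id INTEGER NOT NULL REFERENCES learners (id),
    UNIQUE (learner_id, course_id)
);
INSERT INTO courses  VALUES (1, 'Machine Learning'), (2, 'Cryptography');
INSERT INTO learners VALUES (10, 'Ada');
INSERT INTO courses_learners (course_id, learner_id) VALUES (1, 10), (2, 10);
""")


def course_names_for(learner_id):
    """Listing a learner's courses by name requires joining through the link table."""
    rows = conn.execute(
        "SELECT c.name FROM courses_learners cl "
        "JOIN courses c ON c.id = cl.course_id "
        "WHERE cl.learner_id = ? ORDER BY c.name",
        (learner_id,),
    )
    return [name for (name,) in rows]


def enroll(course_id, learner_id):
    """Return False when a unique-key or foreign-key constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO courses_learners (course_id, learner_id) VALUES (?, ?)",
                (course_id, learner_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(course_names_for(10))
print(enroll(1, 10))  # duplicate enrollment is rejected by the unique key
```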

Cassandra

CREATE TABLE courses_by_learner (
  learner_id uuid,
  course_id uuid,
  PRIMARY KEY (learner_id, course_id)
);
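
A rough in-memory sketch of the query-first modeling this table implies: one table per query, keyed by the value the query looks up. `courses_by_learner` mirrors the slide. `learners_by_course` is a hypothetical second table, added here only to show how the reverse lookup would be served. Plain strings stand in for the uuid columns.

```python
from collections import defaultdict

# partition key -> set of clustering keys, one "table" per query
courses_by_learner = defaultdict(set)  # table from the slide
learners_by_course = defaultdict(set)  # hypothetical reverse table, not in the slides


def enroll(learner_id, course_id):
    """Denormalized write: record the membership once per query table."""
    courses_by_learner[learner_id].add(course_id)
    learners_by_course[course_id].add(learner_id)


def courses_for(learner_id):
    """Single-partition read, returned in clustering-key order."""
    return sorted(courses_by_learner.get(learner_id, ()))


def learners_in(course_id):
    """The reverse query is served by its own table rather than by a join."""
    return sorted(learners_by_course.get(course_id, ()))


enroll("learner-a", "course-1")
enroll("learner-a", "course-2")
enroll("learner-b", "course-1")
enroll("learner-a", "course-1")  # writing the same key again is an idempotent upsert

print(courses_for("learner-a"))
print(learners_in("course-1"))
```

The cost of this design is the extra write per table, in exchange for every read being a lookup by key.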

Helpful Things

• Data modeling consulting

• Monitoring

• Data access layer for common use cases

Gotchas

• Lots of truly ad-hoc queries are hard

  • Don’t use C* directly to explore your data. (Spark?)

• Sorting and filtering can be hard

  • Consider Solr / Elasticsearch

  • Or even MySQL, depending on load / importance
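
One way to picture why sorting and filtering are hard: rows inside a Cassandra partition are stored in clustering-key order, so a range over that key is a contiguous slice, while a filter on any other column has to look at every row. The sketch below is a conceptual model only. The columns (`enrolled_at`, `course_id`, `status`) and the rows are hypothetical.

```python
from bisect import bisect_left, bisect_right

# One partition, rows kept sorted by the clustering key (enrolled_at as YYYYMMDD).
partition = [
    (20150101, "course-1", "completed"),
    (20150315, "course-2", "active"),
    (20150620, "course-3", "active"),
    (20150901, "course-4", "dropped"),
]
clustering_keys = [row[0] for row in partition]


def range_by_clustering_key(start, end):
    """Cheap: binary search to a contiguous slice that is already sorted."""
    lo = bisect_left(clustering_keys, start)
    hi = bisect_right(clustering_keys, end)
    return partition[lo:hi]


def filter_by_status(status):
    """Expensive: status is not part of the key, so every row is examined."""
    return [row for row in partition if row[2] == status]


print(range_by_clustering_key(20150301, 20150630))
print(filter_by_status("active"))
```

Queries shaped like the second function are the ones worth handing to a search index or a relational store.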

Thank you