Meetup: Core Concepts, Erick Ramirez, 29 July 2015

Cassandra Core Concepts
and why Netflix runs Cassandra on the cloud
Erick Ramirez (@flightc), DataStax Engineering

Erick Ramirez © 2015 DataStax, All Rights Reserved.

@flightc

Welcome

• Introducing Cassandra
• Why Netflix runs Cassandra on the cloud
• Feel free to ask questions

Relational data model

• Normalised schema, table joins, ACID transactions
• Joins are very expensive on billions of rows
• Sharding tables across systems is complex
• Performance preferred over “always on”
• Scaling up requires massive high-end servers

Big data requirements

• Distribute data across multiple nodes
• Relaxed consistency
• Relaxed schema
• Scale, scale, scale!

NoSQL landscape

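• Types: graph, key-value, document, column family
• Consistency - every node returns the same result
• Availability - every request gets a response, even at high read/write volumes
• Partition tolerance - the system keeps working when nodes are cut off from each other by the network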
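Cassandra leans toward availability and partition tolerance, but its consistency is tunable: the client chooses how many replicas must acknowledge each read or write. A minimal sketch in cqlsh, assuming a keyspace with replication factor 3 is in use and the `users` table from the Modelling Cassandra section exists; the row values are hypothetical. `CONSISTENCY` is a cqlsh shell command rather than a CQL statement, and drivers expose the same setting per query.

```cql
-- cqlsh shell command (not CQL): from now on, a majority of
-- replicas must acknowledge each read and write in this session
CONSISTENCY QUORUM;

-- Assuming replication factor 3, QUORUM waits for 2 of 3 replicas,
-- so a QUORUM write and a later QUORUM read share at least one
-- replica (2 + 2 > 3) and the read sees the write
INSERT INTO users (userid, name, email)
VALUES ('jdoe', 'Jane Doe', 'jdoe@example.com');

SELECT name, email FROM users WHERE userid = 'jdoe';

-- Favour availability and latency: one replica is enough
CONSISTENCY ONE;
```

Lowering the level trades the read-your-write guarantee for tolerance of more replica failures, which is the CAP trade-off made per request instead of per system.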

CAP theorem

What is Cassandra?

• Massively scalable NoSQL database
• Fully distributed, no single point of failure
• Open-sourced by Facebook
• Linear horizontal scaling

Modelling Cassandra

• Use Cassandra Query Language (CQL)
• SQL-like syntax

• Schema: CREATE, ALTER, DROP
• Data: SELECT, INSERT, UPDATE, DELETE

Modelling Cassandra

CREATE TABLE users (
    userid text,
    name text,
    email text,
    PRIMARY KEY (userid)
);
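Each statement type listed for CQL can be tried against this table. A minimal sketch, assuming a keyspace has already been selected with `USE`; the row values and the `phone` column are hypothetical.

```cql
-- Write a row; without IF NOT EXISTS an INSERT overwrites
-- any existing row with the same key (an "upsert")
INSERT INTO users (userid, name, email)
VALUES ('jdoe', 'Jane Doe', 'jdoe@example.com');

-- Read by the partition key, the efficient access path
SELECT name, email FROM users WHERE userid = 'jdoe';

-- Change one column of an existing row
UPDATE users SET email = 'jane.doe@example.com' WHERE userid = 'jdoe';

-- Schema change: add a column without rewriting existing rows
ALTER TABLE users ADD phone text;

-- Remove the row, then the whole table
DELETE FROM users WHERE userid = 'jdoe';
DROP TABLE users;
```

Every read above names `userid`, the primary key, because that is what lets Cassandra route the request straight to the replicas that own the row.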

Why Cassandra

• All nodes are the same - no single point of failure (SPOF)
• Real-time, durable writes
• Linear scaling on commodity servers
• Real-time replication across data centres
• Always on - no need to take the cluster offline for maintenance
• Because you have a scale problem
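Replication across data centres is configured per keyspace. A minimal sketch, assuming the cluster's snitch reports two data centres named `DC1` and `DC2`; the keyspace name `demo` and the replica counts are hypothetical.

```cql
-- Keep three replicas of every row in each data centre; the
-- coordinator forwards each write to replicas in both of them
CREATE KEYSPACE demo
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
  }
  -- durable_writes = true (the default) logs each write to the
  -- commit log so it survives a node crash
  AND durable_writes = true;

USE demo;
```

Because replica counts are set per data centre, one data centre can be lost entirely while the other still holds full copies of the data.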

Why not Cassandra

• RDBMS excels in ACID transactions
• You need to justify your purchase of massive high-end servers

Common use cases

• Personalisation/recommendations (Netflix, eBay)
• Messaging (Instagram)
• IoT (Riptide IO)
• Fraud detection (Barracuda)
• Playlists and collections (Spotify)
• Graph (SpotRight)

A Cassandra cluster
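Nodes in a cluster own ranges of a token ring, and each row's partition key is hashed into a token that decides which nodes store it. Assuming the `users` table from the Modelling Cassandra section holds a few rows, the built-in `token()` function exposes that hash; the key `'jdoe'` is hypothetical.

```cql
-- token() returns the partitioner's hash of the partition key;
-- with the default Murmur3Partitioner it is a signed 64-bit value
SELECT userid, token(userid) FROM users;

-- Rows come back in token order, so a scan can resume from
-- the token of the last key seen
SELECT userid FROM users WHERE token(userid) > token('jdoe');
```

Since every node can compute the same hash, any node can act as coordinator and route a request, which is why all nodes can be identical.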

Cassandra Summit 2015

academy.datastax.com

Thank you

• Erick Ramirez
• @flightc