An Introduction to Cassandra

An Introduction to Cassandra - Oracle User Group

About Pythian

• 18 years of data infrastructure management consulting
• 200+ top brands
• 6000+ databases under management
• Over 400 DBAs in 35 countries
• Top 5% of the DBA workforce: 9 Oracle ACEs, 2 Microsoft MVPs, 1 Cassandra MVP
• Oracle, Microsoft, MySQL, DataStax partners, Netezza, Hadoop and MongoDB, plus UNIX sysadmin and Oracle apps

© 2016. All Rights Reserved.

About me

• Cassandra consultant
• First contact was Cassandra 0.8
• Cassandra MVP & DataStax Certified Architect
• Lisbon Cassandra Meetup
• Passion for distributed systems
• Loves a good challenge
• Water polo is my sport
• @cjrolo

Cassandra 101

• Cassandra is a highly scalable, distributed, masterless NoSQL database
• Column store architecture
• Log-structured data
• CAP theorem
• Eventually consistent

Introduction

• Cassandra is a highly scalable, distributed, masterless NoSQL database
• All nodes are the same, which makes the cluster highly resilient

CAP Theorem

• The CAP theorem states that you have to pick two of Consistency, Availability and Partition tolerance: you can't have all three at the same time, with acceptable latency, at any given moment.
• Cassandra favors Availability and Partition tolerance (AP).
• Tradeoffs between consistency and latency are tunable in Cassandra (per request!).
• Requests offer a tunable level of consistency, all the way from "writes never fail" to "block for all replicas to be readable".
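
As a rough illustration of per-request tuning, the sketch below checks whether a read level and a write level are guaranteed to overlap on at least one replica (the usual R + W > RF rule). The level names ONE, QUORUM and ALL mirror Cassandra's consistency levels; the two helper functions are hypothetical and not part of any driver.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must acknowledge a request at the given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    needed = replicas_required(read_level, rf) + replicas_required(write_level, rf)
    return needed > rf


if __name__ == "__main__":
    rf = 3  # replication factor assumed for the example
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read_level, write_level, read_sees_latest_write(read_level, write_level, rf))
```

Lower levels answer faster and survive more failures; higher levels trade that away for a stronger guarantee that a read observes the latest write.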

Consistent Hashing

• A hash consists of one or more arithmetic operations on a piece of data, e.g. MD5, Murmur3.
• We hash keys in an attempt to spread key hashes in a uniform manner for any given set of keys.
• A consistent hash is one where the hash range is divided up into ranges called a map.
• Once the map is defined, a given key will always end up in the same map range.
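
The idea above can be sketched as a small token ring. This is a minimal illustration, not Cassandra's partitioner: it assumes MD5 as the hash (Murmur3 is not in the Python standard library), one token per node, and a hypothetical `TokenRing` class with made-up node names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a string to an integer token (MD5 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Hypothetical ring: each node owns the range that ends at its own token."""

    def __init__(self, nodes):
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


if __name__ == "__main__":
    ring = TokenRing(["node1", "node2", "node3"])
    for key in ["alice", "bob", "carol"]:
        print(key, "->", ring.owner(key))
```

With this layout, adding a node only takes over keys from one neighbouring range; every other key keeps its owner.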

Log Structured Data

• Instead of rewriting records in place or storing records near each other based on key (clustering), simply write new records, updates to records, or deletes at the end of the file that holds the table.
• Add an index so you can read the table randomly without loading the whole thing into memory.
• UPSERT people (1, "Jonathan", "Ellis");
• UPSERT people (2, "Billy", "Bosworth");
• UPSERT people (2, "William", "Bosworth");
• DELETE people (1);
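
The four operations above can be replayed against a toy append-only table. This is a simplified in-memory sketch under stated assumptions: the `AppendOnlyTable` class and its `TOMBSTONE` marker are hypothetical names, and a Python list stands in for the file on disk.

```python
TOMBSTONE = object()  # marker appended to the log instead of erasing a record


class AppendOnlyTable:
    def __init__(self):
        self.log = []    # append-only sequence of (key, value) entries
        self.index = {}  # key -> position of the newest entry for that key

    def upsert(self, key, value):
        # Never rewrite in place: append and point the index at the new entry.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def delete(self, key):
        # A delete is just another write, carrying a tombstone.
        self.upsert(key, TOMBSTONE)

    def get(self, key):
        position = self.index.get(key)
        if position is None:
            return None
        value = self.log[position][1]
        return None if value is TOMBSTONE else value


if __name__ == "__main__":
    people = AppendOnlyTable()
    people.upsert(1, ("Jonathan", "Ellis"))
    people.upsert(2, ("Billy", "Bosworth"))
    people.upsert(2, ("William", "Bosworth"))
    people.delete(1)
    print(people.get(1), people.get(2), len(people.log))
```

Note that the log still holds all four entries after the delete; only the index decides which one a read returns.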

CRUD, ACID and Cassandra

• C* doesn't really have CRUD. Update is a special case of Create, and Delete is not a real Delete.
• C* is not ACID. C* doesn't support traditional ACID transactions.
• C* is BASE: Basically Available, Soft state, Eventual consistency.
• Different versions may live in the cluster at the same time. Eventually all the nodes will see the newest data.
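
To make "different versions may live in the cluster" concrete, here is a minimal sketch of last-write-wins reconciliation: each replica holds a (timestamp, value) pair, a read picks the newest one, and the winner is copied back to stale replicas. The function names and replica labels are hypothetical, and tie-breaking on equal timestamps is deliberately left out.

```python
from typing import Dict, Iterable, Optional, Tuple

Version = Tuple[int, Optional[str]]  # (write timestamp, value)


def newest(versions: Iterable[Version]) -> Version:
    """Pick the version with the highest write timestamp."""
    return max(versions, key=lambda version: version[0])


def read_and_converge(replicas: Dict[str, Version]) -> Optional[str]:
    """Return the newest value and overwrite stale replicas with it."""
    winner = newest(replicas.values())
    for node in replicas:
        replicas[node] = winner
    return winner[1]


if __name__ == "__main__":
    replicas = {"a": (1, "Billy"), "b": (2, "William"), "c": (1, "Billy")}
    print(read_and_converge(replicas))
    print(replicas)
```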

Write Path

Read Path

Compaction and Repair

• Compaction is a process that Cassandra uses to keep local data in check.
  – Tables are append-only, so obsolete data will live alongside current data
  – Compaction "cleans the house"
• Repair is a process that Cassandra uses to keep cluster data in check:
  – Nodes can get out of sync (hardware failure, network issues, etc.)
  – Repair makes sure every node has the latest data
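
A minimal sketch of what "cleaning the house" means for compaction, assuming each on-disk file is modelled as a dict of key -> (timestamp, value) and a delete is stored as a `None` value. The `compact` function is hypothetical, and it purges delete markers immediately, whereas a real system keeps them for a grace period first.

```python
TOMBSTONE = None  # a delete is stored as a record whose value is None


def compact(files):
    """Merge several append-only files into one, keeping the newest version per key."""
    merged = {}
    for data_file in files:
        for key, (timestamp, value) in data_file.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Simplification: drop deleted keys right away instead of after a grace period.
    return {key: version for key, version in merged.items() if version[1] is not TOMBSTONE}


if __name__ == "__main__":
    older = {1: (1, "Jonathan Ellis"), 2: (2, "Billy Bosworth")}
    newer = {2: (3, "William Bosworth"), 1: (4, TOMBSTONE)}
    print(compact([older, newer]))
```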

CQL - Cassandra Query Language

• CQL is not SQL, but it looks very similar:
    cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
    cqlsh> USE sandbox;
    cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
    cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
    cqlsh:sandbox> SELECT * FROM data;
• Abstracts the user from the internal structure (can be dangerous!)
• Provides several benefits over the older model (Thrift)

Cassandra vs Oracle

• Scale up (Oracle) vs scale out (Cassandra)
• High availability vs continuous availability
• Highly structured vs semi-structured
• Replication is easy

Data Distribution

Cassandra is not an RDBMS

• You need to know your reads before you write: tables are modelled around the queries you will run
• Replication factor affects your availability, choose wisely!
• Per-operation consistency
  – All the way from "writes never fail" to "block for all replicas to be readable".
• Update is a special case of Create, and Delete is not a real Delete.
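
A back-of-the-envelope sketch of why the replication factor matters for availability: for a given consistency level, it counts how many replicas of a piece of data can be down while requests still succeed. The helper names are hypothetical; only the arithmetic is the point.

```python
def quorum(rf: int) -> int:
    """Strict majority of the replicas."""
    return rf // 2 + 1


def tolerated_failures(rf: int, replicas_needed: int) -> int:
    """How many replicas may be down while a request can still be served."""
    return rf - replicas_needed


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(
            f"RF={rf}: ONE tolerates {tolerated_failures(rf, 1)}, "
            f"QUORUM tolerates {tolerated_failures(rf, quorum(rf))}, "
            f"ALL tolerates {tolerated_failures(rf, rf)}"
        )
```

With a replication factor of 1, any node failure makes its data unavailable; with 3, quorum requests survive the loss of one replica.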

The good, the bad and the ugly

• The Good:
  – Distributed
  – No single point of failure
  – It's easy to use
  – You get to sleep at night!
• The Bad:
  – SQL != CQL, can't just drop in
• The Ugly:
  – Not enough users!

Opportunities

• New applications
  – A loose data model avoids app rewrites, data migrations, etc.
• Augmentation
• Scale out better
• Have a data model that allows Cassandra to absorb high-velocity data
• Absorb traffic from several locations

Pitfalls!

• Hardware is important
  – Get SSDs...
• Cassandra "just works"
  – Can lead to overlooking how the system is performing
• Cassandra is new and is changing fast!

Q&A

• Thanks for listening!