When and why to move from an RDBMS to Cassandra, its quirks and limitations
Practical Cassandra
Vitalii ([email protected])
@tivv00
NoSQL key-value vs RDBMS – why and when
Cassandra architecture
Cassandra data model
Life without joins, or: HDD space is cheap today
Hardware requirements & deployment hints
RDBMS problems
Sometimes you reach the point where a single server can't cope
Relational Replication
Not write scalable
Data is not instantly visible
Sharding
No foreign keys or joins
No transactions
Reduced reliability (multiple servers)
Schema updates are a pain
Cassandra NoSQL
Master-Master Replication + Sharding in one package
Peer-to-peer architecture (no SPOF)
Easy cluster reconfiguration
Eventual consistency as a standard
All data in one record – no need to join
Flexible schema
Our data
We have intelligent Internet cache
Intelligent means we don't cache everything, or we would need Google's DC
It's still hundreds of millions of sites
And 10s of TB of packed data
Randomly updated
Analysis must be able to process all of this within hours
Cassandra ring (diagram: servers arranged in a ring, clients connecting to any of them)
Ring partitioner types
Order Preserving
Each server serves key range
Range queries possible
Read/Write/Disk space hot spots possible
Fixing a key range (rebalancing) is complex
Random
Data is smoothly distributed on servers
No range queries
No hot spots
Fixed key range
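A minimal sketch of how a random partitioner spreads rows: node names and the 0–99 token space are illustrative, not Cassandra's actual token range, but the mechanism (hash the key, walk the sorted token ring) is the same idea.

```python
import hashlib
from bisect import bisect_right

class Ring:
    def __init__(self, nodes):
        # nodes: {name: token}; tokens assumed evenly spread over the ring
        self.tokens = sorted((token, name) for name, token in nodes.items())

    def token_for(self, key):
        # hashing smooths the distribution regardless of key shape
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

    def node_for(self, key):
        # a key belongs to the first node whose token is >= its own
        token = self.token_for(key)
        idx = bisect_right([t for t, _ in self.tokens], token)
        return self.tokens[idx % len(self.tokens)][1]

ring = Ring({"node1": 25, "node2": 50, "node3": 75, "node4": 100})
owner = ring.node_for("some_row_key")  # deterministic, evenly spread
```

Hashing makes hot spots unlikely, but adjacent keys land on unrelated nodes, which is exactly why range queries are lost.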
Runtime CAP-solving
The whole thing is about replication
CAP: Consistency, Availability, Partition tolerance – choose two.
With Cassandra you can choose at runtime.
Runtime CAP-solving
Quorum read/write: consistent
Fast (fewer-replica) writes or reads: fast, less consistency
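The runtime trade-off boils down to the R + W > N rule: with N replicas, a read of R copies is guaranteed to overlap a write of W copies only when R + W exceeds N. A tiny sketch:

```python
# R + W > N: every read quorum then intersects every write quorum,
# so a read is guaranteed to see the latest acknowledged write.
def is_consistent(n_replicas, write_copies, read_copies):
    return read_copies + write_copies > n_replicas

N = 3
assert is_consistent(N, 2, 2)      # quorum write + quorum read
assert not is_consistent(N, 1, 1)  # fast write + fast read: may read stale
assert is_consistent(N, 3, 1)      # write ALL lets you read ONE
```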
Data model
Keyspaces – much like database in RDBMS
Column Families – storage element, like tables in RDBMS
Columns – you can have millions per row, names are flexible; still like columns in an RDBMS
Super Column – a column with structured content, superseded by composite columns
Twitter DB
Example
Users table: ID, Name, Birthday
Tweets table: UserID, TweetID, TweetContent
Twitter Keyspace
Users CF, Key: User ID
Columns: Name(Str), Birthday(Str)
Timeline CF, Key: User ID
Columns: <TweetID>(TweetContent)
Twitter DB
Example (alternative)
Users table: ID, Name, Birthday
Tweets table: UserID, TweetID, TweetContent
Twitter Keyspace
Data CF, Key: User ID
Columns: Name(Str), Birthday(Str), <TweetID>(TweetContent)
Example (data)
Users table:
ID | Name
1  | Tom
2  | John
Tweets table:
User | ID | Text
1    | 1  | Hello
1    | 2  | See me?
2    | 3  | See you!
Data CF:
Key | Data
1   | Name = Tom, T_1 = Hello, T_2 = See me?
2   | Name = John, T_3 = See you!
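The alternative layout above can be pictured as a map of wide rows, each row a sorted map of column name to value. A sketch (the `T_<id>` naming is the slide's convention, the dicts are just an illustration):

```python
# The "Data" CF as nested maps: one row per user, mixing static
# columns (Name) with dynamic tweet columns (T_<TweetID>).
data_cf = {
    "1": {"Name": "Tom", "T_1": "Hello", "T_2": "See me?"},
    "2": {"Name": "John", "T_3": "See you!"},
}

def tweets(row):
    # column names sort together, so tweet columns form one scannable range
    return {k: v for k, v in sorted(row.items()) if k.startswith("T_")}

print(tweets(data_cf["1"]))  # {'T_1': 'Hello', 'T_2': 'See me?'}
```

One row read returns both the user profile and the timeline, which is the whole point of denormalizing.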
Data model
You can have the same key in multiple column families
You can have different sets of columns for different keys in the same column family
You can query a range of columns for a key (columns are sorted), with pagination
You can have (and it's often useful to have) columns without values
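Since columns are kept sorted by name, a range query with pagination is just a binary search plus a bounded scan. A sketch under that assumption (`column_slice` and its parameters are illustrative, not a driver API):

```python
from bisect import bisect_left, bisect_right

def column_slice(row, start, end, page_size, after=None):
    # row: {column_name: value}; returns one page of the [start, end] slice
    names = sorted(row)
    lo = bisect_left(names, start if after is None else after)
    if after is not None and lo < len(names) and names[lo] == after:
        lo += 1  # resume strictly after the last column of the previous page
    hi = bisect_right(names, end)
    return [(n, row[n]) for n in names[lo:hi][:page_size]]

row = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
page1 = column_slice(row, "a", "e", 2)             # [('a', 1), ('b', 2)]
page2 = column_slice(row, "a", "e", 2, after="b")  # [('c', 3), ('d', 4)]
```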
ACID vs BASE
Super Heroes are good, but not scalable. So, what do we lose?
No Atomicity
You've got no transactions – no rollback
The maximum you get is an atomic update to a single row
A failed operation MAY still have been applied (that's why counters are not reliable)
Eventual Consistency
Cassandra has no central governor
This means no bottleneck
This also means no one knows whether the database as a whole is consistent
Regular repair is your friend!
No Isolation
All mutations are timestamped to restore order from chaotic arrival
You MUST have your clock synchronized
That's how operations are applied on the server :)
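The timestamp mechanism is last-write-wins reconciliation: mutations may arrive in any order, but each carries its client timestamp, so replicas converge on the highest-timestamped value. A minimal sketch:

```python
# Last write wins: a stored cell is a (timestamp, data) pair, and an
# incoming mutation replaces it only if its timestamp is newer.
def apply_mutation(stored, incoming):
    return incoming if incoming[0] > stored[0] else stored

cell = (100, "old")
cell = apply_mutation(cell, (300, "newest"))  # applied
cell = apply_mutation(cell, (200, "late"))    # arrived late, ignored
assert cell == (300, "newest")
```

This is why unsynchronized clocks silently drop writes: a client whose clock runs behind always "loses".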
Controlled Durability
Cassandra uses transaction log to ensure durability on single server
Durability of the whole database depends on both total number of replicas and write operation replication factor
Remember: single-server 99% uptime means only 36.6% (0.99^100) "full cluster working" uptime for 100 servers – most of the time you've got at least one server down!
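The arithmetic behind that claim, assuming independent server failures:

```python
# Probability that ALL 100 independent 99%-uptime servers are up at once.
per_server = 0.99
cluster = per_server ** 100
print(round(cluster, 3))  # 0.366 -> ~36.6% "full cluster" uptime
```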
Data querying
With SQL you simply ask.
You can easily scan the whole DB
Indexes may help
Any calculation is repeated each time
This can be slow on read
Data querying
With NoSQL you can't efficiently scan the whole db
No “group by” or “order by”
You must prepare your data beforehand
You have multiple copies of data
You must recalculate on application logic change
The precalculated reads are fast
Think about your queries in advance!
There is no “I'll simply add an index, some hints and my query will become fast”
Any index is created and maintained from application code
Cassandra now has secondary indexes, but they are much inferior to custom ones
What's wrong with secondary indexes
They work on fixed column names
They are consistent with data
This means they live near the data they index
This means they are distributed between nodes by row key, not by indexed column value
This means you need to ask every node to get a single value
What's wrong with secondary indexes
Node 1: rows A (phone=1), B (phone=3); local phone index: 1=A, 3=B
Node 2: rows C (phone=3), D (phone=5); local phone index: 3=C, 5=D
Node 3: rows E (phone=1), F (phone=5); local phone index: 1=E, 5=F
Node 4: rows G (phone=3), H (phone=7); local phone index: 3=G, 7=H
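The four-node picture above, as a scatter-gather sketch: each node only indexes its own rows, so one lookup fans out to every node (node names and the dict layout are illustrative).

```python
# Each node's local secondary index: row key -> indexed phone value.
nodes = {
    "node1": {"A": 1, "B": 3},
    "node2": {"C": 3, "D": 5},
    "node3": {"E": 1, "F": 5},
    "node4": {"G": 3, "H": 7},
}

def query_phone(phone):
    hits = []
    for name in sorted(nodes):              # one round-trip per node
        local_index = nodes[name]
        hits += [row for row, p in local_index.items() if p == phone]
    return sorted(hits)

print(query_phone(3))  # ['B', 'C', 'G'] -- three different nodes answered
```

Contrast with the custom `phone_directory` CF on the next slide, where the phone number is the row key and the ring routes the query to exactly one node.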
“Index” example
Column family people
Key: Fred [phone=2223355, phone2=4445566, fax=9998877]
Key: John [phone=4445566, mobile=099123456]
Column family phone_directory
Key: 2223355 [Fred]
Key: 4445566 [Fred, John]
Key: 9998877 [Fred]
Key: 099123456 [John]
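The pattern above in code: the application writes every person to both column families, keeping the index consistent itself. A sketch with plain dicts standing in for the two CFs (`add_person` is a hypothetical helper):

```python
# Application-maintained index: every write to `people` also updates
# `phone_directory`, which is keyed by the phone number itself.
people = {}
phone_directory = {}

def add_person(name, **numbers):
    people[name] = numbers
    for number in numbers.values():
        phone_directory.setdefault(number, []).append(name)

add_person("Fred", phone="2223355", phone2="4445566", fax="9998877")
add_person("John", phone="4445566", mobile="099123456")
print(phone_directory["4445566"])  # ['Fred', 'John'] -- one direct row read
```

Because the row key is the phone number, the lookup goes straight to the owning node instead of fanning out.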
“Join” example
Column family customer
Key: Boeing [email: [email protected]]
Key: Oracle [skype: java]
Column family orders
Key: 1 [customer: Boeing, total: 200m]
Key: 2 [customer: Oracle, total: 300m]
Key: 3 [customer: Boeing, total: 500m]
Column family customer_order_totals
Key: Boeing [1: 200m, 3: 500m]
Key: Oracle [2: 300m]
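The "join" above is precomputed at write time: every new order lands both in `orders` and in the customer's totals row, so reading all of a customer's orders is a single-row lookup. A sketch (dicts stand in for the CFs, `add_order` is a hypothetical helper):

```python
orders = {}
customer_order_totals = {}

def add_order(order_id, customer, total):
    # one logical write fans out to the base CF and the "join" CF
    orders[order_id] = {"customer": customer, "total": total}
    customer_order_totals.setdefault(customer, {})[order_id] = total

add_order(1, "Boeing", "200m")
add_order(2, "Oracle", "300m")
add_order(3, "Boeing", "500m")
print(customer_order_totals["Boeing"])  # {1: '200m', 3: '500m'}
```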
Peer-to-peer replication
Your operation can return OK even if it was not written to every replica
Hinted handoff will try to repair later
Even if your operation has failed, it may have been written to some replicas
This inconsistency won't be repaired automatically
These are drawbacks of the "no master" architecture
You need to repair regularly!
Tombstones and Repair
Delete events are recorded as tombstones to ensure that "before delete" data arriving late won't be used
Regular repair not only makes sure your data is replicated, but also that your deletes are replicated. If you don't repair, beware of ghosts!
Resources & Environment
Disk space requirements
Memory requirements
Native plugins & configuration
Disk estimations
Say, we've got 1TB of data
Replication factor 3 makes it 3TB
Data duplication makes it 12TB
Tombstone/repair space makes it 24TB
Backups make it 36TB
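The estimate as arithmetic; the multipliers are the slide's rules of thumb for this deployment, not fixed Cassandra constants:

```python
raw = 1                       # TB of actual data
replicated = raw * 3          # replication factor 3      ->  3 TB
duplicated = replicated * 4   # internal data duplication -> 12 TB
with_repair = duplicated * 2  # tombstone/repair headroom -> 24 TB
with_backup = with_repair + duplicated  # one backup copy -> 36 TB
print(with_backup)  # 36
```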
Memory estimations
Cassandra has certain in-memory structures whose size is linear in the amount of data
Key and Row caches – configured at column family level. Change defaults if you've got a lot of CFs
Bloom filters and key samples cache are configured globally in latest versions
Estimate a minimum of ~0.5% of your data size in RAM
Native specifics
Cassandra (like many other large things) likes JNA. Please install it.
Cassandra maps files into memory – the Cassandra process's virtual and resident memory size will grow because of mmap.
Default heap sizes are large – tame them if Cassandra is not the only task on the host