Cassandra as Memcache

Edward Capriolo, Media6Degrees.com

What we learned in Operating Systems

CPU (and registers) - Super FAST!

Main Memory - Fast

Hard Disks - Slow

What has changed since my first computer

100 MHz

8 MB RAM

1 GB Disk

14.4kbps Modem

686 Windowz 3.11

Packard Bell

Multiple cores @ 4 GHz

2GB RAM

2TB Disk

1/10Gb Ethernet

64 bit FC 14

Sadly, no more Packard Bell

The Present Situation

Computers are not and never will be fast or big enough

Until they take over and then they will be too fast and too big

Traditional two-tier Web Application

User-facing tier: usually Apache|Tomcat|...

Speaks some CGI alternative: php|jsp|cfm|...

Logging

Display

Back end: usually an RDBMS

Stores and indexes data

Supports a data abstraction and manipulation language

Simple Schema

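CREATE TABLE user (
  id INT AUTO_INCREMENT PRIMARY KEY,  -- MySQL needs an AUTO_INCREMENT column to be a key
  name VARCHAR(25) UNIQUE,            -- length assumed; the slide gives none
  pass VARCHAR(25)                    -- length assumed; the slide gives none
);

CREATE TABLE book (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(25) UNIQUE,
  author VARCHAR(25)
);

-- Join table: which user (uid) has read which book (bid)
CREATE TABLE users_books (
  uid INT,
  bid INT,
  UNIQUE (uid, bid),
  INDEX (bid)
);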

Some queries you might see (user login)

SELECT id, pass FROM user WHERE user.name = ?

Totally random queries based on user login

Not often read - may not be helpful to cache

Some queries you might see (books a user has read)

SELECT user.name, book.name FROM user JOIN users_books ON user.id = users_books.uid JOIN book ON book.id = users_books.bid WHERE user.id = ?

More complex query

Two join conditions

Result might be on the user's start page

Result might be often used by algorithms

Some queries you might see (count all the read books)

SELECT users_books.bid, book.name, COUNT(*) FROM users_books INNER JOIN book ON users_books.bid = book.id GROUP BY users_books.bid, book.name

No WHERE clause!

Possible table scan

Possible intermediate results to temp file

Result displayed on main index page

How fast are these queries?

Trick question!

How much data? The O(log n) lookup cost is negligible for 'small' data sets

How fast are the disks? Streaming is much faster than seeking*

How many QPS? More requests means more contention

How much RAM? Unallocated RAM works as page cache...

Wait... Page Cache... what?

Virtual File System or VFS cache

RAM not in use by a process

Used to Cache Disk

Blocks read often get cached in RAM

A large disk-to-RAM ratio reduces the chance of a cache hit
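A rough way to put a number on that last point is sketched below. It assumes every disk block is equally likely to be read, which real workloads rarely satisfy (hot data makes the real hit rate higher), so treat it as a lower-bound intuition rather than a measurement. The function name is made up for this example; the 2GB and 2TB figures are the ones from the earlier hardware slide.

```python
def page_cache_hit_chance(cache_bytes: int, data_bytes: int) -> float:
    """Rough hit chance when every block is equally likely to be read.

    Assumption: uniform access. Skewed (hot/cold) workloads do better.
    """
    if data_bytes <= 0:
        raise ValueError("data_bytes must be positive")
    # The cache can hold at most the whole data set, so cap at 1.0.
    return min(1.0, cache_bytes / data_bytes)


GB = 1024 ** 3

# Data set smaller than RAM: everything ends up cached.
small_data = page_cache_hit_chance(2 * GB, 1 * GB)

# 2GB of RAM in front of a full 2TB disk: about one read in a thousand hits.
large_data = page_cache_hit_chance(2 * GB, 2048 * GB)

print(small_data, large_data)
```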

Scaling RDBMS challenges

Scaling up:

More RAM, disk

Upper limit

Adding slaves:

Adds read capacity

Does not add write capacity

Monitoring/fixing replication

Sharding:

Possibly giving up DB features

Re-shard with growth
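Why re-sharding hurts can be sketched with the simplest scheme, hash-modulo placement. This is an assumption for illustration (the slides do not say which sharding scheme is in use), and `shard_for` and `moved_fraction` are names invented here. Going from 4 to 5 shards, a key stays put only when its hash gives the same remainder for both counts, which is about 1 key in 5.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys that land on a different shard after resizing."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]

# Growing from 4 to 5 shards relocates roughly 80% of the keys.
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys move")
```

Schemes such as consistent hashing exist to shrink that fraction; they are not shown here.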

Enter Memcache

Key value store with no persistence*

Works with memory slabs

Set a key, value, and a Time To Live

Typically client-controlled sharding

Normal use case:

Check cache

If found in cache, return it

Else query and save in cache

Saves resources by not re-querying mostly static, non-transactional, and non-time-sensitive data
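The normal use case above can be sketched as follows. `TTLCache` is an in-process stand-in written for this example, not a real memcache client API, and `cached_query` and the `books:42` key are likewise hypothetical. One simplification to be aware of: a query result of `None` is treated as a miss and never cached.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Stand-in for a memcache client: set a key, a value, and a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls through to the database.
            del self._items[key]
            return None
        return value


def cached_query(cache: TTLCache, key: str, ttl_seconds: float,
                 run_query: Callable[[], Any]) -> Any:
    """Check cache; if found return it; else query and save in cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = run_query()
    cache.set(key, value, ttl_seconds)
    return value


cache = TTLCache()
# The lambda stands in for the "books a user has read" SQL query.
books = cached_query(cache, "books:42", 60, lambda: ["Dune", "Neuromancer"])
print(books)
```

The second call with the same key inside the 60 seconds would return the stored list without running the query again.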

Memcache...Good Things

More control of the cache than with the VFS cache

Saves web server memory vs HttpSession

Fast to store and access data

Simple to use

Clients for many languages

Memcache (possibly not so good things)

Memcache is empty after a shutdown

Is an 8GB hash table better than 8GB more RAM in your database machine?

Another tier to manage

Is it scalable?...

A highly un-suggested deployment

Enter Cassandra...

Data sharding and replication

Writing:

Structured log format

Linear writes to a sorted memtable

Memtables flush (by time, size, or operation count)

Reading:

VFS cache

Bloom filters

Row Cache

Key Cache

0.7.X brings TTL fields!
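The write and read paths listed above can be pictured with a toy model. This is not Cassandra's implementation: `ToyStore`, `SSTable`, and `BloomFilter` are names and simplifications made up for this sketch. It flushes on operation count only, keeps the "commit log" as a list in memory, and leaves out the row cache, key cache, and TTLs.

```python
import hashlib
from bisect import bisect_left


class BloomFilter:
    """Tiny Bloom filter: may say "maybe" wrongly, never says "no" wrongly."""

    def __init__(self, bits: int = 1024, hashes: int = 3) -> None:
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.field |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.field >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a flushed memtable."""

    def __init__(self, rows: dict) -> None:
        self.keys = sorted(rows)
        self.values = [rows[k] for k in self.keys]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key: str):
        if not self.bloom.might_contain(key):
            return None  # skip the (simulated) disk lookup entirely
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class ToyStore:
    def __init__(self, flush_threshold: int = 2) -> None:
        self.flush_threshold = flush_threshold
        self.commit_log = []   # append-only, so writes are sequential
        self.memtable = {}
        self.sstables = []

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.sstables.append(SSTable(self.memtable))
            self.memtable = {}

    def read(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest data wins
            value = table.get(key)
            if value is not None:
                return value
        return None


store = ToyStore(flush_threshold=2)
store.write("a", "1")
store.write("b", "2")   # second write triggers a flush
store.write("a", "3")   # newer value lives in the memtable
print(store.read("a"), store.read("b"), store.read("missing"))
```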

So then... Cassandra is faster than memcache?

No!

Memcache is an in-memory datastore

Cassandra has to persist data

But it may be faster, more efficient, and easier to manage than a separate memcache + database tier

Configuration 1: De facto standard

5 Nodes

Replication Factor = 3

Key Cache

Results in:

Good performance

Strong consistency

Highly fault tolerant
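The slide does not say which consistency levels give the strong consistency. One common way, assumed here, is quorum reads and quorum writes: when the number of replicas read plus the number written exceeds the replication factor, every read overlaps the latest write on at least one replica. The helper names are made up for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           read_replicas: int,
                           write_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)

# Quorum reads + quorum writes overlap on at least one replica.
strong = reads_see_latest_write(rf, q, q)

# Reading and writing a single replica each does not guarantee overlap.
weak = reads_see_latest_write(rf, 1, 1)

# Replicas that can be down while quorum operations still succeed.
tolerated_failures = rf - q

print(q, strong, weak, tolerated_failures)
```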

Configuration 2: Do not care about stale reads

5 nodes


Row cache

Read Repair Chance = 0 %

Results in:

1/3 of the read traffic

Minor possibility of not-found or out-of-sync data (not much different than memcache)
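Where the 1/3 figure comes from can be sketched with a simplified model. The assumptions are mine, not the slide's: Replication Factor = 3 as in Configuration 1, reads that need only one replica's answer, and a read repair that, when it triggers, contacts every replica. The function name is made up.

```python
def expected_replicas_read(replication_factor: int,
                           read_repair_chance: float) -> float:
    """Average replicas touched per read in the simplified model above."""
    if not 0.0 <= read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be between 0 and 1")
    # One replica always answers; the others are only read for repair.
    return 1 + read_repair_chance * (replication_factor - 1)


always_repair = expected_replicas_read(3, 1.0)   # every read hits all 3 replicas
never_repair = expected_replicas_read(3, 0.0)    # every read hits 1 replica

print(never_repair / always_repair)
```

The price is the one the slide names: without repair on read, a replica that missed a write can keep serving the old value.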

Configuration 3: Snitches get stitches

5 nodes


Row Cache

Read Repair Chance = 0%

Dynamic Snitches + Pinning

Results in:

Reads should hit the same node, not a random replica

Caches on each node have less duplication
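The cache duplication claim can be checked with a small count. This sketch does not model how the dynamic snitch actually picks a node; it only compares "always read the same replica" with "rotate across replicas". Replication Factor = 3, the ring layout, and all names are assumptions made for the example.

```python
NODES = 5
RF = 3  # assumed; the slide for this configuration does not list it


def replicas_for(key_id: int) -> list:
    """Toy ring: a key lives on RF consecutive nodes."""
    return [(key_id + offset) % NODES for offset in range(RF)]


def total_cached_rows(requests, pick) -> int:
    """Serve each read from the picked replica and count row-cache entries."""
    caches = {node: set() for node in range(NODES)}
    for attempt, key_id in requests:
        node = pick(replicas_for(key_id), attempt)
        caches[node].add(key_id)
    return sum(len(rows) for rows in caches.values())


# Each of 100 keys is read three times.
requests = [(attempt, key_id) for attempt in range(RF) for key_id in range(100)]

# Pinned: every read of a key goes to the same replica.
pinned = total_cached_rows(requests, lambda replicas, attempt: replicas[0])

# Spread: successive reads of a key rotate across its replicas.
spread = total_cached_rows(
    requests, lambda replicas, attempt: replicas[attempt % len(replicas)]
)

print(pinned, spread)
```

With pinning each row is cached once; with rotation each row ends up in three caches, so the same RAM holds a third as many distinct rows.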

Configuration 4: Little data, big request load!

20 nodes

Replication Factor 20! (only this keyspace)

Row Cache

Read Repair Chance = 0%

Results in:

20 nodes capable of serving these reads!

Writes do not scale (like master-slave replication)
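The read/write asymmetry can be put into numbers with a back-of-the-envelope model. The assumptions are mine: reads are served by a single replica and spread evenly, and every write is eventually applied on every replica. The request rates are invented for the example.

```python
def per_node_load(nodes: int, replication_factor: int,
                  reads_per_sec: float, writes_per_sec: float):
    """Return (reads, writes) each node handles per second in this model."""
    reads = reads_per_sec / nodes
    # Each write lands on replication_factor nodes.
    writes = writes_per_sec * replication_factor / nodes
    return reads, writes


# Replication Factor 20 on 20 nodes: reads divide by 20, writes do not divide at all.
everywhere = per_node_load(20, 20, reads_per_sec=20_000, writes_per_sec=1_000)

# For comparison, Replication Factor 3 on the same 20 nodes.
rf_three = per_node_load(20, 3, reads_per_sec=20_000, writes_per_sec=1_000)

print(everywhere, rf_three)
```

Every node sees the full write stream in the first case, which is the same shape as master-slave replication.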

To recap... Cassandra

0.7.X brings Time To Live

0.7.X brings Read Repair Chance

Can serve purely from memory

Can serve from disk

Replication factor, caching, sharding: many ways to tune

General Awesomeness
