Recommendation of a Strategy
Cassandra as Memcache
Edward Capriolo, Media6Degrees.com
What we learned
in Operating Systems
CPU (and registers) - Super FAST!
Main Memory - Fast
Hard Disks - Slow
What has changed since my first computer
100 MHz
8 MB RAM
1 GB Disk
14.4kbps Modem
686 Windowz 3.11
Packard Bell
Multiple cores @ 4 GHz
2GB RAM
2TB Disk
1/10Gb Ethernet
64 bit FC 14
Sadly no more Packard Bell
The Present Situation
Computers are not and never will be fast or big enough
Until they take over and then they will be too fast and too big
Traditional two tier
Web Application
User-facing tier: usually Apache|Tomcat|...
Speaks some CGI alternative php|jsp|cfm|...
Logging
Display
Back end: usually an RDBMS
Stores and indexes data
Supports a data abstraction and manipulation language
Simple Schema
create table user (
id int auto_increment primary key,
name varchar(25) unique, -- length assumed; the slide gives none
pass varchar(25) ); -- length assumed
create table book (
id int auto_increment primary key,
name varchar(25) unique,
author varchar(25) );
create table users_books (
uid int,
bid int,
unique (uid, bid),
index (bid) );
Some Queries you might see
(user login)
SELECT id, pass FROM user WHERE user.name = ?
Totally random queries based on user login
Not often read - may not be helpful to cache
Some queries you might see
(Books a user has read)
SELECT user.name, book.name FROM user JOIN users_books ON user.id = users_books.uid JOIN book ON book.id = users_books.bid WHERE user.id = ?
More complex query
Two join conditions
Result might be on users start page
Result might be often used by algorithms
Some queries you might see
(count all the read books)
SELECT users_books.bid, book.name, count(*) FROM users_books INNER JOIN book ON users_books.bid = book.id GROUP BY users_books.bid, book.name
No where clause!
Possible table scan
Possible intermediate results to temp file
Result displayed on main index page
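The three queries can be tried against a throwaway database. This is a minimal sketch, assuming the table names user, book, and users_books from the schema slide; in-memory SQLite stands in for the RDBMS, the sample rows are invented, and the function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the RDBMS. SQLite spells auto_increment
# as INTEGER PRIMARY KEY AUTOINCREMENT and needs a separate CREATE INDEX.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY AUTOINCREMENT,
                   name VARCHAR(25) UNIQUE, pass VARCHAR(25));
CREATE TABLE book (id INTEGER PRIMARY KEY AUTOINCREMENT,
                   name VARCHAR(25) UNIQUE, author VARCHAR(25));
CREATE TABLE users_books (uid INT, bid INT, UNIQUE (uid, bid));
CREATE INDEX users_books_bid ON users_books (bid);
""")

# Sample rows are made up for illustration.
conn.executemany("INSERT INTO user (name, pass) VALUES (?, ?)",
                 [("alice", "secret1"), ("bob", "secret2")])
conn.executemany("INSERT INTO book (name, author) VALUES (?, ?)",
                 [("Dune", "Herbert"), ("Emma", "Austen")])
conn.executemany("INSERT INTO users_books (uid, bid) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def login(name):
    """User login: a single indexed lookup by unique name."""
    return conn.execute(
        "SELECT id, pass FROM user WHERE user.name = ?", (name,)).fetchone()


def books_read(user_id):
    """Books a user has read: two joins filtered by user id."""
    return conn.execute(
        "SELECT user.name, book.name FROM user "
        "JOIN users_books ON user.id = users_books.uid "
        "JOIN book ON book.id = users_books.bid "
        "WHERE user.id = ? ORDER BY book.name", (user_id,)).fetchall()


def read_counts():
    """Count all read books: no WHERE clause, so every row is visited."""
    return conn.execute(
        "SELECT users_books.bid, book.name, count(*) FROM users_books "
        "INNER JOIN book ON users_books.bid = book.id "
        "GROUP BY users_books.bid, book.name "
        "ORDER BY users_books.bid").fetchall()
```

The ORDER BY clauses are added only to make the sketch's output deterministic; they are not part of the slide queries.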
How fast are these queries?
Trick question!
How much data? The O(log n) lookup cost for 'small' data sets is negligible
How fast are the disks? Streaming is much faster than seeking*
How many QPS? More requests means more contention
How much RAM? Unallocated RAM works as page cache...
Wait... Page Cache... what?
Virtual File System or VFS cache
RAM not in use by a process
Used to Cache Disk
Blocks read often get cached in RAM
large disk to RAM ratio reduces hit chance
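A rough way to see why the disk-to-RAM ratio matters: if reads were spread uniformly over the data (an assumption; real workloads are usually skewed toward hot blocks), the chance that a block is already in the page cache is about the cache size divided by the data size. The function name below is hypothetical.

```python
def expected_hit_ratio(cache_gb, data_gb):
    """Chance a random block read is served from RAM.

    Assumes uniformly random access over data_gb of disk data and
    cache_gb of RAM free to act as page cache. Skewed access patterns
    will do better than this estimate.
    """
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return min(1.0, cache_gb / data_gb)
```

With the machine from the earlier slide (2 GB RAM, 2 TB disk) and a full disk, `expected_hit_ratio(2, 2000)` comes out at about one read in a thousand.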
Scaling RDBMS challenges
Scaling up: More RAM, disk
Upper limit
Adding slaves: Add read capacity
Does not add write capacity
Monitoring/fixing replication
Sharding: Possibly giving up DB features
Re-shard with growth
Enter Memcache
Key value store with no persistence*
Works with memory slabs
Set a key, value, and a Time To Live
Typically client-controlled sharding
Normal use case: Check cache
If found in cache, return
Else query and save in cache
Saves resources by not re-querying mostly static, non-transactional, non-time-sensitive data
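The normal use case can be sketched as follows. This is an illustration only: TTLCache is an in-process stand-in for a memcache server, ShardedCache is a hypothetical client that picks a server by hashing the key, and get_or_load is an invented helper name, not a memcache client API.

```python
import time
import zlib


class TTLCache:
    """Stand-in for one memcache server: key -> (value, expiry time)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # entry outlived its Time To Live
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


class ShardedCache:
    """Client-controlled sharding: the client, not the servers, picks the node."""

    def __init__(self, servers):
        self._servers = servers

    def _pick(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash().
        return self._servers[zlib.crc32(key.encode("utf-8")) % len(self._servers)]

    def get(self, key):
        return self._pick(key).get(key)

    def set(self, key, value, ttl_seconds):
        self._pick(key).set(key, value, ttl_seconds)


def get_or_load(cache, key, loader, ttl_seconds=60):
    """Check cache; if found return it; else query and save in cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = loader(key)  # e.g. the expensive database query
    cache.set(key, value, ttl_seconds)
    return value
```

In this sketch a loader result of None is never treated as a cache hit, so such keys are re-queried on every call. Modulo hashing also remaps most keys when a server is added or removed, which is one reason real clients often use consistent hashing instead.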
Memcache...Good Things
More control of cache than VFS cache
Saves web server memory vs HttpSession
Fast to store and access data
Simple to use
Clients for many languages
Memcache (possibly not so good things)
Memcache empty on shutdown
Is an 8 GB hash table better than 8 GB more RAM in your database machine?
Another tier to manage
Is it scalable?...
A highly un-suggested deployment
Enter Cassandra...
Data sharding and replication
Writing: Structured log format
Linear Writes to sorted memtable
Memtables flush (time,size,ops)
Reading: VFS Cache
Bloom filters
Row Cache
Key Cache
0.7.X brings TTL fields!
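The write and read paths above can be illustrated with a toy model. This is not Cassandra's implementation: ToyStore and BloomFilter are invented names, the memtable is a plain dict that is sorted only at flush time, flushing is triggered by operation count alone (the slide also lists time and size), and the row and key caches are left out.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyStore:
    def __init__(self, flush_after_ops=3):
        self.commit_log = []   # append-only, so disk writes stay sequential
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # flushed tables, oldest first
        self.flush_after_ops = flush_after_ops
        self.disk_reads = 0    # counts flushed tables actually searched

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_after_ops:
            self.flush()

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        # The flushed table is written in key order and never modified again.
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, table in reversed(self.sstables):  # newest first
            if not bloom.might_contain(key):
                continue  # skip this table without touching "disk"
            self.disk_reads += 1
            if key in table:
                return table[key]
        return None
```

The Bloom filter is what lets a read skip flushed tables that cannot contain the key, so a miss usually costs no disk access at all.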
So then... Cassandra is faster than memcache?
No! Memcache is an in-memory datastore
Cassandra has to persist data
But may be faster, more efficient, and easier to manage than a separate memcache + database tier
Configuration 1:
De facto standard
5 Nodes
Replication Factor = 3
Key Cache
Results in: Good performance
Strong consistency
Highly fault tolerant
Configuration 2:
Do not care about stale reads
5 nodes
Replication Factor = 3
Row cache
Read Repair Chance = 0 %
Results in: 1/3rd the read traffic
Minor possibility of not found/out of sync data (not much different than memcache)
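The 1/3rd figure follows from a simple model, sketched here under stated assumptions: reads are issued at consistency level ONE, a read that triggers read repair contacts every other replica, and a read that does not contacts just one. The function name is hypothetical.

```python
def replicas_contacted_per_read(replication_factor, read_repair_chance):
    """Expected number of replicas touched by one read.

    Simplified model: one replica always serves the read, and with
    probability read_repair_chance the remaining replicas are also
    contacted so their copies can be compared and repaired.
    """
    if not 0.0 <= read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be between 0 and 1")
    return 1 + read_repair_chance * (replication_factor - 1)
```

With a replication factor of 3, a read repair chance of 100% gives 3 replicas per read and 0% gives 1, hence one third of the traffic.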
Configuration 3:
Snitches get stitches
5 nodes
Replication Factor = 3
Row Cache
Read Repair Chance = 0%
Dynamic Snitches + Pinning
Results in: Reads should hit the same node, not a random replica
Caches on each node have less duplication
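The effect of pinning on cache duplication can be seen in a small simulation. It is a sketch, not Cassandra's snitch logic: replica placement is a made-up "next nodes on the ring" rule, caches are unbounded sets, and cached_entries is an invented name.

```python
import random


def cached_entries(num_nodes, replication_factor, keys, reads_per_key,
                   pinned, seed=0):
    """Total row-cache entries across the cluster after a read workload."""
    rng = random.Random(seed)
    caches = [set() for _ in range(num_nodes)]
    for key_index, key in enumerate(keys):
        # Hypothetical placement: replicas are consecutive nodes on the ring.
        first = key_index % num_nodes
        replicas = [(first + i) % num_nodes for i in range(replication_factor)]
        for _ in range(reads_per_key):
            # Pinned reads always go to the same replica; otherwise any
            # replica may serve the read and end up caching the row.
            node = replicas[0] if pinned else rng.choice(replicas)
            caches[node].add(key)
    return sum(len(cache) for cache in caches)
```

With pinning, each key is cached on exactly one node, so the same total RAM holds more distinct rows. Without it, a hot row tends toward being cached on all of its replicas.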
Configuration 4:
Little Data, Big Request load!
20 nodes
Replication Factor 20! (only this keyspace)
Row Cache
Read Repair Chance = 0%
Results in: 20 nodes capable of serving these reads!
Writes do not scale (like master-slave replication)
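Why reads scale here but writes do not can be checked with back-of-the-envelope arithmetic. The sketch assumes load spreads evenly across nodes, each read touches one replica (read repair chance 0%), and each write is applied on every replica; per_node_load is an invented name and the request rates in the usage note are made up.

```python
def per_node_load(total_reads, total_writes, num_nodes, replication_factor):
    """Return (reads, writes) handled by each node per unit of time."""
    if replication_factor > num_nodes:
        raise ValueError("replication_factor cannot exceed num_nodes")
    reads = total_reads / num_nodes
    # Every write lands on replication_factor nodes, so the cluster does
    # total_writes * replication_factor write operations overall.
    writes = total_writes * replication_factor / num_nodes
    return reads, writes
```

For 20,000 reads and 1,000 writes per second, `per_node_load(20000, 1000, 20, 20)` gives each node 1,000 reads but also all 1,000 writes, while the 5-node, replication factor 3 layout gives 4,000 reads and 600 writes per node. Adding nodes at replication factor = node count lowers per-node reads only.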
To recap...
Cassandra
0.7.X brings Time To Live
0.7.X brings Read Repair Chance
Can serve purely from memory
Can serve from disk
Replication factor, caching, sharding: many ways to tune
General Awesomeness