MEMCACHED
An Overview
Quick overview and best practices
- Asif Ali / Oct / 2010 / [email protected]
Image courtesy: memcached offical website
“The Disk is the new tape!”
Quoted from somewhere
What is memcache?
• Simple in memory caching system
• Can be used as a temporary in-memory data
store
• Stores data in-memory only.
• Excellent read performance.
• Great write performance.
• Data distribution between multiple servers.
• API available for most languages
Memcache
• NO – acid, data structures, etc etc.
• If you’re looking for a replacement for your DB then you’re looking at the wrong solution.
• Simple “put” a data using a key
• Get the data using the key.
• Generally, your app remembers the key.
• Data typically expires or gets flushed out (so has a short life).
• Does not auto commit to disk
When should I use memcached?
• When your database is optimized to the hilt and you
still need more out of it.
– Lots of SELECTs are using resources that could be better used
elsewhere in the DB.
– Locking issues keep coming up
• When table listings in the query cache are torn down
so often it becomes useless
• To get maximum “scale out” of minimum hardware
Use cases for memcached
• Anything what is more expensive to fetch from elsewhere, and has sufficient hit rate, can be placed in memcached
• Web Apps that require extremely fast reads
• Temporary data storage
• Global data store for your application data
• Session data (ROR)
• Extremely fast writes of any application data (that will be committed to an actual data store later)
• Memcached is not a storage engine but a caching system
NoSQL & SQL
• We’re not going to debate about the pros and
cons of the different NoSQL vs SQL solutions.
• My View: It really depends upon your requirements.
• NoSQL caters to large but specific problems and in
my opinion, there is no all in one solution.
• Your comfort of MySQL or SQL statements might not
solve certain scale problems
For Example
• If you have hundreds of Terra Bytes of data and would like to
process it, then clearly – Map Reduce + Hive would be a great
solution for it.
• If you have large data and would like to do real time analytics
/ queries where processing speed is important then you might
want to consider Cassandra / MongoDB
• If your app needs massive amounts of simple reads then
clearly memcached is probably the solution of choice
• If you want large storage of data with non aggregate
processing and very comfortable with MySQL then a MySQL
based sharded solution can do wonders.
MEMCACHED SERVER
Memcached Server Info
• Memcache server accepts read / write TCP
connections through a standalone app or daemon.
• Clients connect on specific ports to read / write
• You can allocate a fixed memory for memcached to
use.
• Memcached stores data in the RAM (of course!)
• Memcached uses libevent and scales pretty well
(theoritical limit 200,000)
Configuration options
• Max connections
• Threads
• Port number
• Type of process – foreground or daemon
• Tcp / udp
Read write (Pseudo code)
• Set “key”, “value”, ”expire time”
• A = get “key”
Sample Ruby code
• Connection:
– With one server:
M = MemCache.new ‘localhost:11211’, :namespace
=> ‘my_namespace’
– With multiple servers:
M = MemCache.new %w[one.example.com:11211
two.example.com:11211], :namespace =>
‘my_namespace’
Sample Ruby code
• Usage
– m = MemCache.new('localhost:11211')
– m.set 'abc', 'xyz‘
– m.get 'abc‘ => ‘xyz’
– m.replace ‘abc’, ‘123’
– m.get ‘abc’ => ‘123’
– m.delete ‘abc’ => ‘DELETED’
– m.get ‘abc’ => nil
Memcached data structure
• Generally simple strings
• Simple strings are compatible with other
languages.
• You can also store other objects example
hashes.
• Complex data stored using one language
generally may not be fetchable from different
systems
Sample Ruby code
• Memcache can store data of any data types.
– m = MemCache.new('localhost:11211')
– m.set ‘my_string’, ‘Hello World !’
– m.set ‘my_integer’, 100
– m.set ‘my_array’, [1,”hello”,”2.0”]
– m.set ‘my_hash’, {‘1’=>’one’,’2’=>’two’}
– m.set ‘my_complex_data’, <any complex data>
Limits of memcached
• Keys can be no more then 250 characters
• Stored data can not exceed 1M (largest typical slab
size) per key
• There are generally no limits to the number of nodes
running memcache
• There are generally no limits the the amount of RAM
used by memcache over all nodes
– 32 bit machines do have a limit of 4GB though
FEATURES
Memcached can replicate for high
availability
Memcached 1
Data structure A
Memcached 2
Copy of
Data structure A
Data can be distributed using
memcached
Memcached 1
Part 1 - Data
structure A
Memcached 2
Part - 2
The connecting client can connect to either memcached 1 or memcached 2 to
fetch any data that is distributed across the two servers
Memcached can store any object
• Simple String
• Hash
• Other
• Strings can be read / written among
heterogenous systems.
Memcached best practices
Best Practices
• Allocate enough space for memcached.
• Memcache does not auto save data into disk so don’t
forget to serialize your data if you need to.
• A memcached crash could lead to a large downtime
if you have to load a lot of data to load (into
memcache) so you should
– Replicate memcached data OR
– Find algorithms to store data as fast as you can OR
– Have redundant servers with similar data and handle “no
data” situation well.
Best Practices
• Shared nothing, non replicated set of servers works the best.
• Don’t try to replicate your database environment into
Memcached. It is not what it was meant for.
• Don’t store data for too long in memcached. Least recently
used items (LRU) are evicted automatically in certain
scenarios.
• Reload data that is required for your reads as often as
possible
• Don’t go just by existing benchmarks; Find out your
application benchmarks in your environment and use that
variable to scale.
What to look out for
• Memcached performance can hit roadblocks
under very high reads and writes on a single
instance so try to isolate read / write instances
or see what works best for your app.
• Data increments on memcache values –
remember this is a shared memory space and
data is not always “locked” before being
updated.
OUR MEMCACHED USAGE
Usage History
• Considered using memcached more than 2
years ago due to problems in high volumen
MySQL read and writes.
• Used it first as a way to store Mongrel’s
session variables instead of MySQL.
The Problem we tried to solve
• Deliver a matching ad to a complex set of variables
• 6000+ devices in device db.
• 8-10m records in ip data
• Country / carrier / manufacturer / channel / targeting
• IP filters, black lists
• Bot Filtering
• Ad moderation etc etc etc..
• Typical In an ad network scenario
The solution needed to
• Make data fetching as fast as possible
• Make data structures simple
• Make processing extremely simple (for
example a string comparision vs a SQL Query)
• Make processing of data as fast as possible.
• Complete all processing 50 ms or less.
..continued
• Using a database was possible in low traffic
scenario.
• Traffic grew and so did database nightmares.
Our usage of memcached
• Hundreds of millions of reads daily; small
chunks of data. Infrequent writes.
• More than 60G of memcached shared data
space.
• MySQL was fine until we moved our
performance metric from seconds to ms.
• Current total transaction time between 21-
50ms.
Our partial stack
Memcached clients currently used
• http://deveiate.org/code/Ruby-MemCache/
• http://github.com/higepon/memcached-client
• There are newer and better clients that can be
used.
Questions?