Posterous recently deployed Riak to serve as their content cache. In this talk, Julio Capote will cover why the engineering team chose Riak for the use case. He'll also share some details on the old post cache and its problems, what solutions they evaluated, and how they settled on Riak.
Riak at Posterous
Julio Capote
San Francisco Riak Meetup
1/18/2012
A/S/L?
• Julio Capote
• Backend Developer at Posterous
• @capotej
Posterous
• Allows anyone to create multiple private or public spaces (blogs)
• Around since 2008
• Millions of posts and users
• Tons of long tail traffic
Some of the first posts are still being accessed today due to search engines
How we store posts
• Original post body goes into MySQL
• Multiple variants are generated (nojs, mobile, etc)
• Expensive to generate (sanitizers, expanders)
Enter Variant Cache
• A generic read/write-through cache library
• Started with Memcache
• Moved to Redis
At the time, Redis's disk store looked promising, so we moved from Memcache to Redis
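The talk does not show the library itself, so the following is only a minimal sketch of what a read/write-through variant cache could look like. The class name `VariantCache`, the `post:<id>:<variant>` key format, and the `render` callback are all hypothetical; a plain dict stands in for Memcache, Redis, or Riak.

```python
from typing import Callable, Dict, List


class VariantCache:
    """Read/write-through cache keyed by (post_id, variant). Illustrative only."""

    def __init__(self, store: Dict[str, str], render: Callable[[int, str], str]):
        self.store = store    # stand-in for the real backing store
        self.render = render  # the expensive step (sanitizers, expanders)

    @staticmethod
    def key(post_id: int, variant: str) -> str:
        # Hypothetical key layout, not taken from the talk.
        return f"post:{post_id}:{variant}"

    def read(self, post_id: int, variant: str) -> str:
        # Read-through: serve from the store, generate and save on a miss.
        k = self.key(post_id, variant)
        cached = self.store.get(k)
        if cached is not None:
            return cached
        body = self.render(post_id, variant)
        self.store[k] = body
        return body

    def write(self, post_id: int, variants: List[str]) -> None:
        # Write-through: regenerate every variant when the post changes.
        for variant in variants:
            self.store[self.key(post_id, variant)] = self.render(post_id, variant)
```

With this shape, swapping the backend only means replacing the object passed as `store`, which is presumably what made the moves between stores practical.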
Redis is awesome, but
• Requires both the key and the value to fit in memory
• Terrible disk store performance
• Even with 3 machines with 64 GB of RAM each, couldn’t fit the entire working set
• Forced to set a TTL
Redis wasn’t really designed to ever hit the disk
The Dream
What we wanted
• Key/Value store
• Disk backed
• Built in distribution
• Use fewer boxes to serve more users
• Consistent performance over raw performance
Percona MySQL / HandlerSocket
MySQL / HandlerSocket
• Great performance
• Can handle a huge number of rows
• Mature / Safe (at least the MySQL part)
The Good
MySQL / HandlerSocket
• Sharding definitely not built in
• HandlerSocket is pretty much abandoned
The Bad
No support going forward
MongoDB
• Crazy fast
• Built in sharding support
• ...did I mention it was fast?
The Good
MongoDB
• 30% standard deviation on fetch times (!)
• Would falsely acknowledge a write
The Bad
This is probably tunable, but still
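The slide does not say how the 30% figure was measured. One plausible reading is the standard deviation of fetch times relative to their mean, which could be computed roughly as below. The function name and the sample latencies are made up for illustration.

```python
import statistics
from typing import Sequence


def relative_stddev(samples_ms: Sequence[float]) -> float:
    """Sample standard deviation as a fraction of the mean fetch time."""
    mean = statistics.mean(samples_ms)
    return statistics.stdev(samples_ms) / mean


# Invented latencies in milliseconds, only to show the comparison.
steady = [10, 11, 9, 10, 10]
spiky = [5, 20, 6, 18, 7]
print(f"steady: {relative_stddev(steady):.0%}, spiky: {relative_stddev(spiky):.0%}")
```

A store with a lower mean but a high relative deviation can still lose on the "consistent performance over raw performance" criterion.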
Riak + Bitcask
• Distributed by default
• Consistent and predictable performance
• Highly concurrent, no perf degradation
• Ops guy loves it!
The Good
Riak + Bitcask
• Not crazy fast
• Stuck it behind memcache
• Still way faster than generating
• No multi get support
The Bad
write and read through memcache
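A rough sketch of the two ideas on this slide: reading through a small in-memory tier before the slower store, and working around the missing multi-get by issuing single-key gets concurrently. The talk does not say how Posterous handled multi-get, so the second function is just one common workaround. Both function names are hypothetical, and plain dicts and callables stand in for memcache and Riak.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, MutableMapping, Optional


def tiered_get(
    key: str,
    fast: MutableMapping[str, str],
    slow_get: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Check the in-memory tier first, fall back to the disk-backed store."""
    value = fast.get(key)
    if value is None:
        value = slow_get(key)
        if value is not None:
            fast[key] = value  # populate the fast tier for the next read
    return value


def multi_get(
    keys: Iterable[str],
    get_one: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> Dict[str, Optional[str]]:
    """Emulate multi-get with concurrent single-key gets."""
    key_list = list(keys)
    if not key_list:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(get_one, key_list))  # map preserves input order
    return dict(zip(key_list, values))
```

Concurrent gets lean on the "highly concurrent, no perf degradation" property claimed on the previous slide, so the cost of a batch approaches that of its slowest single get.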
Riak in production
• Started using our 3 node cluster for the global production cache
• Accidentally turned off a node
• Keys rebalanced, site didn’t skip a beat
• No one even noticed till hours later
Stats
• 3 nodes
• 2600+ requests/second
• 300+ GB
• ~200 million keys
• 10 GB memcache/host
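Some back-of-the-envelope arithmetic on these numbers. The slide gives lower bounds ("+") and an approximate key count, and it does not say whether the 300 GB includes replicas, so treat the results as rough orders of magnitude. It also assumes decimal gigabytes and an even spread of requests across nodes.

```python
requests_per_second = 2600
nodes = 3
stored_bytes = 300 * 10**9   # 300 GB, decimal
keys = 200 * 10**6           # ~200 million

per_node_rps = requests_per_second / nodes  # roughly 867 requests/second per node
bytes_per_key = stored_bytes / keys         # 1500 bytes, about 1.5 KB per key

print(round(per_node_rps), bytes_per_key)
```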
#Protips
• All nodes can serve all requests, so...
• Use a vip, or...
• Pass all cluster nodes to client driver (thanks @aphyr!)
• Use curb instead of net/http
• Use Keep Alive
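Since any node can serve any request, a client can simply rotate through the node list and skip nodes that fail. The sketch below shows that idea only; it is not the API of any Riak client. `NodePool`, the `fetch` callback, and the host names in the usage example are hypothetical. Connection reuse (keep-alive) is left to whatever `fetch` does internally.

```python
import itertools
from typing import Callable, Optional, Sequence


class NodePool:
    """Rotate requests across cluster nodes, skipping ones that fail."""

    def __init__(self, nodes: Sequence[str]):
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def request(self, fetch: Callable[[str], str]) -> str:
        last_error: Optional[Exception] = None
        # Try each node at most once per request.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                return fetch(node)
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("all nodes failed") from last_error
```

Usage might look like `NodePool(["riak1:8098", "riak2:8098", "riak3:8098"]).request(fetch)`, where `fetch(node)` performs the actual HTTP GET against that node. A VIP achieves the same effect outside the client.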
Any Questions?
Thanks for listening!
Special thanks to @twoism, @vincentchu, @kangchen, @argv0, @pharkmillups, @seancribbs, @aphyr, @jrecursive