Redis, the hacker's database:
- simple_queue: feature set, comparison with Celery and RQ
- redis_graph: available options, integration with other tools, and big-O performance
- bitmapist: idea, architecture, reports based on cohorts
- optionally: tagged-logger / ormist (lightweight Object-to-Redis mapper)
- optionally: scripting possibilities with Lua and LuaJIT (almost as fast as C)
The Hacker’s Database
Amir Salihefendic (amix)
About Me
• Co-founder and former CTO of Plurk.com
• Helped Plurk scale to millions of users, billions of page views and 8+ billion unique data items. With minimal hardware!
• Founder of Doist.io creators of Todoist and Wedoist
Outline of the talk
• Plurk Timelines optimization: How we saved hundreds of thousands of dollars
• What’s great about Redis?
• Different sample implementations:
  – redis_wrap
  – redis_graph
  – redis_queue
• Advanced analytics using Redis
  – bitmapist and bitmapist.cohort
Problem: Exponential data growth in social networks
(Chart: data size vs. number of users)
The Easy Solution: Throw money at the problem
The Smarter Solution: Reduce to linear data growth
(Chart: data size vs. number of users)
Example: Timelines
(Chart: timeline data size vs. number of users)
Solution: Cheating! Make timelines a fixed size: 500 messages
• O(1) insertion
• O(1) update
• Cacheable
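The slides do not show how Plurk implemented the fixed-size timeline, so the following is only a minimal in-memory sketch of the idea. The `Timeline` class and `TIMELINE_MAX` name are hypothetical; a `collections.deque` with `maxlen` stands in for the stored list. On Redis, a comparable effect is commonly achieved with `LPUSH` followed by `LTRIM key 0 499`.

```python
from collections import deque

TIMELINE_MAX = 500  # fixed timeline size mentioned on the slide


class Timeline:
    """Hypothetical in-memory stand-in for one user's capped timeline."""

    def __init__(self, max_size=TIMELINE_MAX):
        self.items = deque(maxlen=max_size)

    def push(self, message_id):
        # Newest entry goes first; once the deque is full, the oldest
        # entry at the other end is dropped automatically in O(1).
        self.items.appendleft(message_id)

    def latest(self, n=20):
        # Return the n most recent message ids, newest first.
        return list(self.items)[:n]


timeline = Timeline()
for message_id in range(1000):
    timeline.push(message_id)
```

Because the size is capped, storage per user stays constant no matter how many messages flow through, which is what turns the growth curve linear in the number of users.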
Plurk’s timelines migration path
• Problem with MySQL and Tokyo Tyrant? Death by IO
Tokyo Tyrant
What’s great about Redis?
• Everything is in memory, but the data is persistent.
• Amazing performance: 100,000+ SETs per second, 80,000+ GETs per second
Redis Rich Datatypes
• Relational databases: schemas, tables, columns, rows, indexes, etc.
• Column databases (BigTable, HBase, etc.): schemas, columns, column families, rows, etc.
• Redis: key-value, sets, lists, hashes, bitmaps, etc.
Redis datatypes resemble datatypes in programming languages. They are natural to us!
redis_wrap
• Implements a wrapper for Redis datatypes so they mimic the datatypes found in Python
• 100 lines of code
• https://github.com/Doist/redis_wrap
redis_wrap
from redis_wrap import get_list, get_hash, get_set

# Mimic of Python lists
bears = get_list('bears')
bears.append('grizzly')
assert len(bears) == 1
assert 'grizzly' in bears

# Mimic of hashes
villains = get_hash('villains')
assert 'riddler' not in villains
villains['riddler'] = 'Edward Nigma'
assert 'riddler' in villains
assert len(villains.keys()) == 1
del villains['riddler']
assert len(villains) == 0

# Mimic of Python sets
fishes = get_set('fishes')
assert 'nemo' not in fishes
fishes.add('nemo')
assert 'nemo' in fishes
for item in fishes:
    assert item == 'nemo'
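To illustrate how such a wrapper can stay so small, here is a hedged sketch of the general technique: Python's container protocol methods are mapped one-to-one onto Redis list commands. This is not the redis_wrap source. `ListWrapper` and `FakeRedis` are hypothetical names, and `FakeRedis` is a tiny in-memory stand-in so the example runs without a server.

```python
class FakeRedis:
    """In-memory stand-in for the three Redis list commands used below."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])

    def llen(self, key):
        return len(self.lists.get(key, []))

    def lrange(self, key, start, stop):
        items = self.lists.get(key, [])
        # Redis treats stop as inclusive; only stop == -1 ("last element")
        # is handled among the negative values in this sketch.
        stop = len(items) if stop == -1 else stop + 1
        return items[start:stop]


class ListWrapper:
    """Hypothetical wrapper: each Python list operation issues one command."""

    def __init__(self, client, name):
        self.client = client
        self.name = name

    def append(self, item):
        self.client.rpush(self.name, item)

    def __len__(self):
        return self.client.llen(self.name)

    def __iter__(self):
        return iter(self.client.lrange(self.name, 0, -1))

    def __contains__(self, item):
        return item in self.client.lrange(self.name, 0, -1)


bears = ListWrapper(FakeRedis(), 'bears')
bears.append('grizzly')
```

Swapping `FakeRedis` for a real client object with the same three methods would leave `ListWrapper` unchanged, which is the appeal of this design.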
redis_graph
• Implements a simple graph database in Python
• Can scale to a few million nodes easily
• You could use something similar to implement LinkedIn’s “who is connected to who” feature
• Under 40 lines of code
• https://github.com/Doist/redis_graph
redis_graph
from redis_graph import *

# Adding an edge between nodes
add_edge(from_node='frodo', to_node='gandalf')
assert has_edge(from_node='frodo', to_node='gandalf') == True

# Getting neighbors of a node
assert list(neighbors('frodo')) == ['gandalf']

# Deleting edges
delete_edge(from_node='frodo', to_node='gandalf')

# Setting node values
set_node_value('frodo', '1')
assert get_node_value('frodo') == '1'

# Setting edge values
set_edge_value('frodo_baggins', '2')
assert get_edge_value('frodo_baggins') == '2'
redis_graph: The implementation

from redis_wrap import *

#--- Edges ----------------------------------------------
def add_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    edges.add(to_node)

def delete_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    key_node_y = to_node
    if key_node_y in edges:
        edges.remove(key_node_y)

def has_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    return to_node in edges

def neighbors(node_x, system='default'):
    return get_set(node_x, system=system)

#--- Node values ----------------------------
def get_node_value(node_x, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).get(node_key)

def set_node_value(node_x, value, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).set(node_key, value)

#--- Edge values -----------------------------
def get_edge_value(edge_x, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).get(edge_key)

def set_edge_value(edge_x, value, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).set(edge_key, value)
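As a hedged illustration of the "who is connected to who" idea, the sketch below runs a breadth-first search over a `neighbors` function to find how many hops separate two nodes. It is not part of redis_graph: `degrees_of_separation` and the `EDGES` dictionary are hypothetical, and the dictionary of Python sets stands in for the per-node Redis sets so the example runs without a server. Edges are directed, as in the implementation above, where `add_edge` only updates the set of `from_node`.

```python
from collections import deque

# In-memory stand-in for the per-node Redis sets (made-up data).
EDGES = {
    'frodo': {'gandalf', 'sam'},
    'gandalf': {'aragorn'},
    'sam': set(),
    'aragorn': {'arwen'},
}


def neighbors(node):
    return EDGES.get(node, set())


def degrees_of_separation(start, target, max_depth=3):
    """Breadth-first search; returns the hop count, or None if the target
    is not reachable within max_depth hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return depth
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for other in neighbors(node):
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return None
```

Capping the depth keeps the number of set lookups bounded, which matters once each `neighbors` call is a round trip to Redis.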
redis_queue
• Implements a queue in Python using Redis
• Used to process millions of background tasks on Plurk / Todoist / Wedoist daily (billions in total)
• Implementation: 18 lines (the “real” implementation is a bit bigger)
• https://github.com/Doist/redis_simple_queue
redis_queue
from redis_simple_queue import *

delete_jobs('tasks')
put_job('tasks', '42')
assert 'tasks' in get_all_queues()
assert queue_stats('tasks')['queue_size'] == 1
assert reserve_job('tasks') == '42'
assert queue_stats('tasks')['queue_size'] == 0
redis_queue: Implementation

from redis_wrap import *

# Functions are named put_job / reserve_job here to match the usage slide.
def put_job(queue, job_data, system='default'):
    get_list(queue, system=system).append(job_data)

def reserve_job(queue, system='default'):
    return get_list(queue, system=system).pop()

def delete_jobs(queue, system='default'):
    get_redis(system).delete(queue)

def get_all_queues(system='default'):
    # Current redis-py clients return a list from keys(), so no split is needed.
    return get_redis(system).keys('*')

def queue_stats(queue, system='default'):
    return {
        'queue_size': len(get_list(queue, system=system))
    }
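The slides do not show the consumer side, so here is a hedged sketch of what a simple worker loop around `put_job` and `reserve_job` could look like. `run_worker` is a hypothetical helper, and the two queue functions are re-implemented on an in-memory `deque` so the example runs without Redis. First-in, first-out ordering is an assumption of this sketch, not something the slides state.

```python
from collections import deque

QUEUES = {}  # in-memory stand-in for Redis lists, keyed by queue name


def put_job(queue, job_data):
    QUEUES.setdefault(queue, deque()).append(job_data)


def reserve_job(queue):
    """Return the oldest job, or None when the queue is empty."""
    jobs = QUEUES.get(queue)
    return jobs.popleft() if jobs else None


def run_worker(queue, handler):
    """Drain the queue, calling handler on each job.

    Returns the number of jobs processed.
    """
    processed = 0
    while True:
        job = reserve_job(queue)
        if job is None:
            break
        handler(job)
        processed += 1
    return processed


results = []
put_job('tasks', '41')
put_job('tasks', '42')
count = run_worker('tasks', results.append)
```

A long-running worker would typically sleep or block instead of exiting when the queue is empty; this version stops so that it terminates in an example.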
bitmapist and bitmapist.cohort
• Implements an advanced analytics library on top of Redis bitmaps. Saved us $2000 USD/month (Mixpanel)!
• bitmapist: https://github.com/Doist/bitmapist
• bitmapist.cohort: cohort analytics (retention)
bitmapist: What does it help with?
• Has user 123 been online today? This week?
• Has user 123 performed action "X"?
• How many users have been active this month?
• How many unique users have performed action "X" this week?
• What percentage of users that were active last week are still active?
• What percentage of users that were active last month are still active this month?
• Bitmapist can answer this for millions of users, and most operations are O(1), using very small amounts of memory.
What are bitmaps?
• Operations: SETBIT, GETBIT, BITCOUNT, BITOP
• SETBIT somekey 8 1
• GETBIT somekey 8
• BITOP AND destkey somekey1 somekey2
• http://en.wikipedia.org/wiki/Bit_array
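To make the four operations concrete, here is a small sketch that mirrors them on plain Python integers, where bit N represents user id N. The function names are hypothetical and only echo the Redis command names. The byte layout of a real Redis string differs, but the set semantics are the same.

```python
def setbit(bitmap, offset, value):
    """Return a copy of bitmap with the bit at offset set to value (0 or 1)."""
    mask = 1 << offset
    return bitmap | mask if value else bitmap & ~mask


def getbit(bitmap, offset):
    """Return the bit at offset as 0 or 1."""
    return (bitmap >> offset) & 1


def bitcount(bitmap):
    """Count how many bits are set."""
    return bin(bitmap).count('1')


def bitop_and(*bitmaps):
    """Bitwise AND of all given bitmaps (set intersection)."""
    result = bitmaps[0]
    for bitmap in bitmaps[1:]:
        result &= bitmap
    return result


somekey1 = setbit(setbit(0, 8, 1), 123, 1)  # users 8 and 123
somekey2 = setbit(0, 123, 1)                # user 123 only
destkey = bitop_and(somekey1, somekey2)     # users present in both
```

Since each user id costs one bit, a bitmap covering ids up to 1,000,000 takes roughly 125 KB per key.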
bitmapist: Using it

from datetime import datetime, timedelta
from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd

now = datetime.utcnow()
last_month = now - timedelta(days=30)

# Mark user 123 as active and as having played a song
mark_event('active', 123)
mark_event('song:played', 123)

# Answer if user 123 has been active this month
assert 123 in MonthEvents('active', now.year, now.month)
assert 123 in MonthEvents('song:played', now.year, now.month)

# How many users have been active this week?
print(len(WeekEvents('active', now.year, now.isocalendar()[1])))

# Perform bit operations. How many users that
# have been active last month are still active this month?
active_2_months = BitOpAnd(
    MonthEvents('active', last_month.year, last_month.month),
    MonthEvents('active', now.year, now.month))
print(len(active_2_months))
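The percentage questions from the earlier slide reduce to an AND followed by two counts. The sketch below shows that arithmetic on plain Python integers used as bitmaps. It is not bitmapist code: `to_bitmap`, `count` and `retention` are hypothetical helpers and the activity data is made up.

```python
def to_bitmap(user_ids):
    """Pack user ids into an int, one bit per id (stand-in for a Redis bitmap)."""
    bitmap = 0
    for user_id in user_ids:
        bitmap |= 1 << user_id
    return bitmap


def count(bitmap):
    """Number of users present in the bitmap."""
    return bin(bitmap).count('1')


def retention(cohort, later_periods):
    """Percentage of the cohort that is still active in each later period."""
    size = count(cohort)
    if size == 0:
        return [0.0 for _ in later_periods]
    return [100.0 * count(cohort & period) / size for period in later_periods]


# Made-up activity data: users active in three consecutive weeks.
week_1 = to_bitmap([1, 2, 3, 4])
week_2 = to_bitmap([1, 2, 3, 9])
week_3 = to_bitmap([2, 9])

row = retention(week_1, [week_2, week_3])
```

Here `row` holds one line of a cohort table: the share of week 1 users seen again in week 2 and in week 3.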
bitmapist.cohort: Manage retention!
http://amix.dk/blog/post/19718
• Goal: Inventing a modern way to work together
• Join an amazing team of 13 people from all around the world. A profitable business. 500,000+ users.
• Work from anywhere. Hacker-friendly culture. Python. Competitive salaries.
• We are hiring: [email protected] www.doist.io
Questions and Answers
• Slides will be posted to http://amix.dk/
• For “offline” questions contact: [email protected]