Caching techniques in Python, Michael Domanski, EuroPython 2010, Thursday, 22 July 2010

Caching techniques in python, europython2010


DESCRIPTION

Slides from the EuroPython 2010 conference in Birmingham on the subject of caching in Python.


Caching techniques in Python

Michael Domanski

EuroPython 2010


who I am

• Python developer, professionally for a few years now

• also experienced in C and Objective-C

• currently working for 10clouds.com


Interesting intro

• a bit of theory

• common patterns

• common problems

• common solutions


How I think about cache

• imagine a giant dict storing all your data

• you have to manage all data manually

• or provide some automated behaviour


similar to....

• manual memory management in C

• cache is memory

• and you have to control it manually


profits

• improved performance

• ...?


problems

• managing any type of memory is hard

• automation often has to be custom-built each time


common patterns


memoization


• very old pattern (circa 1968)

• we owe the name to Donald Michie


how it works

• we associate input with output, and store it somewhere

• based on the assumption that for a given input, the output is always the same


code example

CACHE_DICT = {}

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
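The decorator above stores one value per hand-picked key, whatever arguments the function receives. A minimal sketch of memoization in the classic sense, where the key is derived from the arguments themselves, might look like this. The names `memoize`, `MEMO`, `CALLS` and `slow_square` are illustrative and not from the talk, and the sketch assumes every argument is hashable.

```python
import functools

MEMO = {}   # hypothetical process-wide store: (function name, args, kwargs) -> result
CALLS = []  # records real invocations, only to make the effect visible


def memoize(func):
    """Cache results per distinct set of (hashable) arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Sort kwargs so that f(a=1, b=2) and f(b=2, a=1) share one entry.
        key = (func.__name__, args, tuple(sorted(kwargs.items())))
        if key not in MEMO:
            MEMO[key] = func(*args, **kwargs)
        return MEMO[key]
    return wrapper


@memoize
def slow_square(x):
    CALLS.append(x)  # stands in for an expensive computation
    return x * x


first = slow_square(4)
second = slow_square(4)  # served from MEMO, the function body is not run again
third = slow_square(5)   # different input, so a new entry is computed
```

This only works while the "same input, same output" assumption holds; once the output can change, the entries have to be invalidated.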


what if output can change?

• our pattern is still useful

• we simply need to add something


cache invalidation


There are only two hard problems in Computer Science: cache invalidation and naming things

Phil Karlton


• basically, we update data in cache

• we need to know when and what to change

• the more granular you want to be, the harder it gets


code example

def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        # parenthesised so it runs as a statement (Python 2) or a call (Python 3)
        print("someone tried to invalidate a key that is not present: %s" % key)


common problems


invalidating too much/not enough

• flushing all data any time something changes

• not flushing cache at all

• tragic effects


# db_get / db_set are database helpers that the slides do not define

@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('big_key1')
def some_bigger_function():
    """
    this function depends on big_key1, key1 and key2
    """
    def inner_workings():
        db_set(1, 'something totally new')
    #######
    ## imagine 100 lines of code here :)
    ######
    inner_workings()

    return [simple_function1(), simple_function2()]

if __name__ == '__main__':
    simple_function1()
    simple_function2()
    a, b = some_bigger_function()
    assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
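To see the failure without a real database, here is a self-contained sketch. `DB`, `db_get` and `db_set` are stand-ins invented for this illustration (the slides do not define them); the decorator and `invalidate` follow the earlier slides in simplified form. It also shows one possible remedy, which is to invalidate at the place where the write happens.

```python
CACHE_DICT = {}
DB = {1: 'old value', 2: 'other value'}  # hypothetical stand-in for a database


def db_get(id):
    return DB[id]


def db_set(id, value):
    DB[id] = value


def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper


def invalidate(key):
    CACHE_DICT.pop(key, None)  # silently ignore keys that are not cached


@cached('key1')
def simple_function1():
    return db_get(id=1)


simple_function1()                  # warms the cache with 'old value'
db_set(1, 'something totally new')
stale = simple_function1()          # still the old value: nobody invalidated 'key1'

invalidate('key1')                  # one possible fix: invalidate right after the write
fresh = simple_function1()          # recomputed from the database
```

Here `stale` still holds the value from before the write, while `fresh` matches what `db_get(id=1)` returns.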


invalidating too soon/too late

• your cache has to be synchronised with your db

• sometimes very hard to spot

• leads to tragic mistakes


@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

def some_bigger_function():
    db_set(1, 'something')
    value = simple_function1()
    db_set(2, 'something else')
    #### now we know we used 2 cached functions so....
    invalidate('key1')
    invalidate('key2')
    #### now we know we are safe, but for a price
    return simple_function2()

if __name__ == '__main__':
    some_bigger_function()


superposition of dependencies

• a somewhat less obvious problem

• eventually you will start caching the results of computations

• you have to know very precisely what your data depends on


@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('key')
def some_bigger_function():
    return {
        '1': simple_function1(),
        '2': simple_function2(),
        '3': db_get(id=3)
    }

if __name__ == '__main__':
    simple_function1()
    # somewhere else
    db_set(1, 'foobar')
    # and again
    db_set(3, 'bazbar')
    invalidate('key')
    # ooops, we forgot something
    data = some_bigger_function()
    assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"
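One possible way to avoid forgetting a key is to write the dependencies down and let invalidation follow them. The sketch below is an assumption of this write-up, not something shown in the talk: `DEPENDS_ON`, `dependents_of` and `invalidate_with_dependents` are hypothetical names. Invalidating a source key also drops every key that was computed from it, directly or indirectly.

```python
CACHE_DICT = {}
# Hypothetical registry: cached key -> keys whose values it was computed from.
DEPENDS_ON = {
    'key': ['key1', 'key2'],
}


def dependents_of(key):
    """Return every key that directly or indirectly depends on `key`."""
    found = set()
    stack = [key]
    while stack:
        current = stack.pop()
        for candidate, sources in DEPENDS_ON.items():
            if current in sources and candidate not in found:
                found.add(candidate)
                stack.append(candidate)
    return found


def invalidate_with_dependents(key):
    """Drop `key` and everything computed from it."""
    for k in {key} | dependents_of(key):
        CACHE_DICT.pop(k, None)


# Simulate a warm cache, then a write that affects only the data behind 'key1'.
CACHE_DICT.update({'key1': 'a', 'key2': 'b', 'key': {'1': 'a', '2': 'b'}})
invalidate_with_dependents('key1')
remaining = sorted(CACHE_DICT)
```

After the call only `'key2'` is left in the cache. The registry still has to be kept accurate by hand, so this moves the bookkeeping to one place rather than removing it.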


summing up

• know your data....

• be aware of what you cache and when

• take care when using cached data in computation


common solutions


process level cache


why?

• very fast access

• simple to implement

• very effective as long as you’re using a single process


clever tricks with dicts


code example

CACHE_DICT = {}

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
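One further dict trick in this spirit, offered as an illustration rather than as an example from the talk, is a `dict` subclass whose `__missing__` hook computes and stores a value the first time a key is looked up. `ComputedCache`, `LOADS` and `load_user` are made-up names.

```python
class ComputedCache(dict):
    """A dict that fills itself on first access to a missing key."""

    def __init__(self, loader):
        super().__init__()
        self.loader = loader

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        value = self.loader(key)
        self[key] = value
        return value


LOADS = []  # records loader calls, only to make the caching visible


def load_user(user_id):
    LOADS.append(user_id)  # stands in for a slow lookup
    return {'id': user_id}


users = ComputedCache(load_user)
a = users[7]   # miss: load_user(7) runs and the result is stored
b = users[7]   # hit: plain dict lookup
del users[7]   # invalidation is ordinary dict deletion
c = users[7]   # miss again after invalidation
```

Like the decorator, this lives inside one process, so each worker process ends up with its own copy of the cache.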


invalidation


code example

def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        # parenthesised so it runs as a statement (Python 2) or a call (Python 3)
        print("someone tried to invalidate a key that is not present: %s" % key)


application level cache


memcache


• battle tested

• scales

• fast

• supports a few cool features

• behaves a lot like dict

• supports time-based expiration
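Time-based expiration can be imitated at process level to see how it behaves. The sketch below is not memcached and not from the slides; `ExpiringCache` and its injectable `clock` parameter are assumptions made so that the behaviour can be followed without waiting for real time to pass.

```python
import time


class ExpiringCache:
    """Dict-backed cache whose entries lapse after `ttl` seconds."""

    def __init__(self, clock=time.time):
        self._data = {}      # key -> (value, expiry timestamp or None)
        self._clock = clock  # injectable so the example needs no real waiting

    def set(self, key, value, ttl=0):
        # ttl=0 means "never expires" in this sketch.
        expires_at = self._clock() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value


now = [1000.0]                      # fake clock we can move by hand
cache = ExpiringCache(clock=lambda: now[0])
cache.set('report', 'cached text', ttl=30)
before = cache.get('report')        # still within 30 seconds
now[0] += 31
after = cache.get('report')         # expired, treated as a miss
```

Expiration by time trades freshness for simplicity: nobody has to remember to invalidate, but readers may see data that is up to `ttl` seconds old.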


libraries?

• python-memcache

• python-libmemcache

• python-cmemcache

• pylibmc


why no benchmarks

• not the point of this talk :)

• benchmarks are generic, caching is specific

• pick your flavour, think for yourself


code example

import memcache

cache = memcache.Client(['localhost:11211'])

def memcached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            # compare with None so that cached falsy values (0, '', []) count as hits
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper


invalidation


code example

def mem_invalidate(key):
    cache.set(str(key), None)


batch key management


• what if I don’t want to expire each key manually?

• that’s a lot to remember

• and we have to be careful :(


groups?

• group keys into sets

• which are tied to one key per set

• expire one key, instead of twenty


how to get there?

• store some extra data

• you can store dicts in cache

• and cache behaves like dict

• so it’s a case of comparing keys and values


# we start with specified key and group
key = 'some_key'
group = 'some_group'

# now retrieve some data from memcached
# (get_multi takes a list of keys)
data = memcached_client.get_multi([key, group])
# now data is a dict that should look like
# {'some_key': {'group_key': '1234',
#               'value': 'some_value'},
#  'some_group': '1234'}
#
# (fragment: the return below belongs inside a lookup function)
if data and (key in data) and (group in data):
    if data[key]['group_key'] == data[group]:
        return data[key]['value']


# tools.make_key and make_group_value are helpers that the slides do not show

def cached(key, group_key='', exp_time=0):
    # we don't want to mix time based and event based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            if group_key:
                data = cache.get_multi([tools.make_key(group_key)] + [tools.make_key(key)])
                data_dict = data.get(tools.make_key(key))
                if data_dict:
                    value = data_dict['value']
                    group_value = data_dict['group_value']
                    # .get() avoids a KeyError when the group key is missing from the cache
                    if group_value != data.get(tools.make_key(group_key)):
                        value = None
            else:
                # read with the same key format that is used for writing below
                value = cache.get(tools.make_key(key))
            # compare with None so that cached falsy values count as hits
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(tools.make_key(key), value, exp_time)
                elif not group_key:
                    cache.set(tools.make_key(key), value)
                else:
                    # exp_time not set and we have group_keys
                    group_value = make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({
                        tools.make_key(key): data_dict,
                        tools.make_key(group_key): group_value
                    })
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
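The grouping idea can be tried without a memcached server. In the sketch below a plain dict stands in for the cache client, and `group_cached`, `invalidate_group`, `STORE` and the counter-based group value are illustrative choices, not the helpers used in the talk. Each entry remembers the group value it was stored under, and changing that one value expires the whole group.

```python
import itertools

STORE = {}                      # stands in for the memcached client
_versions = itertools.count(1)  # source of fresh group values


def _group_value(group_key):
    """Return the group's current value, creating one if it is missing."""
    if group_key not in STORE:
        STORE[group_key] = next(_versions)
    return STORE[group_key]


def invalidate_group(group_key):
    """Expire every key in the group by changing the one group value."""
    STORE[group_key] = next(_versions)


def group_cached(key, group_key):
    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            current = _group_value(group_key)
            entry = STORE.get(key)
            # An entry is valid only if it was stored under the current group value.
            if entry is not None and entry['group_value'] == current:
                return entry['value']
            value = func(*args, **kwargs)
            STORE[key] = {'value': value, 'group_value': current}
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper


CALLS = []  # records real invocations, only to make the effect visible


@group_cached('user:1:name', group_key='group:user:1')
def user_name():
    CALLS.append('name')
    return 'Alice'


@group_cached('user:1:email', group_key='group:user:1')
def user_email():
    CALLS.append('email')
    return 'alice@example.com'


user_name()
user_email()                       # both computed and stored
user_name()
user_email()                       # both served from STORE
invalidate_group('group:user:1')   # one call expires both entries
user_name()
user_email()                       # both recomputed
```

Stale entries are never deleted here, they are simply ignored once the group value moves on; with a real memcached they would eventually be evicted.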


questions?


follow me

twitter: mdomans

blog: blog.mdomans.com
