Caching Willem Visser RW334


Overview

• AppEngine Datastore
• No Caching
• Naïve Caching
• Caching invalidation
• Cache updating
• Memcached
• Beyond your code


AppEngine Python Datastore

• Datastore
  – db
    • Old and will be going away at some point
  – ndb (https://developers.google.com/appengine/docs/python/ndb/)
    • New and supports some cool features

    from google.appengine.ext import ndb

    class Stuff(ndb.Model):
        title = ndb.StringProperty(required=True)
        content = ndb.StringProperty(required=True)
        date = ndb.DateTimeProperty(auto_now_add=True)


NDB

• Python class defines the model
• Each entity has a key, which in turn has a parent, up to the root that has no parent
  – Entities in this chain are in the same group
  – Entities in the same group have consistency guarantees

    stuff_title = self.request.get('stuff_name')
    stuff = Stuff(parent=ndb.Key("Things", stuff_title or "*notitle*"),
                  title=stuff_title or "*notitle*",  # title is a required property
                  content=self.request.get('content'))
    stuff.put()


NDB (2)

• Queries and Indexes
• There are very many ways to query
• Complex queries might need complex indexes
  – NDB creates simple indexes automatically
  – Complex ones can be defined in index.yaml
• GQL is similar to SQL
• A query only gets executed when its results are accessed

    # ndb exposes GQL through ndb.gql(); the Stuff model orders by its "date" property
    stuff = ndb.gql("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
    stuff = list(stuff)  # the query actually runs here


No Caching

• Every db_read hits the database
• Database reads tend not to be the fastest thing
• This can therefore be very inefficient


Example: No Caching

    def top_stuff():
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
        return list(stuff)

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            self.redirect("/")


Naïve Caching

• This will do wonders for performance
• If the cache is too large it might start to slow down a bit
• Below, the db_read is avoided on a cache hit, but rendering HTML could also be cached if that takes a lot of time

    if request not in cache:
        cache[request] = db_read()
    return cache[request]
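The pseudocode above can be turned into a small runnable sketch. Everything here is illustrative: `DB`, `db_read`, `cached_read` and the `db_reads` counter are stand-ins for a real datastore and handler, not App Engine APIs.

```python
# Stand-in for a slow datastore; all names here are illustrative.
DB = {"front": ["post 3", "post 2", "post 1"]}
db_reads = 0   # counts how often the "database" is really hit
cache = {}     # request -> cached result


def db_read(request):
    """Pretend to run an expensive query."""
    global db_reads
    db_reads += 1
    return list(DB[request])


def cached_read(request):
    """Return the cached result, querying the database only on a miss."""
    if request not in cache:
        cache[request] = db_read(request)
    return cache[request]


first = cached_read("front")    # miss: goes to the database
second = cached_read("front")   # hit: served from the dict
```

After both calls `db_reads` is 1: only the first request paid for the query.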



Example: Naïve Caching

    import logging

    CACHE = {}

    def top_stuff():
        key = 'top'
        stuff = CACHE.get(key)  # .get() returns None on a miss; CACHE[key] would raise KeyError
        if not stuff:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            self.redirect("/")


New data?

• Will the previous solution work?
• What happens if you add new data?
  – It is added to the DB and then we redirect to /
  – render_front calls top_stuff
  – However, the cache is hit and we get the old data
• The cache must be invalidated when new data comes in


Clear Cache

    CACHE = {}

    def top_stuff():
        …

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            CACHE.clear()
            self.redirect("/")
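The effect of clearing the cache on a write can be seen in a self-contained sketch. The list `DB` and the `invalidate` flag are assumptions made for illustration; they replace the datastore and the handler so that the stale and the fresh case can be compared side by side.

```python
DB = []       # newest item last; stands in for the datastore
CACHE = {}


def top_stuff():
    """Return the ten newest items, newest first, using CACHE when possible."""
    if "top" not in CACHE:
        CACHE["top"] = list(reversed(DB[-10:]))
    return CACHE["top"]


def post(title, invalidate):
    """Store a new item; optionally clear the cache afterwards."""
    DB.append(title)
    if invalidate:
        CACHE.clear()


post("first", invalidate=True)
before = top_stuff()                 # ['first'], now cached
post("second", invalidate=False)
stale = top_stuff()                  # still ['first']: the cache was hit
post("third", invalidate=True)
fresh = top_stuff()                  # ['third', 'second', 'first']
```

Without the `CACHE.clear()` call the second post never shows up on the front page; with it, the next read rebuilds the list from the database.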


Cache Stampede

• If one user writes new data
  – Cache gets cleared
• Now lots of users all access the site at the same time
  – All of them do db_reads since the cache is empty
• This hammers the DB and slows everybody down
  – Depending on settings, the DB might also block or even crash
• Without any caching this could also happen
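One common way to soften a stampede, which is not the approach these slides take next, is to let only one request repopulate the cache while the others wait. The sketch below assumes a single process with threads; `fill_lock`, `db_read` and the `db_reads` counter are hypothetical names, and a lock like this does not help across several servers.

```python
import threading

CACHE = {}
db_reads = 0
fill_lock = threading.Lock()


def db_read():
    """Stand-in for an expensive query."""
    global db_reads
    db_reads += 1
    return ["item %d" % i for i in range(10)]


def top_stuff():
    stuff = CACHE.get("top")
    if stuff is None:
        with fill_lock:
            # Re-check: another request may have filled the cache
            # while this one was waiting for the lock.
            stuff = CACHE.get("top")
            if stuff is None:
                stuff = db_read()
                CACHE["top"] = stuff
    return stuff


# Simulate 20 users arriving right after the cache was cleared.
threads = [threading.Thread(target=top_stuff) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the database read happens only while holding the lock and only if the cache is still empty, `db_reads` ends up at 1 no matter how the threads interleave.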


Cache Refresh

    def top_stuff(update = False):
        key = 'top'
        stuff = CACHE.get(key)  # .get() returns None on a miss; CACHE[key] would raise KeyError
        if not stuff or update:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            data = …
            a = Stuff(title = title, data = data)
            a.put()
            top_stuff(True)  # refresh the cache once, instead of clearing it
            self.redirect("/")


Cache Update

• Most aggressive solution
  – No DB reads!
• On new data, store it in the DB and also directly in the cache, without reading from the DB
• The DB is now just backup storage in case something goes wrong, such as a server going down
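A minimal sketch of this update strategy, assuming the cached value is the list of the ten newest items. `DB`, `load_from_db` and the `db_reads` counter are illustrative stand-ins rather than datastore calls.

```python
DB = []                 # backup storage; newest item last
CACHE = {"top": []}     # ten newest items, newest first
db_reads = 0


def load_from_db():
    """Only needed to rebuild the cache, e.g. after a restart."""
    global db_reads
    db_reads += 1
    return list(reversed(DB[-10:]))


def top_stuff():
    """Page views are served from the cache alone."""
    return CACHE["top"]


def post(title):
    DB.append(title)                                # write to the database
    CACHE["top"] = ([title] + CACHE["top"])[:10]    # and update the cache directly


for n in range(12):
    post("item %d" % n)
```

After the twelve posts `db_reads` is still 0, and the cached list matches what a fresh `load_from_db()` would return. The price is that the cache-update logic has to mirror the query exactly (ordering and the limit of ten).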


Cache Comparisons

Cache Approach   DB_Read on page view   DB_Read on submit   Wrong results
None             Always                 None
Naïve            Cache miss             None                Yes
Refresh          Seldom                 Once
Update           None                   None


Sharing a Cache

• If we have more than one server
• Do we have a cache for each server, or share a cache amongst servers?
• A cache for each server can have suboptimal behavior if the caches are not synchronized
  – Data might be in the cache on server 1 and not on server 2, for example
• A good solution is to use a very fast shared cache
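The difference between per-server caches and a shared cache can be sketched with plain dicts. The `Server` class below is hypothetical; a real shared cache would be a separate service rather than one dict passed to two objects.

```python
DB = ["first"]   # stands in for the datastore; newest item last


class Server(object):
    """A web server that reads through whatever cache it is given."""

    def __init__(self, cache):
        self.cache = cache

    def top_stuff(self):
        if "top" not in self.cache:
            self.cache["top"] = list(reversed(DB[-10:]))
        return self.cache["top"]

    def post(self, title):
        DB.append(title)
        self.cache.pop("top", None)   # invalidates only the cache this server sees


# One private cache per server.
a, b = Server({}), Server({})
a.top_stuff()
b.top_stuff()                     # both caches now hold ['first']
a.post("second")                  # clears a's cache, not b's
private_a, private_b = a.top_stuff(), b.top_stuff()

# One cache shared by both servers.
shared = {}
c, d = Server(shared), Server(shared)
c.top_stuff()
d.top_stuff()
c.post("third")
shared_c, shared_d = c.top_stuff(), d.top_stuff()
```

With private caches server `b` keeps serving `['first']` after the write on `a`; with the shared cache both servers see the new item.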


Memcached

• See http://memcached.org/
• Very fast, in-memory, key-value store
• Caching technology behind very many websites
• Support for it within AppEngine

    from google.appengine.api import memcache
    …
    def top_stuff(update = False):
        key = 'top'
        stuff = memcache.get(key)  # returns None on a miss
        if update or not stuff:
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
            stuff = list(stuff)
            memcache.set(key, stuff)
        return stuff


NDB and Caching

• Two caches controlled by policies
  – In-context cache (microseconds)
    • Lives only for the current HTTP request
    • Writes go to the datastore and the cache; reads check the cache first
  – Memcache (milliseconds)
    • All non-transactional contexts cache here
    • All contexts share the same memcache
    • Within a transaction memcache is not used
• Can be configured by policies
  – Some standard ones are available


More Caching

• Some caches also live outside the developer's immediate control
• Browser cache
  – Single user
• Proxy cache
  – Multiple users
• Gateway cache
  – Distributed by Content Delivery Networks
• HTTP 1.1 supports the “Cache-Control” header
  – Allows developers to control how things are cached
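As a rough illustration of the last point, a handler can set the header itself. The helper `cache_headers` below is a hypothetical name; the directives it emits (`public`, `private`, `max-age`, `no-store`) are standard Cache-Control values.

```python
def cache_headers(max_age_seconds, shared=True):
    """Build response headers telling caches how long a page may be reused."""
    if max_age_seconds <= 0:
        # Caches must not store the response at all.
        return {"Cache-Control": "no-store"}
    # "public" lets proxy and gateway caches keep a copy;
    # "private" restricts it to the single user's browser cache.
    scope = "public" if shared else "private"
    return {"Cache-Control": "%s, max-age=%d" % (scope, max_age_seconds)}


front_page = cache_headers(60)                   # shared, reusable for one minute
account_page = cache_headers(300, shared=False)  # browser cache only
checkout_page = cache_headers(0)                 # never cached
```

In a request handler the resulting pairs would be copied onto the response headers before the page is written out.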