Caching
Willem Visser, RW334
Overview
• AppEngine Datastore
• No Caching
• Naïve Caching
• Cache invalidation
• Cache updating
• Memcached
• Beyond your code
AppEngine Python Datastore
• Datastore APIs
– db: old, and will be going away at some point
– ndb (https://developers.google.com/appengine/docs/python/ndb/): new, and supports some useful features, such as automatic caching

from google.appengine.ext import ndb

class Stuff(ndb.Model):
    title = ndb.StringProperty(required = True)
    content = ndb.StringProperty(required = True)
    # named "created" so that the "ORDER BY created" queries below work
    created = ndb.DateTimeProperty(auto_now_add = True)
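NDB
• A Python class defines the model
• Each entity has a key, which can have a parent key, and so on up to a root key that has no parent
– Entities in the same chain (with the same root) belong to the same entity group
– Entities in the same group have stronger consistency guarantees; for example, queries restricted to one group (ancestor queries) are strongly consistent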
stuff_title = self.request.get('stuff_name')
stuff = Stuff(parent = ndb.Key("Things", stuff_title or "*notitle*"),
              title = stuff_title or "*notitle*",  # title is required by the model
              content = self.request.get('content'))
stuff.put()
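The key chain described above can be pictured with a toy model in plain Python. This is a sketch under stated assumptions and does not use the ndb.Key API. It assumes a key can be represented as a tuple of (kind, id) pairs running from the root down to the entity. The helpers parent, root and same_entity_group are hypothetical.

```python
# Toy model of Datastore key paths (NOT the ndb.Key API): a key is assumed to
# be a tuple of (kind, id) pairs from the root down to the entity itself.

def parent(key_path):
    """Return the parent key path, or None for a root key."""
    return key_path[:-1] or None

def root(key_path):
    """Follow the parent chain up to the root key (the one with no parent)."""
    while parent(key_path) is not None:
        key_path = parent(key_path)
    return key_path

def same_entity_group(a, b):
    """Entities whose chains end at the same root share an entity group."""
    return root(a) == root(b)

things_root = (("Things", "recipes"),)
soup = things_root + (("Stuff", 1),)
bread = things_root + (("Stuff", 2),)
other = (("Things", "music"), ("Stuff", 3))

print(same_entity_group(soup, bread))  # True
print(same_entity_group(soup, other))  # False
```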
NDB (2)
• Queries and indexes
– There are many ways to query
– Complex queries might need composite indexes
  • Simple (single-property) indexes are created automatically
  • Composite indexes can be defined in index.yaml
• GQL is similar to SQL
– A query is only executed when its results are accessed

stuff = ndb.gql("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
stuff = list(stuff)  # the query actually runs here, when the results are accessed
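The "only executed when accessed" behaviour can be illustrated without AppEngine. The LazyQuery class below is hypothetical and only mimics the idea. Real db and ndb queries talk to the datastore, but they are likewise run when they are iterated, for example by list(). In this toy version, building the query costs nothing, and the work happens in __iter__.

```python
# Hypothetical LazyQuery (not ndb's Query class) showing deferred execution:
# constructing the query does no work; the "database" is read on iteration.

class LazyQuery:
    def __init__(self, rows, limit):
        self.rows = rows          # stand-in for the datastore contents
        self.limit = limit
        self.executions = 0       # how many times the query actually ran

    def __iter__(self):
        self.executions += 1      # the work happens here, on access
        newest_first = sorted(self.rows, key=lambda r: r["created"], reverse=True)
        return iter(newest_first[: self.limit])

rows = [{"title": "t%d" % i, "created": i} for i in range(15)]
q = LazyQuery(rows, limit=10)
print(q.executions)        # 0: nothing has run yet
stuff = list(q)            # like stuff = list(stuff) above
print(q.executions)        # 1
print(stuff[0]["title"])   # t14
```

In this toy model, each new iteration would re-run the query. That is one reason to convert the results to a list once and reuse the list.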
No Caching
• Every db_read hits the database
• Database reads are relatively slow compared with reading from memory
• Serving every page view this way can therefore be very inefficient
Example: No Caching
def top_stuff():
    stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
    return list(stuff)

class MainPage(Handler):
    def render_front(self, …):
        stuff = top_stuff()
        self.render("front.html", stuff = stuff, …)

    def get(self):
        self.render_front()

    def post(self):
        title = …
        data = …
        a = Stuff(title = title, data = data)
        a.put()
        self.redirect("/")
Naïve Caching
if request not in cache:
    cache[request] = db_read()
return cache[request]

• This can improve performance dramatically, because repeated requests no longer touch the database
• If the cache grows too large, it might start to slow things down a bit
• The code above avoids the db_read, but rendering the HTML could also be cached if that takes a lot of time
Example: Naïve Caching
import logging

CACHE = {}

def top_stuff():
    key = 'top'
    stuff = CACHE.get(key)  # .get() avoids a KeyError while the cache is empty
    if not stuff:
        logging.error("DB QUERY")
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
        stuff = list(stuff)
        CACHE[key] = stuff
    return stuff
New data?
• Will the previous solution still work?
• What happens when new data is added:
– It is written to the DB, and the user is redirected to /
– render_front calls top_stuff
– The cache is hit, so the page shows the old data
• The cache must be invalidated whenever new data arrives
Clear Cache
CACHE = {}

def top_stuff():
    …

class MainPage(Handler):
    def render_front(self, …):
        stuff = top_stuff()
        self.render("front.html", stuff = stuff, …)

    def get(self):
        self.render_front()

    def post(self):
        title = …
        data = …
        a = Stuff(title = title, data = data)
        a.put()
        CACHE.clear()  # invalidate: the next page view re-reads the DB
        self.redirect("/")
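To see why clearing matters, here is a self-contained simulation built from plain in-memory stand-ins. DB is a Python list playing the datastore. post(title, invalidate) is a hypothetical helper standing in for MainPage.post, with a switch that turns the CACHE.clear() call on or off.

```python
# Hypothetical in-memory stand-ins for the datastore and the handler, showing
# the stale-data problem and the fix of clearing the cache on every write.

DB = []        # stand-in for the datastore (newest entries appended last)
CACHE = {}

def top_stuff():
    stuff = CACHE.get('top')
    if stuff is None:
        stuff = list(reversed(DB))[:10]   # like ORDER BY created DESC LIMIT 10
        CACHE['top'] = stuff
    return stuff

def post(title, invalidate):
    DB.append(title)                      # like a.put()
    if invalidate:
        CACHE.clear()                     # drop the now-stale results

post("first", invalidate=True)
print(top_stuff())   # ['first']
post("second", invalidate=False)
print(top_stuff())   # ['first']  (stale: the cache was hit)
post("third", invalidate=True)
print(top_stuff())   # ['third', 'second', 'first']
```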
Cache Stampede
• Suppose one user writes new data
– The cache gets cleared
• Now many users access the site at the same time
– All of them do db_reads, since the cache is empty
• This hammers the DB and slows everybody down
– Depending on its settings, the DB might also block or even crash
• The same overload can also happen without any caching, since then every request reads from the DB
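A stampede can be reproduced with threads. The sketch is deliberately contrived. A standard-library threading.Barrier makes every simulated request look at the empty cache before any of them refills it, which is the worst case. The names handle_request and db_read are hypothetical.

```python
import threading

# Simulated cache stampede. The Barrier forces the worst case: every request
# checks the (just-cleared) cache before any of them has refilled it.
N_USERS = 8
CACHE = {}
db_reads = 0
reads_lock = threading.Lock()
barrier = threading.Barrier(N_USERS)

def db_read():
    global db_reads
    with reads_lock:              # protects only the counter
        db_reads += 1
    return ["newest stuff"]

def handle_request():
    stuff = CACHE.get('top')      # every user sees an empty cache...
    barrier.wait()                # ...because nobody has refilled it yet
    if stuff is None:
        CACHE['top'] = db_read()  # so every one of them hits the DB

threads = [threading.Thread(target=handle_request) for _ in range(N_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_reads)  # 8: one DB read per user instead of one in total
```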
Cache Refresh
def top_stuff(update = False):
    key = 'top'
    stuff = CACHE.get(key)  # .get() avoids a KeyError while the cache is empty
    if not stuff or update:
        logging.error("DB QUERY")
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
        stuff = list(stuff)
        CACHE[key] = stuff
    return stuff

class MainPage(Handler):
    def render_front(self, …):
        stuff = top_stuff()
        self.render("front.html", stuff = stuff, …)

    def get(self):
        self.render_front()

    def post(self):
        title = …
        data = …
        a = Stuff(title = title, data = data)
        a.put()
        top_stuff(True)  # re-run the query once so the cache holds the new data
        self.redirect("/")
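The effect of the refresh approach on DB traffic can be checked with the same kind of in-memory stand-ins as before. DB, db_reads and post are hypothetical. The sketch counts reads to show that each submit costs one query, while page views are served from the cache.

```python
# Hypothetical in-memory sketch of the refresh approach: the write path re-runs
# the query once (top_stuff(True)), so page views keep hitting the cache.

DB = []
CACHE = {}
db_reads = 0

def top_stuff(update=False):
    global db_reads
    stuff = CACHE.get('top')
    if stuff is None or update:
        db_reads += 1
        stuff = list(reversed(DB))[:10]
        CACHE['top'] = stuff
    return stuff

def post(title):
    DB.append(title)      # like a.put()
    top_stuff(True)       # refresh the cache right after the write

post("a")
post("b")
for _ in range(100):      # 100 page views
    top_stuff()
print(db_reads)           # 2: one per submit, none for page views
print(top_stuff())        # ['b', 'a']
```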
Cache Update
• Most aggressive solution
– No DB reads once the cache is filled!
• On new data, store it in the DB and also write it directly into the cache, without reading from the DB
• The DB now serves only as backup storage in case something goes wrong, such as a server going down
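A minimal sketch of the update approach follows. It assumes the cached value is the "top 10, newest first" list and that the cache was filled once beforehand. The helper post is hypothetical. On submit it writes to the DB and splices the new entry into the cached list, with no DB read at all.

```python
# Hypothetical sketch of the update approach: on submit, write to the DB and
# splice the new entry into the cached list directly, without any DB read.

DB = []
CACHE = {'top': []}   # assumption: the cache was filled once at startup

def top_stuff():
    return CACHE['top']                       # page views never read the DB

def post(title):
    DB.append(title)                          # DB write kept as a backup copy
    CACHE['top'] = ([title] + CACHE['top'])[:10]   # newest first, keep 10

for i in range(12):
    post("item%d" % i)
print(top_stuff()[:3])   # ['item11', 'item10', 'item9']
print(len(top_stuff()))  # 10
```

The cache must mirror exactly what the query would return. That is why this approach is the most aggressive: any mismatch between the splice logic and the query shows up as wrong results.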
Cache Comparisons

Cache approach | DB_Read on page view | DB_Read on submit | Wrong results
None           | Always               | None              | No
Naïve          | On cache miss        | None              | Yes
Refresh        | Seldom               | Once              | No
Update         | None                 | None              | No
Sharing a Cache
• If we have more than one server:
• Should each server have its own cache, or should the servers share one?
• Per-server caches can behave suboptimally if they are not synchronized
– For example, data might be in the cache on server 1 but not on server 2, so users see different results depending on which server handles the request
• A good solution is to use a very fast shared cache
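The difference between per-server and shared caches can be simulated in a few lines. Server is a hypothetical class. Each instance holds either its own private dict or a reference to one shared dict, which stands in for the role memcached plays below. Each server refreshes only the cache it can see.

```python
# Hypothetical simulation of two app servers. Each Server either owns a private
# dict or points at one shared dict (the role a shared cache like memcached plays).

class Server:
    def __init__(self, cache):
        self.cache = cache

    def get_top(self, db):
        if 'top' not in self.cache:
            self.cache['top'] = list(db)
        return self.cache['top']

    def post(self, db, title):
        db.append(title)
        self.cache['top'] = list(db)      # refreshes only the cache it can see

# Per-server caches: server 2 never hears about the write on server 1.
db = ["old"]
s1, s2 = Server({}), Server({})
s1.get_top(db); s2.get_top(db)
s1.post(db, "new")
print(s1.get_top(db), s2.get_top(db))    # ['old', 'new'] ['old']

# Shared cache: both servers see the same, up-to-date entry.
db = ["old"]
shared = {}
s1, s2 = Server(shared), Server(shared)
s1.get_top(db); s2.get_top(db)
s1.post(db, "new")
print(s1.get_top(db), s2.get_top(db))    # ['old', 'new'] ['old', 'new']
```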
Memcached
• See http://memcached.org/
• Very fast, in-memory key-value store
• The caching layer behind many websites
• Supported within AppEngine

from google.appengine.api import memcache
…
def top_stuff(update = False):
    key = 'top'
    stuff = memcache.get(key)  # returns None on a cache miss
    if update or not stuff:
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY created DESC LIMIT 10")
        stuff = list(stuff)
        memcache.set(key, stuff)
    return stuff
NDB and Caching
• Two caches, controlled by policies
– In-context cache (microseconds)
  • Lasts only for the current HTTP request
  • Writes go to both the datastore and the cache; reads check the cache first
– Memcache (milliseconds)
  • All non-transactional contexts cache entities here
  • All contexts share the same memcache
  • Within a transaction, memcache is not used
• Caching behavior can be configured with policies
– Some standard policies are available
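The lookup order for a get by key can be modelled with plain dicts. This is a toy model, not NDB's real code. Context, MEMCACHE and DATASTORE are hypothetical stand-ins. How the shared memcache copy is handled on a write is an assumption here: this sketch simply drops it so that later requests do not read stale data.

```python
# Toy model of a two-level lookup for get-by-key (NOT NDB's implementation):
# in-context cache (per request) -> shared memcache -> datastore.

DATASTORE = {"k1": "entity 1"}
MEMCACHE = {}                      # shared by all requests/contexts

class Context:
    """One per HTTP request; its cache disappears with the request."""
    def __init__(self):
        self.cache = {}
        self.log = []              # records which layer answered each get

    def get(self, key):
        if key in self.cache:
            self.log.append("context")
        elif key in MEMCACHE:
            self.log.append("memcache")
            self.cache[key] = MEMCACHE[key]
        else:
            self.log.append("datastore")
            self.cache[key] = MEMCACHE[key] = DATASTORE[key]
        return self.cache[key]

    def put(self, key, value):
        DATASTORE[key] = value     # writes go to the datastore...
        self.cache[key] = value    # ...and to the in-context cache
        MEMCACHE.pop(key, None)    # assumption: invalidate the shared copy

req1 = Context()
req1.get("k1"); req1.get("k1")
req2 = Context()
req2.get("k1")
print(req1.log, req2.log)  # ['datastore', 'context'] ['memcache']
```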
More Caching
• Some caches live outside the developer's immediate control
• Browser cache
– Serves a single user
• Proxy cache
– Shared by multiple users
• Gateway cache
– For example, distributed by Content Delivery Networks
• HTTP/1.1 defines the "Cache-Control" header
– Lets developers control how responses are cached (for example, with the max-age, private, or no-cache directives)
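As an illustration of setting the header, here is a minimal WSGI app that uses only the Python standard library rather than an AppEngine handler. The 60-second max-age is an arbitrary example value. "public" allows shared caches such as proxies and CDNs to store the response. Per-user pages would typically use "private" instead.

```python
from wsgiref.util import setup_testing_defaults

# Minimal WSGI app (standard library only) telling browsers and shared caches
# how long they may reuse the response. max-age=60 is an arbitrary example.
def app(environ, start_response):
    body = b"<h1>Top stuff</h1>"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        # public: proxies/CDNs may store it; max-age: fresh for 60 seconds
        ("Cache-Control", "public, max-age=60"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the app directly with a fake environ to inspect the response headers.
captured = {}
def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

environ = {}
setup_testing_defaults(environ)
body = b"".join(app(environ, start_response))
print(captured["headers"]["Cache-Control"])  # public, max-age=60
```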