Advanced Django ORM techniques
Daniel Roseman http://blog.roseman.org.uk
About Me
• Python user for five years
• Discovered Django four years ago
• Worked full-time with Python/Django since 2008.
• Top Django answerer on StackOverflow!
• Occasionally blog on Django, concentrating on efficient use of the ORM.
Contents
• Behind the scenes: models and fields
• How model relationships work
• More efficient relationships
• Other optimising techniques
Django ORM efficiency: a story
414 queries!
How can you stop this happening to you?
http://www.flickr.com/photos/m0n0/4479450696
Behind the scenes: models and fields
http://www.flickr.com/photos/spacesuitcatalyst/847530840
Defining a model
• Model structure initialised via a metaclass (ModelBase)
• The metaclass runs once, when the model class is first defined
• The resulting model class is stored in a cache and reused whenever it is instantiated
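A toy metaclass in plain Python makes the "runs at definition time, then cached" point concrete. ToyModelBase, ToyModel and registry are hypothetical names; Django's real ModelBase does much more, so treat this only as a sketch of the mechanism.

```python
# Toy illustration only: ToyModelBase, ToyModel and registry are hypothetical,
# not Django's real ModelBase or app cache.
registry = {}  # stands in for the model cache: lowercased name -> class


class ToyModelBase(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # This body runs once, when the class statement itself is executed
        if bases:  # skip the ToyModel base class itself
            registry[name.lower()] = cls
        return cls


class ToyModel(metaclass=ToyModelBase):
    pass


class Article(ToyModel):
    pass


# Later instantiation reuses the class that was built and cached at definition time
article = registry["article"]()
```

Nothing in the metaclass runs again when Article is instantiated; all the structural work happened when the class statement was executed.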
Fields
• Each field's contribute_to_class() is called with the new model class
• Adds methods, e.g. get_FOO_display() for fields with choices
• Enables use of descriptors for field access
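Here is a simplified sketch of what contribute_to_class() can do: install the field as a descriptor and add a get_FOO_display() method. ToyField and Shirt are hypothetical; the loop at the bottom does by hand what Django's metaclass does automatically.

```python
# Toy sketch: ToyField and Shirt are hypothetical simplifications, not Django's code.
class ToyField:
    def __init__(self, choices=None):
        self.choices = dict(choices or [])

    def contribute_to_class(self, cls, name):
        self.name = name
        setattr(cls, name, self)  # install this field on the class as a descriptor
        if self.choices:
            field = self

            def get_display(instance):
                value = instance.__dict__.get(field.name)
                return field.choices.get(value, value)

            # Adds e.g. get_size_display() for a field called "size"
            setattr(cls, "get_%s_display" % name, get_display)

    # Descriptor protocol: attribute access on instances goes through these
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Shirt:
    size = ToyField(choices=[("S", "Small"), ("L", "Large")])


# Django's metaclass does this step automatically; here it is done by hand
for attr_name, attr in list(vars(Shirt).items()):
    if isinstance(attr, ToyField):
        attr.contribute_to_class(Shirt, attr_name)

shirt = Shirt()
shirt.size = "L"
```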
Model metadata
• Model._meta
• .fields
• .get_field(fieldname)
• .get_all_related_objects() (removed in Django 1.10; newer versions use .get_fields())
Model instantiation
• Instance is populated from the database when it is fetched
• Has no further connection to the database until save()
• No identity map: fetching the same row twice gives two separate instances
Querysets
• Model manager returns a queryset: foos = Foo.objects.all()
• Queryset is a lazy, list-like collection of instances of a single model
• No database access until the queryset is evaluated, e.g. by:
  • Indexing: foos[0]
  • Iterating: {% for foo in foos %}
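This toy class mimics the laziness described above. LazyQuerySet and fake_db_fetch are hypothetical stand-ins, but they follow Django's behaviour as I understand it: indexing an unevaluated queryset runs its own LIMITed query, while iteration runs one query and fills a result cache that later iterations reuse.

```python
# Toy sketch of QuerySet laziness; LazyQuerySet and fake_db_fetch are hypothetical.
queries_run = []


def fake_db_fetch(limit=None):
    """Pretend to hit the database, recording each 'query'."""
    rows = ["foo1", "foo2", "foo3"]
    queries_run.append(limit)
    return rows[:limit] if limit is not None else rows


class LazyQuerySet:
    def __init__(self):
        self._result_cache = None  # nothing fetched yet

    def __iter__(self):
        if self._result_cache is None:
            self._result_cache = fake_db_fetch()  # one query fills the cache
        return iter(self._result_cache)

    def __getitem__(self, index):
        if self._result_cache is not None:
            return self._result_cache[index]  # already evaluated: no query
        return fake_db_fetch(limit=index + 1)[index]  # unevaluated: a LIMITed query


foos = LazyQuerySet()  # creating the queryset runs no query
first = foos[0]        # one query
names = list(foos)     # one more query, result cached
again = list(foos)     # served from the cache
```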
Where do all those queries come from?
• Repeated queries
• Lack of caching
• Relational lookup
• Queries triggered from templates as well as from views
Repeated queries
def get_absolute_url(self):
    return "%s/%s" % (self.category.slug, self.slug)
Same category, but query is repeated for each article
• Same link on every page
• Dynamic, so can't go in urlconf
• Could be cached or memoized
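One way to memoize the repeated category lookup is to key a cache on the foreign key id, which the instance already holds. CATEGORY_TABLE, query_log and this Article class are hypothetical stand-ins for the database and the model; functools.lru_cache is the standard-library memoizer.

```python
from functools import lru_cache

# Hypothetical stand-ins: CATEGORY_TABLE, query_log and Article are not Django objects.
CATEGORY_TABLE = {1: "news", 2: "sport"}
query_log = []


@lru_cache(maxsize=None)
def category_slug(category_id):
    """Memoized: only the first call per category_id 'hits the database'."""
    query_log.append(category_id)
    return CATEGORY_TABLE[category_id]


class Article:
    def __init__(self, slug, category_id):
        self.slug = slug
        self.category_id = category_id

    def get_absolute_url(self):
        # Uses the local foreign key id, so no per-article category query
        return "%s/%s" % (category_slug(self.category_id), self.slug)


articles = [Article("a", 1), Article("b", 1), Article("c", 1)]
urls = [a.get_absolute_url() for a in articles]
```

A process-wide cache like this survives across requests, so it goes stale if a category slug changes; in a real project a cache with explicit invalidation may be the safer choice.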
Relationships
http://www.flickr.com/photos/katietegtmeyer/124315322
Relational lookups
• Forwards:
foo.bar.field
• Backwards:
bar.foo_set.all()
Example models
from django.db import models

class Foo(models.Model):
    name = models.CharField(max_length=10)

class Bar(models.Model):
    name = models.CharField(max_length=10)
    foo = models.ForeignKey(Foo, on_delete=models.CASCADE)  # on_delete is required from Django 2.0
Forwards relationship
>>> bar = Bar.objects.all()[0]
>>> bar.__dict__
{'id': 1, 'foo_id': 1, 'name': u'item1'}
Forwards relationship
>>> bar.foo.name
u'item1'
>>> bar.__dict__
{'_foo_cache': <Foo: Foo object>, 'id': 1, 'foo_id': 1, 'name': u'item1'}
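Forwards relationships
• Relational access implemented via a descriptor: django.db.models.fields.related.ReverseSingleRelatedObjectDescriptor (ForwardManyToOneDescriptor in django.db.models.fields.related_descriptors from Django 1.9)
• __get__ tries to access _foo_cache
• If it doesn't exist, __get__ does the lookup and creates the cache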
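The caching behaviour of the forward descriptor can be sketched in a few lines. ForwardFKDescriptor, FOO_TABLE and lookups are hypothetical; the "_foo_cache" naming follows the slides above, and the dictionary lookup stands in for the database query.

```python
# Toy sketch of a caching forward foreign-key descriptor; all names are hypothetical.
lookups = []
FOO_TABLE = {1: {"name": "item1"}}


class Foo:
    def __init__(self, pk, name):
        self.pk, self.name = pk, name


class ForwardFKDescriptor:
    def __init__(self, field_name):
        self.cache_name = "_%s_cache" % field_name  # e.g. "_foo_cache"
        self.id_attr = "%s_id" % field_name         # e.g. "foo_id"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        try:
            return getattr(instance, self.cache_name)  # cached: no lookup
        except AttributeError:
            pk = getattr(instance, self.id_attr)
            lookups.append(pk)                         # the "query"
            obj = Foo(pk, FOO_TABLE[pk]["name"])
            setattr(instance, self.cache_name, obj)    # create the cache
            return obj


class Bar:
    foo = ForwardFKDescriptor("foo")

    def __init__(self, foo_id):
        self.foo_id = foo_id


bar = Bar(foo_id=1)
first = bar.foo.name   # triggers the lookup
second = bar.foo.name  # served from _foo_cache
```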
select_related
• Automatically follows foreign keys in SQL query
• Prepopulates _foo_cache
• Doesn't follow null=True relationships by default
• Makes query more expensive, so be sure you need it
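At the SQL level, the difference select_related makes is roughly one JOIN instead of one extra query per row. This sketch uses plain sqlite3, not Django; the foo/bar schema is an assumption, and set_trace_callback is used only to count the statements executed.

```python
import sqlite3

# Plain sqlite3, not Django; the foo/bar schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT,
                      foo_id INTEGER REFERENCES foo(id));
    INSERT INTO foo VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO bar VALUES (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 2);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run

# Lazy access: one query for the bars, then one per bar for its foo (1 + n)
lazy = []
for bar_id, foo_id in conn.execute("SELECT id, foo_id FROM bar ORDER BY id").fetchall():
    (foo_name,) = conn.execute("SELECT name FROM foo WHERE id = ?", (foo_id,)).fetchone()
    lazy.append((bar_id, foo_name))
lazy_count = len(statements)

# A JOIN, similar in spirit to what select_related builds: a single query
statements.clear()
joined = conn.execute(
    "SELECT bar.id, foo.name FROM bar INNER JOIN foo ON bar.foo_id = foo.id ORDER BY bar.id"
).fetchall()
join_count = len(statements)
```

The joined query returns wider rows, which is why the slide warns that it makes each query more expensive.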
Backwards relationships
{% for foo in my_foos %}
    {% for bar in foo.bar_set.all %}
        {{ bar.name }}
    {% endfor %}
{% endfor %}
Backwards relationships
• One query per foo
• If you iterate over foo.bar_set again, you hit the database again
• No equivalent of _foo_cache on this side
• select_related does not help here: it only follows forward foreign keys
Optimising backwards relationships
• Get all related objects at once
• Group them by the ID of the parent object
• Then cache each group in a hidden attribute, as select_related does
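qs = Foo.objects.filter(criteria=whatever)  # 'criteria=whatever' is a placeholder filter
obj_dict = dict((obj.id, obj) for obj in qs)
# One query for every Bar whose foo is in qs
objects = Bar.objects.filter(foo__in=qs)
relation_dict = {}
for obj in objects:
    relation_dict.setdefault(obj.foo_id, []).append(obj)
# Cache each Foo's Bars in a hidden attribute (an empty list if it has none)
for foo_id, foo in obj_dict.items():
    foo._related_items = relation_dict.get(foo_id, [])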
Optimising backwards
[{'time': '0.000', 'sql': u'SELECT "foobar_foo"."id", "foobar_foo"."name" FROM "foobar_foo"'},
 {'time': '0.000', 'sql': u'SELECT "foobar_bar"."id", "foobar_bar"."name", "foobar_bar"."foo_id" FROM "foobar_bar" WHERE "foobar_bar"."foo_id" IN (SELECT U0."id" FROM "foobar_foo" U0)'}]
Optimising backwards
• Still quite expensive, as it can mean a large dependent subquery; MySQL in particular handles these very badly
• But now just two queries instead of n + 1
• Not automatic: you need to remember to use the _related_items attribute
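The grouping technique can be checked outside Django with plain sqlite3: two queries, then grouping by parent id in Python. The schema is an assumption. This variant sends a literal list of ids instead of a subquery; in Django terms, passing foo__in=list(obj_dict) rather than the queryset should give a plain IN list, which may help where dependent subqueries are slow, though very long IN lists have costs of their own.

```python
import sqlite3

# Plain sqlite3 sketch of the grouping technique; schema names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER);
    INSERT INTO foo VALUES (1, 'f1'), (2, 'f2'), (3, 'f3');
    INSERT INTO bar VALUES (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 2);
""")
statements = []
conn.set_trace_callback(statements.append)  # count the queries we run

# Query 1: the parent rows
foos = conn.execute("SELECT id, name FROM foo ORDER BY id").fetchall()
foo_ids = [foo_id for foo_id, _ in foos]

# Query 2: all children at once, using a literal id list instead of a subquery
placeholders = ", ".join("?" * len(foo_ids))
bars = conn.execute(
    "SELECT id, name, foo_id FROM bar WHERE foo_id IN (%s) ORDER BY id" % placeholders,
    foo_ids,
).fetchall()

# Group children by parent id in Python; parents without children get []
related = {foo_id: [] for foo_id in foo_ids}
for bar_id, bar_name, foo_id in bars:
    related[foo_id].append(bar_name)
```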
Generic relations
• Foreign key to ContentType, plus an object_id
• Descriptor (GenericForeignKey) to enable direct access
• Iterating through creates n + m queries (n = number of source objects, m = number of different content types)
• ContentType objects automatically cached
• Forwards relationship creates _foo_cache
• But select_related doesn't work
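from django.contrib.contenttypes.models import ContentType

# Group the object ids needed for each content type
generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)

# One query for the ContentTypes, then one in_bulk() query per content type
content_types = ContentType.objects.in_bulk(list(generics))
relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))

# Prime the descriptor's cache (assumes the GenericForeignKey is named content_object)
for item in queryset:
    setattr(item, '_content_object_cache',
            relations[item.content_type_id].get(item.object_id))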
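The same batching idea for generic relations can be run without Django: group the wanted ids by content type, fetch each group in one call, then attach the results. TABLES, fetch_in_bulk and TaggedItem are hypothetical stand-ins for per-model tables, in_bulk() and a model with a generic foreign key.

```python
from collections import defaultdict

# Toy sketch: TABLES, fetch_in_bulk and TaggedItem are hypothetical stand-ins.
TABLES = {
    "article": {1: "Article 1", 2: "Article 2"},
    "photo": {7: "Photo 7"},
}
queries = []


def fetch_in_bulk(table, ids):
    """One 'query' per call, returning {id: object} in the style of in_bulk()."""
    queries.append((table, sorted(ids)))
    return {pk: TABLES[table][pk] for pk in ids if pk in TABLES[table]}


class TaggedItem:
    def __init__(self, content_type, object_id):
        self.content_type = content_type
        self.object_id = object_id


items = [TaggedItem("article", 1), TaggedItem("photo", 7),
         TaggedItem("article", 2), TaggedItem("article", 1)]

# 1. Group the wanted ids by content type
wanted = defaultdict(set)
for item in items:
    wanted[item.content_type].add(item.object_id)

# 2. One bulk fetch per content type (m fetches instead of n)
fetched = {ct: fetch_in_bulk(ct, ids) for ct, ids in wanted.items()}

# 3. Attach each target object to its item (None if it has disappeared)
for item in items:
    item.content_object = fetched[item.content_type].get(item.object_id)
```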
Other optimising techniques
Memoizing
• Cache property on first access
• Cache on the instance when the value is accessed several times within the same request
def get_expensive_items(self):
    if not hasattr(self, '_cache'):
        self._cache = self.expensive_op()
    return self._cache
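The same per-instance memoization can be written with a decorator. This sketch uses functools.cached_property (Python 3.8+); Django also provides django.utils.functional.cached_property. Report and its expensive_op() are hypothetical stand-ins for a model and a costly query.

```python
from functools import cached_property  # Python 3.8+


class Report:
    """Hypothetical model-like class; expensive_op() stands in for a costly query."""

    def __init__(self):
        self.calls = 0

    def expensive_op(self):
        self.calls += 1
        return [1, 2, 3]

    @cached_property
    def expensive_items(self):
        # Computed on first access, then stored in the instance's __dict__
        return self.expensive_op()


report = Report()
first = report.expensive_items
second = report.expensive_items  # no second call to expensive_op()
```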
DB Indexes
• Pay attention to slow query log and debug toolbar output
• Add extra indexes where necessary, especially for lookups on multiple columns
• Use EXPLAIN
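To see what EXPLAIN tells you, here is a small sqlite3 sketch comparing the query plan before and after adding a multi-column index. The table and index names are assumptions, and MySQL or PostgreSQL report their plans in different formats.

```python
import sqlite3

# sqlite3 sketch; table and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER)")

query = "SELECT id FROM bar WHERE foo_id = ? AND name = ?"


def plan(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1, "x")).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column is the description


before = plan(query)  # a full table scan
conn.execute("CREATE INDEX bar_foo_name ON bar (foo_id, name)")
after = plan(query)   # a search using the two-column index
```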
Outsourcing
• Does all the logic need to go in the web app?
• Services, e.g. via Piston
• Message queues
• Distributed tasks, e.g. Celery
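The basic idea behind message queues and task workers is that the request handler only enqueues work and returns at once. This standard-library sketch is a stand-in for a real broker or Celery, not their API; handle_request and worker are hypothetical names.

```python
import queue
import threading

# Stdlib stand-in for a message queue and task worker (not Celery); names are hypothetical.
tasks = queue.Queue()
results = []


def worker():
    while True:
        job = tasks.get()
        if job is None:          # sentinel: stop the worker
            tasks.task_done()
            break
        results.append(job * 2)  # stands in for slow work, e.g. sending email
        tasks.task_done()


thread = threading.Thread(target=worker)
thread.start()


def handle_request(n):
    """The 'view' enqueues the work and returns immediately."""
    tasks.put(n)
    return "accepted"


responses = [handle_request(n) for n in (1, 2, 3)]
tasks.put(None)  # shut the worker down once the queue drains
tasks.join()
thread.join()
```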
Summary
• Understand where queries are coming from
• Optimise where necessary, within Django or in the database
• and...
PROFILE
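A simple first step in profiling is counting the queries a block of code runs. This is a standard-library sketch using sqlite3's trace callback; in Django, connection.queries (with DEBUG=True) or django.test.utils.CaptureQueriesContext serve a similar purpose. count_queries is a hypothetical helper.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Hypothetical helper: capture every SQL statement run inside the block."""
    captured = []
    conn.set_trace_callback(captured.append)
    try:
        yield captured
    finally:
        conn.set_trace_callback(None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo (name) VALUES (?)", [("a",), ("b",), ("c",)])

with count_queries(conn) as captured:
    ids = [row[0] for row in conn.execute("SELECT id FROM foo")]
    # An N+1 pattern: one extra query per id
    names = [conn.execute("SELECT name FROM foo WHERE id = ?", (i,)).fetchone()[0]
             for i in ids]
```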