Performant Django - Ara Anjargolian


DESCRIPTION

http://www.hakkalabs.co/articles/performant-django-best-practices


Performant Django

Ara Anjargolian Co-Founder & CTO, YCharts

There are two distinct kinds of performance issues

Predictably, they are: front-end and back-end.

Handling them effectively requires very different approaches.

First, a quick note about frontend performance

80-90% of the end-user response time is spent on the frontend.  Start there.

-Steve Souders

Front-End Performance Work

•  Can be universally applied

•  Requires systems/tooling changes

•  Often has clear, system-independent best practices

Best Practice: Cache static assets forever (as long as they don’t change)

Why: Download assets as infrequently as possible

Solution: Already done! (As long as you use CachedStaticFilesStorage or CachedFilesMixin with your own storage)
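A minimal settings sketch of the above, assuming an older Django release where CachedStaticFilesStorage still exists. Later releases replaced it with ManifestStaticFilesStorage and eventually moved the choice into the STORAGES setting, so check the docs for the version in use. The STATIC_ROOT path is hypothetical.

```python
# settings.py (fragment)
STATIC_URL = "/static/"
STATIC_ROOT = "/var/www/example/static/"  # hypothetical path

# collectstatic writes copies whose names include a content hash,
# so the web server can send far-future cache headers for them.
STATICFILES_STORAGE = "django.contrib.staticfiles.storage.CachedStaticFilesStorage"

# On newer Django versions the equivalent backend is:
# STATICFILES_STORAGE = "django.contrib.staticfiles.storage.ManifestStaticFilesStorage"
```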

Best Practice: Bundle/minify/compress static assets

Why: Reduce # of requests, download time

Solution: Use a static-asset-manager. 2 good ones: django-pipeline, webassets.

Bonus points: Lower number of requests by using data URIs for images (which pipeline supports)

Best Practice: Serve static files via a CDN.

Why: Less latency

Solution:

•  Good: Store in filesystem, point STATIC_URL to a CDN that uses your site as its origin.

•  Better: Use django-storages and the STATICFILES_STORAGE setting to store files in cloud file storage (e.g. S3), and point the CDN to it.
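The two CDN options might look roughly like this in settings. The hostname and bucket name are hypothetical, and the backend path shown for django-storages depends on the installed version of that package, so treat it as an assumption to verify.

```python
# settings.py (fragment)

# Option 1: files stay on your own servers, the CDN pulls from them as origin.
STATIC_URL = "https://cdn.example.com/static/"  # hypothetical CDN hostname

# Option 2: collectstatic uploads to S3 through django-storages and the
# CDN sits in front of the bucket. Uncomment and adapt if you go this way.
# STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
# AWS_STORAGE_BUCKET_NAME = "example-static"    # hypothetical bucket
# AWS_S3_CUSTOM_DOMAIN = "cdn.example.com"      # hypothetical CDN hostname
```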

Best Practice: Serve more stuff as static assets.

Why: Static assets can be served faster, more efficiently than dynamic assets.

Solution: Front-end templates, static-y data structures that can be served as JSON.

All that’s required are some custom management commands.
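The core of such a command can be sketched in plain Python. The function name write_static_json is hypothetical. In a real project it would be called from the handle() method of a management command, with data built from a queryset and the output written somewhere under a static files directory.

```python
import json
import os
import tempfile


def write_static_json(data, path):
    """Write data as compact JSON to path, replacing any old file atomically.

    Writing to a temp file first means a web server never serves a
    half-written file while the command is running.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, sort_keys=True, separators=(",", ":"))
    os.replace(tmp_path, path)
    return path
```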

Back-End Performance Work

•  Can really only be done on a case by case basis.

•  Often only requires code changes.

•  Is very site-specific.

OK, I lied, there are some global back-end performance to-dos.

•  Use cached sessions (contrib.sessions.backends.cache or contrib.sessions.backends.cached_db)

•  Use cached template loader

•  If you’re starting a new project, or have a lot of heavyweight templates, consider using jinja2 as your template engine.
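A settings sketch for the first two to-dos, assuming a Django version that uses the TEMPLATES setting (older releases used TEMPLATE_LOADERS instead). The cached session backends also need a working CACHES configuration, which is not shown here.

```python
# settings.py (fragment)

# Sessions are read from the cache and written through to the database.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],
        # APP_DIRS must stay False when "loaders" is listed explicitly.
        "APP_DIRS": False,
        "OPTIONS": {
            "loaders": [
                (
                    # Wraps the real loaders and keeps compiled templates in memory.
                    "django.template.loaders.cached.Loader",
                    [
                        "django.template.loaders.filesystem.Loader",
                        "django.template.loaders.app_directories.Loader",
                    ],
                ),
            ],
        },
    },
]
```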

But on to the real stuff!

OK, I lied, first a disclaimer

DO NOT try to “optimize” every view.

•  This is an utter waste of time, as there will be diminishing returns.

•  Optimizing on the backend often means adding complexity. And in a multi-programmer environment, complexity is expensive!

Backend performance work starts with a profile of the “problem” view

Use a profiler middleware!

(A good one: https://gist.github.com/Miserlou/3649773)

What does a profile look like?

Understanding a profile

Things to look for:

•  Tons of time spent in SQL?

•  Particular functions where each call takes longer than you would expect, or that are called far too often?
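To get a feel for reading a profile without installing a middleware, the standard library's cProfile can be run on any function. This is a minimal sketch: slow_view_body is a hypothetical stand-in for the work a slow view does, and profile_call is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def slow_view_body(n=200):
    """Hypothetical stand-in for the work done inside a slow view."""
    total = 0
    for i in range(n):
        total += sum(range(i))
    return total


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(slow_view_body)
print(report)
```

In the report, the ncalls column answers "is this called too much?" and the tottime/cumtime columns answer "is each call too slow?".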

What if the problem is SQL?

First, use django-debug-toolbar or django-devserver to identify the problem queries.

Is the issue one slow query? Too many queries?

SQL Tricks, Part 1

•  select_related(): Helps avoid extra queries to grab objects referenced by foreign keys/one to one relationships

•  values/values_list(): Avoid Python object creation overhead when dicts/lists are good enough

•  db_index=True: If you look up objects by a field that is not a primary key or foreign key and has no uniqueness constraint, you might need this
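What select_related() saves can be illustrated without Django, using the standard library's sqlite3 module. This is only an analogy under stated assumptions: the book/author schema and all function names are made up, and the JOIN stands in for the kind of query the ORM issues when select_related() is used.

```python
import sqlite3


def make_library_db():
    """Build a tiny in-memory database with a foreign key from book to author."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                           author_id INTEGER REFERENCES author(id));
        INSERT INTO author VALUES (1, 'Ann');
        INSERT INTO author VALUES (2, 'Bob');
        INSERT INTO book VALUES (1, 'A1', 1);
        INSERT INTO book VALUES (2, 'A2', 1);
        INSERT INTO book VALUES (3, 'B1', 2);
    """)
    return conn


def run(conn, log, sql, params=()):
    """Execute one query and record it so the queries can be counted."""
    log.append(sql)
    return conn.execute(sql, params).fetchall()


def books_n_plus_one(conn, log):
    """One query for the books, then one more per book for its author."""
    result = []
    for title, author_id in run(conn, log, "SELECT title, author_id FROM book ORDER BY id"):
        name = run(conn, log, "SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def books_joined(conn, log):
    """Same data in a single query by joining in the database."""
    return run(
        conn, log,
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id",
    )
```

With three books, the first version issues four queries and the second issues one; the gap grows with the number of rows.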

SQL Tricks, Part 2

•  prefetch_related(): Like select_related(), except the “join” is done in Python, so it also works for M2M

•  only(): Only grab fields in the model you need (USE WITH CAUTION!)

•  defer(): Get all fields except those stated in defer()

•  bulk_create(): When writing lots of rows to same table
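The "join in Python" idea behind prefetch_related() can be sketched with sqlite3 as well: fetch the parent rows, fetch all related rows in one extra query, then group them in memory. The schema and function names are hypothetical, and this is an illustration of the technique rather than Django's actual implementation.

```python
import sqlite3
from collections import defaultdict


def make_tagged_db():
    """Build a tiny in-memory database with a many-to-many book/tag relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book_tag (book_id INTEGER, tag_id INTEGER);
        INSERT INTO book VALUES (1, 'Django');
        INSERT INTO book VALUES (2, 'SQL');
        INSERT INTO tag VALUES (1, 'db');
        INSERT INTO tag VALUES (2, 'web');
        INSERT INTO book_tag VALUES (1, 2);
        INSERT INTO book_tag VALUES (1, 1);
        INSERT INTO book_tag VALUES (2, 1);
    """)
    return conn


def books_with_tags(conn):
    """Two queries total, however many books there are."""
    books = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()
    if not books:
        return []
    ids = [book_id for book_id, _ in books]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT book_tag.book_id, tag.name FROM book_tag "
        "JOIN tag ON tag.id = book_tag.tag_id "
        f"WHERE book_tag.book_id IN ({placeholders}) ORDER BY tag.name",
        ids,
    ).fetchall()
    # The "join" between books and their tags happens here, in Python.
    tags_by_book = defaultdict(list)
    for book_id, name in rows:
        tags_by_book[book_id].append(name)
    return [(title, tags_by_book[book_id]) for book_id, title in books]
```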

What if the problem is SQL and none of the above helps?

•  raw(): Roll your own SQL that can perhaps use stuff specific to the DB, or fancier queries.

•  Denormalization: Fewer joins, precomputed data

•  NoSQL: Maybe the data you are storing in a relational database doesn’t map well to a relational database.

What if the problem is in the Python?

Common issues:

•  Algorithmic issues like n^2 paths that don’t need to be n^2

•  Doing extra work like constantly re-evaluating a loop invariant inside a loop

•  Using if/else for error control where exceptions will do (again, most problematic inside a loop)

In general: People doing bad stuff inside loops.
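Each of the three issues above has a small before/after shape. The function names below are made up for illustration. Whether the exception-based version actually wins depends on how rarely the exception fires, so measure before committing to it.

```python
def common_slow(a, b):
    """Accidentally quadratic: `in` on a list scans the list every time."""
    return [x for x in a if x in b]


def common_fast(a, b):
    """Linear: build the set once, then each membership test is O(1) on average."""
    b_set = set(b)
    return [x for x in a if x in b_set]


def scale_slow(values, factors):
    """max(factors) is a loop invariant but is recomputed for every element."""
    return [v / max(factors) for v in values]


def scale_fast(values, factors):
    """Hoist the invariant out of the loop."""
    peak = max(factors)
    return [v / peak for v in values]


def count_checked(words):
    """if/else check on every iteration, even though misses are rare."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def count_with_exceptions(words):
    """Let the rare miss raise instead of checking on every iteration."""
    counts = {}
    for word in words:
        try:
            counts[word] += 1
        except KeyError:
            counts[word] = 1
    return counts
```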

What if you optimized your Python/SQL and you’re still slow?

Cache.

Then cache some more.

Many types of caching

•  View cache

•  Template fragment cache

•  Function level cache (via a package like django-cache-utils or django-cache-helper)

•  Query cache (django-cache-machine, django-cacheops)
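The idea behind a function level cache can be shown with the standard library's functools.lru_cache. This is a swapped-in, in-process stand-in: the packages named above store results in Django's cache backend so they are shared across processes, and their APIs are not shown here. The function expensive_report and the call counter are hypothetical.

```python
from functools import lru_cache

call_log = []  # records every real (uncached) computation


@lru_cache(maxsize=128)
def expensive_report(symbol):
    """Hypothetical slow computation; only runs on a cache miss."""
    call_log.append(symbol)
    return {"symbol": symbol, "length": len(symbol)}


first = expensive_report("ABC")
second = expensive_report("ABC")  # served from the cache, no recomputation
```

Anything cached this way also needs an invalidation story: here expensive_report.cache_clear() drops everything, while a shared cache would typically rely on timeouts or explicit key deletion.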

The End

Questions?

ara@ycharts.com

http://github.com/ara818

Like solving complex performance problems?

YCharts is hiring!
