Page 1: Practical Celery

PRACTICAL CELERY

Page 2: Practical Celery

CAMERON MASKE
twitter: @cameronmaske

email: [email protected]

web: http://cameronmaske.com

Page 3: Practical Celery

WHAT WE'LL COVER...

Page 4: Practical Celery

WHAT IS CELERY? HOW DOES IT WORK?

Page 5: Practical Celery

USING CELERY, BEST PRACTICES AND SCALING.

Page 6: Practical Celery

SURVEY

Page 7: Practical Celery

CELERY

ASYNCHRONOUS DISTRIBUTED TASK QUEUE

Page 8: Practical Celery

OUT OF THE REQUEST/RESPONSE CYCLE.

Example: Sending emails asynchronously.

Page 9: Practical Celery

TASKS IN THE BACKGROUND.

Example: Computationally heavy jobs.
Example: Interacting with external APIs.

Page 10: Practical Celery

PERIODIC JOBS.

Page 11: Practical Celery

HISTORY

Written in Python.
Released (0.1) in 2009.
Currently on 3.1, with 3.2 in alpha.
Developed by Ask Solem (@asksol).

Page 12: Practical Celery

ARCHITECTURE

Page 13: Practical Celery

PRODUCER

Produces a task for the queue.

Page 14: Practical Celery

BROKER

Stores the task backlog.
Answers: what work remains to be done?
RabbitMQ, Redis, SQLAlchemy, Django's ORM, MongoDB...

Page 15: Practical Celery

WORKER

Consumes tasks from the broker and executes them.
Distributed.

Page 16: Practical Celery

RESULTS BACKEND

Stores the results from our tasks.
Redis, SQLAlchemy, Django's ORM, MongoDB...
Optional!
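The four roles can be pictured with a toy, single-process model built only from Python's standard library: a `queue.Queue` plays the broker, a dict plays the results backend, and threads play workers. This illustrates the message flow under those assumptions; it is not how Celery is implemented, and `produce`, `worker` and `TASKS` are names made up for the sketch.

```python
import queue
import threading
import uuid

# Broker stand-in: holds the backlog of task messages.
broker = queue.Queue()
# Results backend stand-in: task id -> return value.
results = {}
results_lock = threading.Lock()

# Task registry: names the worker knows how to execute.
TASKS = {"add": lambda x, y: x + y}


def produce(name, *args):
    """Producer: put a task message on the broker and return its id."""
    task_id = str(uuid.uuid4())
    broker.put({"id": task_id, "task": name, "args": args})
    return task_id


def worker():
    """Worker: consume messages until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            break
        value = TASKS[message["task"]](*message["args"])
        with results_lock:
            results[message["id"]] = value


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

ids = [produce("add", i, i) for i in range(5)]

for _ in threads:
    broker.put(None)  # one shutdown sentinel per worker, queued after the tasks
for t in threads:
    t.join()

print([results[i] for i in ids])  # [0, 2, 4, 6, 8]
```

`produce` returns as soon as the message is queued, so the caller never waits on the work itself.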

Page 17: Practical Celery

EXAMPLE

Page 18: Practical Celery

from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y

Page 19: Practical Celery

>>> result = add.delay(4, 4)
>>> result.state
'SUCCESS'
>>> result.id
'4cc7438e-afd4-4f8f-a2f3-f46567e7ca77'
>>> result.get()
8

http://celery.readthedocs.org/en/latest/reference/celery.result.html

Page 20: Practical Celery

PICK YOUR …

@app.task
def add(x, y):
    return x + y

add(2, 4)

class AddTask(app.Task):
    def run(self, x, y):
        return x + y

AddTask().run(2, 4)

Page 21: Practical Celery

# Async
add.delay(2, 4)
add.apply_async(args=(2, 4), expires=30)  # Discarded if not run within 30 seconds.

# Eager!
result = add.apply(args=(2, 4))  # Executes locally.

# Or...
add(2, 4)  # Does not return a celery result!

Page 22: Practical Celery

INTEGRATING WITH DJANGO.

Page 23: Practical Celery

BEWARE OF DJANGO-CELERY.

Page 24: Practical Celery

http://docs.celeryproject.org/en/master/django/first-steps-with-django.html

- project/
  - config/__init__.py
  - config/settings.py
  - config/urls.py
  - manage.py

Page 25: Practical Celery

# project/config/celery.py

from __future__ import absolute_import

import os

from celery import Celery

from django.conf import settings

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')

app = Celery('app')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

Page 26: Practical Celery

# project/config/__init__.py
from __future__ import absolute_import

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ['celery_app']

Page 27: Practical Celery

celery -A project worker -l info

Page 28: Practical Celery

TESTING

# settings.py
import sys

if 'test' in sys.argv:
    # Execute tasks locally and synchronously, re-raising their exceptions.
    CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
    CELERY_ALWAYS_EAGER = True
    BROKER_BACKEND = 'memory'

Page 29: Practical Celery

PATTERNS AND BEST PRACTICES.

Page 30: Practical Celery

NEVER PASS OBJECTS AS ARGUMENTS.

Page 31: Practical Celery

# Bad
@app.task()
def send_reminder(reminder):
    # The object was serialized when queued and may be stale by now.
    reminder.send_email()

# Good
@app.task()
def send_reminder(pk):
    try:
        reminder = Reminder.objects.get(pk=pk)
    except Reminder.DoesNotExist:
        return
    reminder.send_email()
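Why pass a primary key? Task arguments are serialized when the task is queued, so an object argument is a snapshot that may be stale (or refer to a deleted row) by the time a worker runs it. The stdlib sketch below fakes the database with a dict and the message with JSON; `DB`, `sent_to` and both task functions are hypothetical.

```python
import json

# Hypothetical in-memory "database" standing in for the Reminder table.
DB = {1: {"email": "old@example.com"}}
sent_to = []


def send_reminder_with_object(message):
    # Bad: the task sees the object's state as it was when queued.
    reminder = json.loads(message)
    sent_to.append(reminder["email"])


def send_reminder_with_pk(message):
    # Good: the task re-reads the current row when it actually runs.
    pk = json.loads(message)
    row = DB.get(pk)
    if row is None:  # row was deleted after the task was queued
        return
    sent_to.append(row["email"])


# "Queue" both tasks: their arguments are serialized right now.
queued_object = json.dumps(DB[1])
queued_pk = json.dumps(1)

# Before a worker picks them up, the user changes their address.
DB[1]["email"] = "new@example.com"

send_reminder_with_object(queued_object)
send_reminder_with_pk(queued_pk)
print(sent_to)  # ['old@example.com', 'new@example.com']

# If the row disappears, the pk version simply does nothing.
del DB[1]
send_reminder_with_pk(queued_pk)
print(len(sent_to))  # 2
```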

Page 32: Practical Celery

KEEP TASKS GRANULAR.

CAN PROCESS MORE IN PARALLEL.

Page 33: Practical Celery

AVOID LAUNCHING SYNCHRONOUS SUBTASKS

Page 34: Practical Celery

# Bad
@app.task
def update_page_info(url):
    # Each .get() blocks this worker until the subtask has finished.
    page = fetch_page.delay(url).get()
    info = parse_page.delay(url, page).get()
    store_page_info.delay(url, info)

@app.task
def fetch_page(url):
    return myhttplib.get(url)

@app.task
def parse_page(url, page):
    return myparser.parse_document(page)

@app.task
def store_page_info(url, info):
    return PageInfo.objects.create(url=url, info=info)

Page 35: Practical Celery

# Good
def update_page_info(url):
    # Each task's return value becomes the first argument of the next.
    chain = fetch_page.s(url) | parse_page.s() | store_page_info.s(url)
    chain()

@app.task()
def fetch_page(url):
    return myhttplib.get(url)

@app.task()
def parse_page(page):
    return myparser.parse_document(page)

@app.task(ignore_result=True)
def store_page_info(info, url):
    PageInfo.objects.create(url=url, info=info)

http://celery.readthedocs.org/en/latest/userguide/canvas.html
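In a chain, each task's return value is prepended to the next task's arguments, which is why the good `store_page_info` takes `(info, url)`: `url` is bound up front with `.s(url)` and `info` arrives from `parse_page`. The toy `Signature`/`Chain` classes below only mimic that argument-passing rule in plain Python, running every step inline; they are not Celery's canvas classes, and the page functions are stand-ins.

```python
class Signature:
    """Toy stand-in for a task signature: a function plus bound args."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __or__(self, other):
        return Chain([self, other])

    def run(self, *prepended):
        # The previous result goes FIRST, before the args bound up front.
        return self.func(*prepended, *self.args)


class Chain:
    def __init__(self, steps):
        self.steps = steps

    def __or__(self, other):
        return Chain(self.steps + [other])

    def __call__(self):
        result = self.steps[0].run()
        for step in self.steps[1:]:
            result = step.run(result)
        return result


stored = []

def fetch_page(url):
    return "<html>%s</html>" % url

def parse_page(page):
    return {"length": len(page)}

def store_page_info(info, url):
    stored.append((url, info))


url = "http://example.com"
chain = (Signature(fetch_page, url)
         | Signature(parse_page)
         | Signature(store_page_info, url))
chain()
print(stored)  # [('http://example.com', {'length': 31})]
```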

Page 36: Practical Celery

PERIODIC TASKS.

http://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html

Page 37: Practical Celery

from datetime import timedelta

@app.periodic_task(run_every=timedelta(minutes=5))
def run_every_five():
    pass

Page 38: Practical Celery

from datetime import timedelta

class RunEveryFive(app.PeriodicTask):
    run_every = timedelta(minutes=5)

    def run(self):
        pass

Page 39: Practical Celery

from datetime import timedelta

@app.task()
def run_every_five():
    pass

CELERYBEAT_SCHEDULE = {
    'run-every-five': {
        'task': 'tasks.run_every_five',
        'schedule': timedelta(minutes=5),
    },
}

Page 40: Practical Celery

CRON STYLE.

from celery.schedules import crontab

crontab(minute=0, hour='*/3')  # Every 3 hours, on the hour.
crontab(day_of_week='sunday')  # Every minute on Sundays.
crontab(0, 0, month_of_year='*/3')  # Daily at midnight in the first month of every quarter.
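To see what a spec like `hour='*/3'` selects, here is a tiny matcher for the minute and hour fields. It handles only `*`, `*/n`, plain numbers and comma lists, and is a simplified illustration of cron syntax, not `celery.schedules.crontab` itself (which also handles ranges, day names such as `'sunday'`, and the day and month fields).

```python
def field_matches(spec, value):
    """True if one cron field (e.g. '*', '*/3', '0', '0,30') matches value."""
    for part in str(spec).split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def fires_at(minute, hour, at_hour, at_minute):
    """True if a (minute, hour) schedule fires at at_hour:at_minute."""
    return field_matches(minute, at_minute) and field_matches(hour, at_hour)


# crontab(minute=0, hour='*/3'): on the hour, every third hour.
times = [(h, m) for h in range(24) for m in range(60) if fires_at(0, "*/3", h, m)]
print([h for h, m in times])  # [0, 3, 6, 9, 12, 15, 18, 21]

# An unrestricted field ('*') matches everything, which is why
# crontab(day_of_week='sunday') fires every minute on Sundays.
print(sum(fires_at("*", "*", h, m) for h in range(24) for m in range(60)))  # 1440
```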

Page 41: Practical Celery

@app.periodic_task(run_every=crontab(minute=0, hour=1))  # Daily at 1:00 AM.
def schedule_emails():
    user_ids = User.objects.values_list('id', flat=True)
    for user_id in user_ids:
        send_daily_email.delay(user_id)

from datetime import datetime

@app.task()
def send_daily_email(user_id):
    user = User.objects.get(id=user_id)
    try:
        # Skip users who already received today's email.
        today = datetime.now()
        Email.objects.get(
            user=user,
            date__year=today.year,
            date__month=today.month,
            date__day=today.day)
    except Email.DoesNotExist:
        email = Email(user=user, body="Hey, don't forget to LOGIN PLEASE!")
        email.send()
        email.save()
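The existence check in `send_daily_email` makes it idempotent: if the schedule fires twice, or a task is retried, a user still gets one email per day. A stdlib-only sketch of the same fan-out-plus-check pattern, with a set standing in for the `Email` table (all names here are hypothetical):

```python
from datetime import date

sent = set()  # stands in for the Email table: (user_id, day) pairs
outbox = []   # users actually emailed, in order


def send_daily_email(user_id, today):
    # Idempotency check: skip users who already got today's email.
    if (user_id, today) in sent:
        return False
    outbox.append(user_id)
    sent.add((user_id, today))
    return True


def schedule_emails(user_ids, today):
    # Fan out: one small task per user instead of one big loop task.
    for user_id in user_ids:
        send_daily_email(user_id, today)


day = date(2014, 1, 1)
schedule_emails([1, 2, 3], day)
schedule_emails([1, 2, 3], day)  # e.g. the schedule fired twice
print(outbox)  # [1, 2, 3]
```

The check and the insert are separate steps, so two copies running at the same moment could both pass the check; a lock, as in the one-at-a-time pattern later in the talk, is one way to close that gap.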

Page 42: Practical Celery

CELERY BEAT, A.K.A. THE SCHEDULER.

celery -A project beat

Page 43: Practical Celery

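NEVER RUN BEAT AND A WORKER IN A SINGLE CELERY PROCESS.

# Really bad idea...
celery -A project worker -B

Every worker started with -B runs its own scheduler, so scaling out workers sends each periodic task more than once. Run exactly one beat process, separately.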

Page 44: Practical Celery

FREQUENTLY RUNNING PERIODIC TASKS.

BEWARE OF "TASK STACKING"

Page 45: Practical Celery

The scheduled task runs every 5 minutes.
Each run takes 30 minutes.
Runs pile up (stack) faster than they finish.
Bad stuff.
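The arithmetic behind stacking: with one worker, beat enqueues 6 runs per 30 minutes but only 1 can start, so the backlog grows by 5 every half hour without bound. A minute-by-minute simulation with those assumed numbers (the function and its parameters are illustrative, not a Celery API):

```python
def waiting_after(minutes, interval=5, duration=30, workers=1):
    """Number of queued-but-not-started runs after `minutes` minutes.

    Assumes beat enqueues one run every `interval` minutes starting at
    t=0, and each run occupies a worker for `duration` minutes.
    """
    waiting = 0
    free_at = [0] * workers  # minute at which each worker is next idle
    for t in range(minutes):
        if t % interval == 0:
            waiting += 1  # beat sends another run
        for w in range(workers):
            if free_at[w] <= t and waiting > 0:
                waiting -= 1
                free_at[w] = t + duration
    return waiting


for hours in (1, 2, 4):
    print(hours, waiting_after(hours * 60))  # prints 1 10, then 2 20, then 4 40

# Six workers keep up: 30 / 5 = 6 runs in flight at once.
print(waiting_after(240, workers=6))  # 0
```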

Page 46: Practical Celery

EXPIRES!

from datetime import timedelta
from time import sleep

# Runs not started within 5 minutes of being sent are discarded.
@app.periodic_task(expires=5*60, run_every=timedelta(minutes=5))
def schedule_task():
    for _ in range(30):
        one_minute_task.delay()

@app.task(expires=5*60)
def one_minute_task():
    sleep(60)
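With `expires` set, the expiry moment is fixed when the message is sent, and a worker that gets to the message after that moment marks it revoked instead of running it, so stale runs cannot keep piling up. The sketch below models only that check, with times in seconds; `make_message` and `should_run` are illustrative names, not Celery functions.

```python
def make_message(task_name, sent_at, expires=None):
    """A fake task message; `expires` is seconds after sending, like expires=5*60."""
    return {
        "task": task_name,
        "expires_at": None if expires is None else sent_at + expires,
    }


def should_run(message, received_at):
    """Run only if the message has no expiry or has not passed it yet."""
    return message["expires_at"] is None or received_at <= message["expires_at"]


# schedule_task sends 30 one-minute tasks at t=0, each with expires=5*60.
messages = [make_message("one_minute_task", sent_at=0, expires=5 * 60)
            for _ in range(30)]

# A single worker running one task per minute reaches message i at i*60 s.
ran = [i for i, msg in enumerate(messages) if should_run(msg, received_at=i * 60)]
print(ran)  # [0, 1, 2, 3, 4, 5]
```

The other 24 messages are dropped rather than executed late.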

Page 47: Practical Celery

THINGS GO WRONG IN TASKS!

Page 48: Practical Celery

RETRY

Page 49: Practical Celery

@app.task(bind=True, max_retries=10)
def gather_data(self):
    try:
        data = api.get_data()
        # etc, etc, ...
    except api.RateLimited as e:
        # self.retry() sends a new copy of this task, then raises Retry.
        raise self.retry(exc=e, countdown=e.cooldown)
    except api.IsDown:
        return
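What the retry settings buy you: a bounded number of attempts, each delayed by however long the API asks. In Celery, `self.retry()` achieves this by re-sending the task with a countdown, so the worker is free in between; the loop below only models the policy in plain Python. `RateLimited`, its `cooldown` attribute and `run_with_retries` are hypothetical.

```python
import time


class RateLimited(Exception):
    """Hypothetical API error carrying a server-suggested wait in seconds."""

    def __init__(self, cooldown):
        super().__init__("rate limited; retry in %s s" % cooldown)
        self.cooldown = cooldown


def run_with_retries(func, max_retries=10, sleep=time.sleep):
    """Call func, waiting exc.cooldown after each RateLimited error.

    Gives up after max_retries retries by re-raising the last error.
    """
    retries = 0
    while True:
        try:
            return func()
        except RateLimited as exc:
            if retries >= max_retries:
                raise
            retries += 1
            sleep(exc.cooldown)


# A fake API that is rate limited twice, then answers.
calls = []

def flaky_get_data():
    calls.append(1)
    if len(calls) <= 2:
        raise RateLimited(cooldown=2)
    return {"rows": 3}


waits = []  # record the waits instead of actually sleeping
print(run_with_retries(flaky_get_data, sleep=waits.append))  # {'rows': 3}
print(waits)  # [2, 2]
```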

Page 50: Practical Celery

ERROR INSIGHT.

Page 51: Practical Celery

SENTRY.

Page 52: Practical Celery

STAGES

Page 53: Practical Celery

class DebugTask(app.Task):
    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        print("I'm done!")

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        print("I failed :(")

    def on_retry(self, exc, task_id, args, kwargs, einfo):
        print("I'll try again!")

    def on_success(self, retval, task_id, args, kwargs):
        print("I did it!")
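The order matters when you put cleanup code in these hooks: `on_success` or `on_failure` runs first, and `after_return` runs last with the final state either way. The toy runner below is a plain-Python imitation of that sequence (success and failure only, with `einfo` simplified to `None`), not Celery's own tracer; `SketchTask`, `Add` and `Boom` are illustrative.

```python
class SketchTask:
    """Toy runner that calls the hooks in the order described above."""

    def __init__(self):
        self.events = []

    def run(self, *args):
        raise NotImplementedError

    def on_success(self, retval, task_id, args, kwargs):
        self.events.append("on_success")

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        self.events.append("on_failure")

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        self.events.append("after_return:" + status)

    def __call__(self, task_id, *args):
        kwargs = {}
        try:
            retval = self.run(*args)
        except Exception as exc:
            self.on_failure(exc, task_id, args, kwargs, None)
            self.after_return("FAILURE", exc, task_id, args, kwargs, None)
            return None
        self.on_success(retval, task_id, args, kwargs)
        self.after_return("SUCCESS", retval, task_id, args, kwargs, None)
        return retval


class Add(SketchTask):
    def run(self, x, y):
        return x + y


class Boom(SketchTask):
    def run(self):
        raise ValueError("nope")


ok = Add()
ok("id-1", 2, 3)
bad = Boom()
bad("id-2")
print(ok.events)   # ['on_success', 'after_return:SUCCESS']
print(bad.events)  # ['on_failure', 'after_return:FAILURE']
```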

Page 54: Practical Celery

ABSTRACT

class AbstractTask(app.Task):
    abstract = True
    def after_return(self, *args, **kwargs):
        print("All done!")

@app.task(base=AbstractTask)
def add(x, y):
    return x + y

Page 55: Practical Celery

INSTANTIATION

class DatabaseTask(app.Task):
    abstract = True
    _db = None

    @property
    def db(self):
        if self._db is None:
            self._db = Database.connect()
        return self._db
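The lazy `db` property pays off because Celery creates one instance of each task class per worker process and reuses it for every call, so the connection is opened once per process rather than once per task. A stdlib sketch of the same caching, with `FakeDatabase` standing in for the hypothetical `Database` class:

```python
class FakeDatabase:
    """Counts how many connections get opened."""
    connections = 0

    @classmethod
    def connect(cls):
        cls.connections += 1
        return cls()

    def query(self, sql):
        return "result of " + sql


class DatabaseTask:
    _db = None

    @property
    def db(self):
        # Connect lazily on first use, then reuse the same connection.
        if self._db is None:
            self._db = FakeDatabase.connect()
        return self._db

    def run(self, sql):
        return self.db.query(sql)


task = DatabaseTask()  # one instance per worker process, reused per call
for _ in range(100):
    task.run("SELECT 1")
print(FakeDatabase.connections)  # 1
```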

Page 56: Practical Celery

ENSURE A TASK IS EXECUTED ONE AT A TIME

Page 57: Practical Celery

from hashlib import md5

from celery import shared_task
from celery.utils.log import get_task_logger
from django.core.cache import cache
from djangofeeds.models import Feed

logger = get_task_logger(__name__)

LOCK_EXPIRE = 60 * 5  # Lock expires in 5 minutes

@shared_task(bind=True)  # bind=True so the task can read self.name
def import_feed(self, feed_url):
    # The cache key consists of the task name and the MD5 digest
    # of the feed URL.
    feed_url_digest = md5(feed_url.encode('utf-8')).hexdigest()
    lock_id = '{0}-lock-{1}'.format(self.name, feed_url_digest)

    # cache.add fails if the key already exists
    acquire_lock = lambda: cache.add(lock_id, 'true', LOCK_EXPIRE)
    # memcache delete is very slow, but we have to use it to take
    # advantage of using add() for atomic locking
    release_lock = lambda: cache.delete(lock_id)

    logger.debug('Importing feed: %s', feed_url)
    if acquire_lock():
        try:
            feed = Feed.objects.import_feed(feed_url)
        finally:
            release_lock()
        return feed.url

    logger.debug(
        'Feed %s is already being imported by another worker', feed_url)
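The recipe works because `cache.add` is atomic: it stores the key only if it is absent, so exactly one caller wins the lock, and the timeout frees the lock if a worker dies while holding it. The in-memory class below imitates those `add`/`delete` semantics within a single process only (a real deployment needs a shared cache such as memcached so the lock spans workers); `InMemoryCache` and the lock-key format are assumptions of the sketch.

```python
import threading
import time


class InMemoryCache:
    """Single-process stand-in for Django's cache.add() / cache.delete()."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)
        self._mutex = threading.Lock()

    def add(self, key, value, timeout):
        """Store key only if absent or expired; return True if stored."""
        with self._mutex:
            now = time.monotonic()
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False
            self._data[key] = (value, now + timeout)
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


cache = InMemoryCache()
LOCK_EXPIRE = 60 * 5  # seconds
imported = []


def import_feed(feed_url):
    lock_id = "import_feed-lock-" + feed_url
    if not cache.add(lock_id, "true", LOCK_EXPIRE):
        return "skipped: already being imported"
    try:
        imported.append(feed_url)  # the real import work would go here
        return "imported"
    finally:
        cache.delete(lock_id)


# Simulate another worker holding the lock for this feed.
cache.add("import_feed-lock-http://example.com/rss", "true", LOCK_EXPIRE)
print(import_feed("http://example.com/rss"))  # skipped: already being imported

cache.delete("import_feed-lock-http://example.com/rss")
print(import_feed("http://example.com/rss"))  # imported
```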

Page 58: Practical Celery

IMPORTANT SETTINGS

Page 59: Practical Celery

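# settings.py
CELERY_IGNORE_RESULT = True
CELERYD_TASK_SOFT_TIME_LIMIT = 500  # seconds; raises SoftTimeLimitExceeded in the task
CELERYD_TASK_TIME_LIMIT = 1000  # seconds; the pool process is killed and replaced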

Page 60: Practical Celery

# tasks.py
@app.task(ignore_result=True, soft_time_limit=60, time_limit=120)
def add(x, y):
    pass

Page 61: Practical Celery

# settings.py
CELERYD_MAX_TASKS_PER_CHILD = 500
CELERYD_PREFETCH_MULTIPLIER = 4
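How the prefetch setting plays out: the multiplier applies per pool process, so one worker reserves up to concurrency × multiplier messages from the broker at a time. With long tasks, those reserved messages can wait behind busy processes even while another worker is idle, which is why a multiplier of 1 suits long-running tasks. The arithmetic as a small sketch (the function name is made up):

```python
def reserved_messages(concurrency, prefetch_multiplier):
    """Upper bound on messages one worker holds unacknowledged at a time."""
    return concurrency * prefetch_multiplier


print(reserved_messages(concurrency=10, prefetch_multiplier=4))  # 40
print(reserved_messages(concurrency=10, prefetch_multiplier=1))  # 10
```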

Page 62: Practical Celery

BROKER

Page 63: Practical Celery

SO MANY CHOICES!

RabbitMQ, Redis, SQLAlchemy, Django's ORM, MongoDB, Amazon SQS, CouchDB, Beanstalk, IronMQ

Page 64: Practical Celery

DJANGO ORM.

# settings.py
BROKER_URL = 'django://'
INSTALLED_APPS = (
    'kombu.transport.django',
)
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'

python manage.py syncdb

Page 65: Practical Celery

DON'T DO THIS FOR ANYTHING SERIOUS.

Page 66: Practical Celery

USE RABBITMQ

Page 67: Practical Celery

C-OPTIMIZED LIBRARY

$ pip install librabbitmq

Page 68: Practical Celery

WORKERS

Page 69: Practical Celery

CONCURRENCY

celery -A project worker -c 10
celery -A project worker --autoscale=10,1  # max 10, min 1 processes

Page 70: Practical Celery

INCREASED CONCURRENCY CAN QUICKLY DRAIN CONNECTIONS ON YOUR DATABASE.

Use a connection pooler (pgbouncer).
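A quick check of whether you will hit that ceiling: with the prefork pool, each process that touches the ORM holds its own database connection, so the worst case is roughly nodes × concurrency connections from workers alone, before counting web processes. The node and concurrency numbers below are invented for illustration; PostgreSQL's stock `max_connections` is 100.

```python
def worker_db_connections(nodes, concurrency, per_process=1):
    """Rough worst case: every pool process on every node holds connections."""
    return nodes * concurrency * per_process


POSTGRES_MAX_CONNECTIONS = 100  # PostgreSQL's default max_connections

needed = worker_db_connections(nodes=4, concurrency=30)
print(needed, needed > POSTGRES_MAX_CONNECTIONS)  # 120 True
```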

Page 71: Practical Celery

ROUTING

Page 72: Practical Celery

CELERY_ROUTES = {
    'email.tasks.send_mail': {
        'queue': 'priority',
    },
}

# or
send_mail.apply_async(queue="priority")

celery -A project worker -Q priority
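Routing resolution in miniature: a task goes to the queue named in `CELERY_ROUTES`, an explicit `queue=` at call time overrides that, and anything unrouted lands in Celery's default queue, `celery`. A worker started with `-Q priority` then only sees the `priority` queue. The `route` function and the `reports.tasks.build` task name are made up for this plain-Python sketch.

```python
from collections import defaultdict

CELERY_ROUTES = {
    "email.tasks.send_mail": {"queue": "priority"},
}
DEFAULT_QUEUE = "celery"  # Celery's default queue name


def route(task_name, queue=None):
    """Pick a queue: explicit argument, then the routing table, then default."""
    if queue is not None:
        return queue
    return CELERY_ROUTES.get(task_name, {}).get("queue", DEFAULT_QUEUE)


queues = defaultdict(list)
for name in ["email.tasks.send_mail", "reports.tasks.build"]:
    queues[route(name)].append(name)
queues[route("reports.tasks.build", queue="priority")].append("reports.tasks.build")

# What a worker started with `-Q priority` would consume:
print(queues["priority"])  # ['email.tasks.send_mail', 'reports.tasks.build']
print(queues["celery"])    # ['reports.tasks.build']
```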

Page 73: Practical Celery

DEDICATED WORKERS.

Page 74: Practical Celery

BOTTLENECKS

Page 75: Practical Celery

Identify.
Fix.
Repeat.

Page 76: Practical Celery

Make tasks faster.
Reduce the volume of tasks.

Page 77: Practical Celery

NEW RELIC

Page 78: Practical Celery
Page 79: Practical Celery

MONITORING IS VITAL.

Page 80: Practical Celery

RABBITMQ MANAGEMENT PLUGIN

Page 81: Practical Celery

RABBITMQ MANAGEMENT PLUGIN HAS A GREAT HTTP API!
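For example, `GET /api/queues` returns a JSON list of queues with their message counts, which is enough for a simple backlog alert. The sketch assumes the plugin's default port (15672) and the default `guest`/`guest` account, both of which you should adjust; `fetch_queues`, `backlogged` and the threshold are made up for illustration, and the offline sample only mimics the shape of the real response.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url="http://localhost:15672", user="guest", password="guest"):
    """Return the parsed JSON list from the management API's /api/queues."""
    request = urllib.request.Request(base_url + "/api/queues")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def backlogged(queues, threshold=1000):
    """Map queue name -> message count for queues above the threshold."""
    return {
        q["name"]: q.get("messages", 0)
        for q in queues
        if q.get("messages", 0) > threshold
    }


# Offline example with a payload shaped like the API's response:
sample = [{"name": "celery", "messages": 12}, {"name": "priority", "messages": 4200}]
print(backlogged(sample))  # {'priority': 4200}
# Against a live broker: backlogged(fetch_queues())
```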

Page 83: Practical Celery

CELERY FLOWER

Page 84: Practical Celery

QUESTIONS?