Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django

My presentation at PyCon APAC 2012

About Me

● Rahmat Ramadhan Irianto

● Software developer at Void-Labs & Defpy-Labs, an open source software developer team

● Student at STMIK Dipanegara Makassar, Indonesia (2010)

● Lives in Makassar, Indonesia

● Writes Python apps every day

What Is Website Monitoring?

● Website monitoring provides page change monitoring and notification services to internet users worldwide. It creates a change log for each monitored page and alerts the user by email when it detects a change in the page text.
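The core idea (compare the stored page text with the current one, log the differences) can be sketched with the Python standard library alone. This is only an illustration under assumptions: the function names `fingerprint` and `detect_change` and the sample strings are hypothetical and are not taken from the project.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a stable hash of the page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(old_text, new_text):
    """Return the removed/added lines, or an empty list if nothing changed."""
    if fingerprint(old_text) == fingerprint(new_text):
        return []
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes removed lines with '- ' and added lines with '+ '.
    return [line for line in diff if line.startswith(("- ", "+ "))]


old = "Welcome\nPrice: 10 USD"
new = "Welcome\nPrice: 12 USD"
changes = detect_change(old, new)
```

A non-empty `changes` list is what would be written to the change log and mailed to the user.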

What Is It Useful For?

● Website monitoring can watch almost any page on the internet and alert you by email when it detects a change.

● It can be a good fit for a business intelligence strategy. Track your competitors and get timely alerts when they change their websites, or watch for developments on your customers' websites.

● Monitor the press release pages of companies you have invested in. Keep track of their current executives. Be alerted to changes on their home pages.

● Companies on the web often change their privacy policies or terms and conditions without notice. Website monitoring can alert you to these changes.

● Monitor the job listing pages of companies where you would like to work. When they post a new listing, we will email you.

● Keep up to date with the news. Monitor the news pages of your favorite news sites. When they are updated, you'll get an email alert.

● And much more

Inspired by ChangeDetection: http://www.changedetection.com

What Powers Website Monitoring?

http://goo.gl/hCf34

Python !

(Powerful, efficient, flexible, effective for OOP, elegant syntax, rich libraries, etc.)

http://goo.gl/sSqHh

www.python.org

Django!

(Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design.)

http://goo.gl/YXnA9

https://www.djangoproject.com/

MongoDB

(Flexible, powerful, fast, and easy to use)

http://goo.gl/NZQ18

http://www.mongodb.org

RabbitMQ

(A powerful, fast, reliable, and highly available message queuing system. An open source queueing option that is great for building and managing scalable applications.)

http://goo.gl/Pvd9Q

http://www.rabbitmq.com

Website Monitoring Workflow

[Figure: workflow diagram. Nodes: Ajax POST, POST API request, REST API, Myview, MongoDB, message queue worker, page scraper, email alert. Edge labels: if Ajax POST, if POST API, publish task, create worker, process task, save data, save result, if page changed, report diff.]

Let's Talk About

http://goo.gl/m8QUH

Why MongoDB?

● It combines great features of document databases, key-value stores, and relational databases.

● How great?
  ● Fast
  ● Smart
  ● Scalable
  ● Schema-less
  ● Dynamic queries
  ● Easy to use, etc.

What Are We Going to Need?

Python + MongoDB = PyMongo

http://pypi.python.org/pypi/pymongo/

How To?

    import pymongo
    from datetime import datetime
    from django.utils.encoding import smart_str

    # pymongo.Connection is the PyMongo 2.x client class.
    db = pymongo.Connection().website_monitor
    collection_user = db.user
    collection_monitor = db.monitor
    collection_task = db.task

INSERT

    # One document per monitored URL.
    monitor = {
        'username': smart_str(request.user),
        'user_id': request.user.id,
        'url': url,
        'datetime': datetime.utcnow(),
        'status': status,
        'hit': 0,
        'fail_hit': 0,
        'period': int(request.POST.get('period')),
        'email': collection_user.find_one({'name': str(request.user)})['email'],
        'pk': pk,
        'last_checking': None,
        'task_id': task_id,
    }
    collection_monitor.insert(monitor)

UPDATE

    # Refresh the stored profile and session details of a user.
    collection_user.update(
        {'name': data_user['id']},
        {'$set': {
            'email': data_user['email'],
            'firstname': smart_str(data_user['first_name']),
            'lastname': smart_str(data_user['last_name']),
            'ip': request.META.get('REMOTE_ADDR', 'unknown'),
            'login': datetime.now(),
            'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
            'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
            'session_fb': session_key,
            'ts': datetime.now(),
            'authkey': authkey,
        }}
    )

REMOVE

    # Once three snapshots of a URL are stored, drop the oldest one.
    if collection_content.find({'url': i['url']}).count() == 3:
        oldest = collection_content.find({'url': i['url']}).sort('datetime', 1)[0]
        collection_content.remove({'_id': oldest['_id']})
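The REMOVE snippet appears to cap how many snapshots of a page are kept. The same pruning rule can be sketched in memory, without a database. The helper name `prune_snapshots`, the `keep` parameter, and the sample data are assumptions made for illustration.

```python
from datetime import datetime


def prune_snapshots(snapshots, keep=2):
    """Return the `keep` newest snapshots, ordered oldest to newest."""
    if keep <= 0:
        return []
    ordered = sorted(snapshots, key=lambda snap: snap["datetime"])
    return ordered[-keep:]


snapshots = [
    {"content": "v2", "datetime": datetime(2012, 6, 2)},
    {"content": "v1", "datetime": datetime(2012, 6, 1)},
    {"content": "v3", "datetime": datetime(2012, 6, 3)},
]
kept = [snap["content"] for snap in prune_snapshots(snapshots)]
```

Keeping two snapshots is enough to diff the previous version of a page against the current one.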

Why Must We Use Distributed Computing?

Distributed computing is a method of solving a computational problem by dividing it into many tasks that run simultaneously on many hardware or software systems. (Wikipedia)
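As a rough, single-machine illustration of "dividing the problem into many tasks", the sketch below runs independent tasks concurrently in a thread pool. A real distributed setup spreads the tasks over several machines. The `check_page` function is a hypothetical stand-in for real work and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def check_page(url):
    """Stand-in for real work such as fetching and comparing a page."""
    return url, len(url)


urls = ["http://example.com/a", "http://example.com/bb", "http://example.com/ccc"]

# Each URL becomes an independent task; the pool runs them concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(check_page, urls))
```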

What Is a Message Queue?

Message queues are:

● Communication buffers
● Between independent sender and receiver processes
● Asynchronous
  • The time of sending is not necessarily the same as the time of receiving

In the context of web applications:

● Sender: web application servers
● Receiver: background worker processes
● Queue items: tasks that the web server doesn't have the time or resources to do

How Does It Work?

Say a web application server has a task it doesn't have time to do:

• It puts the task in the message queue
• Other web servers can access the same queue(s) and put tasks there
• Workers are greedy, and they all watch the queues for tasks
• Workers asynchronously pick up the first available task on the queue when they are ready
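The behaviour described above can be imitated inside one process with the standard library: `queue.Queue` plays the broker and threads play the workers. This is only a sketch under those assumptions; the project itself uses RabbitMQ, and the worker names and URLs here are placeholders.

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()


def worker(name):
    """Pick up the next available task until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        with results_lock:
            results.append((name, task))
        tasks.task_done()


workers = [threading.Thread(target=worker, args=("worker-%d" % n,))
           for n in range(2)]
for thread in workers:
    thread.start()

# The "web server" side: enqueue tasks and return immediately.
for url in ["http://example.com/a", "http://example.com/b", "http://example.com/c"]:
    tasks.put({"do": "check", "url": url})

# One sentinel per worker tells it to stop.
for _ in workers:
    tasks.put(None)
for thread in workers:
    thread.join()

processed = sorted(task["url"] for _, task in results)
```

Which worker handles which task is not deterministic, which is the point: whichever worker is free takes the next item.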

What Is It Useful For?

• Message queues are useful in certain situations
• General guidelines:
  o Does your web application take more than a few seconds to generate a response?
  o Are you using a lot of cron jobs to process data in the background?
  o Do you wish you could distribute the processing of the data generated by your application among many servers?

What Do We Need to Build a Message Queue?

AMQP & RabbitMQ

Why Choose AMQP & RabbitMQ?

1. RabbitMQ is free to use
2. The documentation is decent
3. There is decent clustering support, even though we never needed clustering
4. We didn't want to lose queues or messages upon broker crash/restart
5. We develop applications using Python/Django, and setting up an AMQP backend using carrot was easy

Now Let's Talk About RabbitMQ

RabbitMQ?

RabbitMQ is an Erlang-based open source application that serves as a message broker, also called message-oriented middleware.

RabbitMQ implements the Advanced Message Queuing Protocol (AMQP), an application layer protocol.

AMQP provides an interoperable standard protocol across vendors to regulate the exchange of messages in enterprise-scale systems.

Why Use RabbitMQ?

We need it for:

● Running tasks / processes in the background
● Asynchronous task processing
● Scheduling systems, etc.

So... What Keeps the Rabbit Focused?

Carrot!

Carrot is an AMQP messaging queue framework. AMQP is the Advanced Message Queuing Protocol, an open standard protocol for message orientation, queuing, routing, reliability and security.

https://github.com/ask/carrot/

Easy way to connect to RabbitMQ.

Easy way to pull stuff out of the queue.

Easy way to throw stuff into the queue.

Concepts?

● Publishers (A publisher sends messages to an exchange.)

● Exchanges (Messages are sent to exchanges. Exchanges are named and can be configured to use one of several routing algorithms. The exchange routes messages to consumers by matching the routing key in the message with the routing key the consumer provides when binding to the exchange.)

● Consumers (A consumer declares a queue, binds it to an exchange, and receives messages from it.)

● Queues (Queues receive messages sent to exchanges. Queues are declared by consumers.)

● Routing keys (Every message has a routing key. The interpretation of the routing key depends on the exchange type. There are four default exchange types defined by the AMQP standard, and vendors can define custom types, so see your vendor's manual for details.)

● Exchange types defined by AMQP/0.8:

  ● Direct exchange (Matches if the routing key property of the message and the routing_key attribute of the consumer are identical.)

  ● Fan-out exchange (Always matches, even if the binding does not have a routing key.)

  ● Topic exchange (Matches the routing key property of the message by a primitive pattern matching scheme.)
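The three matching rules listed above can be written down as a small pure-Python sketch. It is not broker code: the function names are hypothetical, and the topic rule assumes the usual AMQP convention that keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """Match a dot-separated routing key against a topic binding pattern."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def exchange_matches(exchange_type, binding_key, routing_key):
    """Decide whether a message is routed to a binding."""
    if exchange_type == "fanout":
        return True
    if exchange_type == "direct":
        return binding_key == routing_key
    if exchange_type == "topic":
        return topic_matches(binding_key, routing_key)
    raise ValueError("unknown exchange type: %r" % exchange_type)
```

For example, a binding key of `site.*.changed` on a topic exchange would receive a message published with the routing key `site.news.changed`, but not one published with `site.news.sport.changed`.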

Creating a Connection in Django

settings.py

    RABBITMQ_HOST = 'localhost'
    RABBITMQ_PORT = 5672
    RABBITMQ_USER = 'guest'
    RABBITMQ_PASS = 'guest'
    RABBITMQ_VHOST = '/'

views.py

    from carrot.messaging import Publisher, Consumer
    from carrot.connection import AMQPConnection
    from django.conf import settings

    # One broker connection, configured from the Django settings above.
    conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                     port=settings.RABBITMQ_PORT,
                                     userid=settings.RABBITMQ_USER,
                                     password=settings.RABBITMQ_PASS,
                                     virtual_host=settings.RABBITMQ_VHOST)

Publisher

    # Publish a "check" task for the given task id.
    publisher = Publisher(connection=conn_for_carrot,
                          exchange='website_monitoring_exchange',
                          exchange_type='direct')
    publisher.send({'msg': {'do': 'check',
                            'task_id': task_id,
                            }
                    })

    import hashlib

    # Same call, with the task id hashed together with the submitted URL.
    publisher = Publisher(connection=conn_for_carrot,
                          exchange='website_monitoring_exchange',
                          exchange_type='direct')
    publisher.send({'msg': {'do': 'check',
                            'task_id': hashlib.md5(str(task_id) + request.PUT.get('url')).hexdigest(),
                            }
                    })
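On Python 3, `hashlib.md5` accepts only bytes, so the task id derivation used in the second publisher call needs an explicit encode. A small sketch, where the helper name `make_task_id` and the sample values are assumptions:

```python
import hashlib


def make_task_id(task_id, url):
    """Derive a hex digest from a task id and a URL."""
    raw = "%s%s" % (task_id, url)
    # md5 is used here only as an identifier, not for security.
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


message = {"msg": {"do": "check",
                   "task_id": make_task_id(7, "http://example.com/")}}
```

The resulting `message` dictionary has the same shape as the payload passed to `publisher.send` above.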

Consumer

    import subprocess


    def monitoring_check():
        def call(message_data, message):
            if message_data['msg']['do'] == 'check':
                print('[+] receiving message')
                message.ack()
                task_id = message_data['msg']['task_id']
                # Run the scraper for this task in a separate process.
                get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
                pid = get_pid.pid
                collection_task.update({'task_id': task_id},
                                       {'$set': {'status': 'RUNNING', 'pid': pid}})
                print('[Starting PID:%s]' % pid)
                get_pid.wait()
            else:
                message.ack()

        queuename = 'website_monitoring_checker'
        consumer = Consumer(connection=conn_for_carrot,
                            queue=queuename,
                            exchange='website_monitoring_exchange',
                            exchange_type='direct')
        consumer.register_callback(call)
        try:
            print('[queue:%s]consume..' % queuename)
            # Block and dispatch incoming messages to call().
            consumer.wait()
        except Exception as err:
            print(err)

Cooking Soup with BeautifulSoup?

    import re
    from BeautifulSoup import BeautifulSoup


    def visible(element):
        # Skip text that lives inside non-rendered tags.
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        # Skip HTML comments and non-breaking-space entities.
        if (re.search('<!--', str(element)) or re.search('-->', str(element))
                or re.search('&nbsp;', str(element))):
            return False
        return True


    monitor = collection_monitor.find_one({'pk': pk})

    # The two stored snapshots of the page that will be compared.
    contents = [collection_content.find({'url': str(monitor['url'])})[1],
                collection_content.find({'url': str(monitor['url'])})[0]]

    for i in contents:
        texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
        data = {'content': ' '.join(filter(visible, texts)),
                'datetime': i['datetime'],
                }
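The same "keep only the visible text" filter can be approximated with the standard library's `html.parser`, which is handy where BeautifulSoup is not installed. This is a sketch under assumptions: the class and function names are hypothetical, and the rule set (skip `style`, `script`, `head`, `title`, and comments) mirrors the `visible` function above rather than reproducing it exactly.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside <style>, <script>, <head> or <title>."""

    HIDDEN = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # Comments never reach handle_data, so they are skipped for free.
        text = data.strip()
        if text and not self.hidden_depth:
            self.chunks.append(text)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ("<html><head><title>Shop</title><style>p{color:red}</style></head>"
        "<body><!-- note --><p>Price: 12 USD</p><script>var x = 1;</script>"
        "</body></html>")
text = visible_text(page)
```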

Alert by Email!

    import smtplib


    def sending_email(to, sub, msg):
        try:
            gmail_user = '[email protected]'
            gmail_pwd = '***************'
            smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
            smtpserver.ehlo()
            smtpserver.starttls()
            smtpserver.ehlo()
            smtpserver.login(gmail_user, gmail_pwd)
            header = ('To:' + to + '\n'
                      + 'From: Website-Monitoring <' + gmail_user + '>\n'
                      + 'Subject: %s\n' % sub)
            # A blank line separates the headers from the message body.
            msg = header + '\n' + msg
            smtpserver.sendmail(gmail_user, to, msg)
            smtpserver.close()
        except Exception as err:
            print(err)
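Building the headers by string concatenation is easy to get wrong, so one alternative is to let the standard library's `email.message.EmailMessage` format them. A sketch only: the helper name `build_alert` and the example.com addresses are placeholders, not values from the project.

```python
from email.message import EmailMessage


def build_alert(sender, to, subject, body):
    """Build an alert message; EmailMessage handles header formatting."""
    message = EmailMessage()
    message["From"] = "Website-Monitoring <%s>" % sender
    message["To"] = to
    message["Subject"] = subject
    message.set_content(body)
    return message


alert = build_alert("monitor@example.com", "user@example.com",
                    "Page changed", "http://example.com/ has changed.")
```

The message can then be handed to `smtplib.SMTP(...).send_message(alert)` after the usual `starttls()` and `login()` calls.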

Task / Scheduling Checking?

    import sys
    import time

    task_id = sys.argv[1]
    print(task_id)
    raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
    print(raw_delay)

    # The schedule is stored in hours; convert it to seconds.
    if raw_delay == "1":
        delay = 60 * 60
    elif raw_delay == "12":
        delay = 720 * 60
    else:
        delay = 1440 * 60

    while True:
        try:
            print('[+] Starting task: %s' % sys.argv[1])
            log(task_id, 'INFO', 'starting session')
            main()
        except Exception as err:
            log(task_id, 'exception', err)
            print(err)
            collection_task.update({'task_id': task_id},
                                   {'$set': {'status': 'STOPPED', 'pid': None}})
            log(task_id, 'INFO', 'updating database [status:STOPPED]')
        else:
            collection_task.update({'task_id': task_id},
                                   {'$set': {'status': 'SLEEP', 'pid': None}})
            log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
        # Wait for the scheduled interval before the next check.
        time.sleep(delay)
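The schedule-to-delay mapping in the script above can also be expressed as a lookup table, which makes it easy to test on its own. The names `SCHEDULE_HOURS`, `DEFAULT_HOURS`, and `delay_seconds` are illustrative assumptions. The values match the script: 1 hour, 12 hours, and 24 hours otherwise.

```python
SCHEDULE_HOURS = {"1": 1, "12": 12}
DEFAULT_HOURS = 24


def delay_seconds(raw_delay):
    """Translate the stored schedule value into a sleep interval in seconds."""
    return SCHEDULE_HOURS.get(raw_delay, DEFAULT_HOURS) * 60 * 60
```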

Django-Piston (A mini-framework for Django, but a powerful one for creating RESTful APIs)

https://bitbucket.org/jespern/django-piston/wiki/Home

● Ties into Django's internal mechanisms.

● Supports OAuth out of the box (as well as Basic/Digest or custom auth.)

● Doesn't require tying to models, allowing arbitrary resources.

● Speaks JSON, YAML, Python Pickle & XML (and HATEOAS.)

● Ships with a convenient reusable library in Python

● Respects and encourages proper use of HTTP (status codes, ...)

● Has built in (optional) form validation (via Django), throttling, etc.

● Supports streaming, with a small memory footprint.

● Stays out of your way.

How To?

Create a folder named api/ in the project directory, with these files:

    api/
        __init__.py
        handlers.py
        urls.py

Include it in urls.py:

    url(r'^api/', include('api.urls')),

Include it in settings.py:

    INSTALLED_APPS = (
        # ….......
        'api',
    )

REST API's urls.py

    from django.conf.urls.defaults import *
    from piston.resource import Resource
    from piston.authentication import HttpBasicAuthentication
    from api.handlers import *

    # Every resource requires HTTP Basic authentication.
    auth = HttpBasicAuthentication(realm="website-monitoring")
    ad = {'authentication': auth}

    main = Resource(handler=Main, **ad)
    monitor = Resource(handler=Monitor, **ad)

    urlpatterns = patterns('',
        url(r'^(?P<obj_id>[^/]+)/$', main),
        url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
    )

REST API's handlers.py

    from piston.handler import BaseHandler
    from piston.utils import rc


    class Main(BaseHandler):
        allowed_methods = ('GET',)

        def read(self, request, obj_id):
            # Look the id up among users first, then among monitors.
            data = collection_user.find_one({'pk': obj_id})
            if data:
                return data
            data = collection_monitor.find_one({'pk': obj_id})
            if data:
                return data

    class Monitor(BaseHandler):
        allowed_methods = ('GET', 'PUT', 'DELETE')
        fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
                  'hour', 'email', 'period', 'diff')

        def read(self, request, obj_id):
            try:
                if obj_id == 'all':
                    data = list(collection_monitor.find({'username': str(request.user)}))
                elif obj_id == "status_running":
                    data = list(collection_monitor.find({'status': 'running'}))
                # ….........
            except Exception as err:
                return rc.BAD_REQUEST
            return data

        def update(self, request, obj_id):
            try:
                if obj_id == 'create':
                    # Collect the URLs this user already monitors.
                    url_list = []
                    for i in collection_monitor.find({'username': str(request.user)}):
                        url_list.append(i['url'])
                    if request.PUT.get('url') in url_list:
                        print('[+] Url is exist ')
                        print('[+] Data will be Update ')
                else:
                    raise Exception
            except Exception as err:
                print(err)
                return rc.BAD_REQUEST
            # …......................

        def delete(self, request, obj_id):
            try:
                if obj_id == 'all':
                    # remove() deletes every document matching the filter.
                    collection_monitor.remove({'username': str(request.user)})
                else:
                    if collection_monitor.find_one({'pk': obj_id}):
                        collection_monitor.remove({'pk': obj_id})
            except Exception as err:
                print(err)
                return rc.FORBIDDEN
            else:
                print('deleted')
                return rc.DELETED

Facebook Integration?

● Just for lazy people

● You don't have to fill in the registration form; just log in to your Facebook account, then click, click, and click.

● Good for business marketing

● Easy to integrate, etc.

● Download: git clone http://github.com/dickeytk/django_facebook_oauth.git

Questions?

● Twitter: @jimmyromanticde

● Facebook: https://www.facebook.com/jimmy.romantic.devil

● Email: [email protected]

● Bitbucket: https://bitbucket.org/jimmyromanticdevil/

● Blog: http://jimmyromanticdevil.wordpress.com

References

http://www.python.org

https://www.djangoproject.com

http://www.mongodb.org

http://www.rabbitmq.com

http://pypi.python.org/pypi/pymongo

https://github.com/ask/carrot/

https://bitbucket.org/jespern/django-piston/wiki/Home

http://github.com/dickeytk/django_facebook_oauth.git

"Life in a Queue" by Tareque Hossain

Google search: "Message Queue"

Thank You ! :)