sebastien-tandel
DESCRIPTION
Companies in the process of adopting a language evaluate several aspects, such as performance, integration with the existing ecosystem, productivity, and the use cases for the language. In this presentation, we'll think about these points by sharing the experience of integrating Python at Terra.
PYTHON IN LARGE COMPANIES?
Sébastien Tandel
PLAN
About Terra
The 7 steps
  Prototype
  Define the Goals
  Integration
  Some Libs
  Prove It Works
  Evangelize
  Next Steps
Conclusions
ABOUT TERRA : WEB PORTAL
Largest Latin American web portal
Located in 18 countries
1000s of servers
Brazil : ~7M unique visitors / day, ~70M pageviews / day
ABOUT TERRA
Rank  Parent                Unique Audience  Active Reach  Time Per Person
1     Google                188,670          0.88          2:37:49
2     Microsoft             170,009          0.79          4:10:36
3     Yahoo!                94,934           0.44          1:13:49
4     Ebay                  89,823           0.42          1:51:53
5     Wikimedia Foundation  78,881           0.37          0:12:00
6     Facebook              72,854           0.34          4:09:46
7     AOL LLC               48,722           0.23          1:53:23
8     Amazon                47,819           0.22          0:17:42
9     Apple Computer        47,512           0.22          0:51:03
10    Telefonica / Terra    39,250           0.18          0:41:32
Source: Nielsen NetView (June 2009)
ABOUT TERRA : EMAIL PLATFORM
I’m part of the email team.
Some stats :
  +10M mailboxes
  +30M inbound emails per day
  +30M outbound emails per day
  avg : 300 mail/s, peak : 600 mail/s
Systems :
  Main systems : SMTP, LMTP, POP, IMAP, Webmail
  Total of +30 systems to design / develop / maintain
  Main languages used : C / C++
ABOUT TERRA
Several “official” languages at Terra : PHP, C, C++, Java, C#, Erlang
Average # is one per team!
No official “scripting” language (Python, Perl or other)
Why? From what I hear :
  Performance
  Integration with other systems (& legacy)
  Costs / benefits?
  Buzzword fear
  Labor market
STEP 1 : PROTOTYPE
A buggy system was rewritten as a prototype in Python
  Surprise! It worked a lot better than its C cousin
  The prototype is now in production!
I spread the word about this rewrite around me
  Some technical people liked the idea
  One was not so enthusiastic … my manager
  Cons : no integration with homemade systems, and just one example
Introducing new ideas is a long and tough road
STEP 2 : DEFINE THE GOALS
Performance-critical systems : postfix, lmtp, imap / pop
Web-based systems : Webmail, ECP
Backend systems : spamreporter, cleaner, clx trainer, base trainer, mfsbuilder, migrador, nnentrega, smigol, …
Almost nonexistent systems (though interesting ones) : mailbox stats, log analysis (stats and user behavior characterization)
System / integration test scripts
The Grail : Python can be used for ALL except performance-critical systems
STEP 3 : INTEGRATION
Python could be used with every system, but how can I interface with the homemade (legacy) systems?
Various ways to create Python bindings :
1. Python C API : the “hard” way
2. swig : the lazy way, but it won’t create a Pythonic API for you
3. ctypes : the stupidly easy way
    from ctypes import cdll
    libc = cdll.LoadLibrary("libc.so.6")
    # mkdir(2) takes a byte-string path and a permission mode
    libc.mkdir(b"python-mkdir-test", 0o755)
4. Cython : write Python, compile with gcc
Wrote bindings to interface with all major internal systems (thanks to ctypes)
With a Pythonic API!
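What "a Pythonic API on top of ctypes" can mean is easiest to see on a small case. The sketch below is not Terra's binding code: it wraps the libc `mkdir` call used above, and the wrapper name `make_dir` is made up for illustration. It assumes a Linux-like libc where `mode_t` fits in an unsigned int.

```python
import ctypes
import ctypes.util
import os

# Load libc and ask ctypes to keep a private copy of errno for us.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Declare the C signature once: int mkdir(const char *path, mode_t mode)
# (assumption: mode_t fits in an unsigned int, as on Linux).
_libc.mkdir.argtypes = [ctypes.c_char_p, ctypes.c_uint]
_libc.mkdir.restype = ctypes.c_int


def make_dir(path, mode=0o755):
    """Hypothetical Pythonic wrapper: text in, exception out."""
    if _libc.mkdir(path.encode("utf-8"), mode) != 0:
        err = ctypes.get_errno()
        # Turn the C-style -1 / errno pair into a normal Python exception.
        raise OSError(err, os.strerror(err), path)
```

Callers never see byte strings, return codes or errno; they call `make_dir("some-dir")` and handle `OSError` like with any other Python function.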
STEP 3 : INTEGRATION
    from trrauth import TrrAuth

    auth = TrrAuth("IMAP")
    auth.open_userpass("standel", "1q2w3e", "terra")
    auth.attributes = ["short_name", "id_perm", "antispam"]

    # requested attributes are exposed as plain Python attributes
    print("%s : %s" % (auth.short_name, auth.id_perm))

    # and the object can be iterated as (attribute, value) pairs
    for attr, value in auth:
        print("%s : %s" % (attr, value))
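The attribute access and iteration shown above rely on two standard Python hooks, `__getattr__` and `__iter__`. The class below is a hypothetical stand-in, not the real `TrrAuth`: the `fetch` callable replaces whatever call goes into the C library, and the sample values are invented.

```python
class AuthRecord(object):
    """Hypothetical stand-in for a binding-backed object such as TrrAuth."""

    def __init__(self, fetch):
        # fetch(name) would call into the C library through ctypes;
        # here it is any callable returning the value of an attribute.
        self._fetch = fetch
        self.attributes = []

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so real attributes
        # (_fetch, attributes) never reach this method.
        if name not in self.__dict__.get("attributes", ()):
            raise AttributeError(name)
        return self._fetch(name)

    def __iter__(self):
        # Yield (attribute, value) pairs for the requested attributes.
        for name in self.attributes:
            yield name, self._fetch(name)


# Usage with invented data in place of the authentication backend.
backend = {"short_name": "standel", "id_perm": 42, "antispam": True}
rec = AuthRecord(backend.get)
rec.attributes = ["short_name", "id_perm"]
pairs = list(rec)
```

Only the attributes listed in `attributes` are reachable, which mirrors the way the slide code declares what it wants before reading it.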
STEP 4 : SOME LIBS
MASTER / SLAVE
Master responsible for :
  Forking the slaves
  Reading a “list” of tasks
  Distributing the tasks to the slaves
Slave responsible for :
  Executing the task
  Returning the execution status to the master
Key characteristics :
  Slave death detection
  Handling of unhandled exceptions (+ hook)
  Master <-> slave protocol allows a temporary error code
  Timeout of the tasks
One neat characteristic : a system might have a bug in prod with minimal impact
  If an unhandled exception occurs, only one slave dies
  The death is detected and the master forks a new slave (if needed)
  The lib handles the exception :
    Default behavior : prints to the console
    User defined (callback) : e.g. write the stack trace to a file!
  Cherry on the cake : you get specific production data about the faulty task
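The behavior described above can be sketched with `os.fork` in a few lines. This is not the Terra library: it is a simplified illustration that forks one worker per task, and all names (`run_in_worker`, `master`, the exit codes) are assumptions. Task timeouts and long-lived workers are left out. Unix only.

```python
import os
import sys
import traceback

EXIT_OK = 0
EXIT_TEMPFAIL = 75  # assumed "temporary error" code: the task may be retried
EXIT_CRASH = 1      # unhandled exception, or worker killed


def default_hook(task, trace):
    """Default behavior: print the stack trace to the console."""
    sys.stderr.write("task %r failed:\n%s" % (task, trace))
    sys.stderr.flush()


def run_in_worker(task, handler, hook=default_hook):
    """Fork a worker, run handler(task) in it, return its exit code."""
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:  # child: the worker
        code = EXIT_CRASH
        try:
            # handler returns True on success, False on temporary failure
            code = EXIT_OK if handler(task) else EXIT_TEMPFAIL
        except Exception:
            hook(task, traceback.format_exc())
        finally:
            os._exit(code)  # never fall back into the master's code
    # parent: the master waits and decodes the worker's fate
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return EXIT_CRASH  # worker was killed by a signal


def master(tasks, handler, hook=default_hook):
    """Distribute tasks one by one and collect an exit code per task."""
    return dict((task, run_in_worker(task, handler, hook)) for task in tasks)
```

A crash in one task only kills its own worker; the master keeps going and knows exactly which task failed. Passing a hook that appends the trace to a file gives the "stack trace to a file" behavior mentioned above.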
STEP 4 : SOME LIBS
TCP SOCKETS POOL
Manage connections to a pool of servers
Send in a round-robin / priority way to each server
Detect connection errors
Retry to connect
  Number of retries is limited => afterwards the server is marked as dead
  Retry again later with exponential backoff
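A minimal sketch of such a pool, assuming round-robin only (no priorities) and invented names (`ServerPool`, `get_connection`, `base_delay`). The `connect` and `clock` parameters are injectable so the logic can be exercised without a network; they are not part of the original library.

```python
import itertools
import socket
import time


class ServerPool(object):
    """Round-robin pool with limited retries and exponential backoff."""

    def __init__(self, servers, max_retries=3, base_delay=1.0,
                 connect=socket.create_connection, clock=time.time):
        self.servers = list(servers)
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.connect = connect
        self.clock = clock
        self.dead_rounds = dict((s, 0) for s in self.servers)
        self.retry_at = dict((s, 0.0) for s in self.servers)
        self._cycle = itertools.cycle(self.servers)

    def get_connection(self):
        """Return (server, connection) from the next usable server."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.clock() < self.retry_at[server]:
                continue  # marked dead, backoff not elapsed yet
            for _attempt in range(self.max_retries):
                try:
                    conn = self.connect(server)
                except (socket.error, OSError):
                    continue  # connection error: retry
                self.dead_rounds[server] = 0
                return server, conn
            # Retries exhausted: mark as dead and double the wait each time.
            self.dead_rounds[server] += 1
            delay = self.base_delay * 2 ** (self.dead_rounds[server] - 1)
            self.retry_at[server] = self.clock() + delay
        raise RuntimeError("no server available")
```

With real sockets, `servers` would be a list of `(host, port)` tuples, which is what `socket.create_connection` expects.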
STEP 5 : PROVE IT WORKS
Prove = collect data … How?
  Write integrated systems using the bindings and libs of the previous steps
  Show it works : performance, productivity
Performance, one obvious thought : C/C++
PINCS : Performance Is Not C, Stupid!
STEP 5 : PROVE IT WORKS
PERFORMANCE
Some of the rewrites work faster than their C/C++ cousins
Why?
  OS / system limits
  Libs (legacy)
  Algorithms
  Software architecture
  Infrastructure
STEP 5 : PROVE IT WORKS
PRODUCTIVITY
BTW, is pure performance so important?
  Time to market is much more important
  Adopt Lean Thinking and eliminate every possible waste
Writing too much code is a big waste in several ways :
1. Lose time when writing
2. Increase # of bugs
3. More time to maintain
4. More time to learn the code base (think of new employees)
Impact : overall productivity
Language  Level relative to C  LOC / FP
C         1                    91
C++       2.5                  53
Java      2.5                  53
Python    6                    21
Ruby      ? Should be ~6       ? Should be ~21
http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt2_advances2003.pdf
Language  Projects  LOC          LOC / Project
C         14438     964,256,944  66786
C++       11347     444,867,946  39205
PHP       8559      241,990,111  28273
Java      19803     503,875,784  25444
C#        5439      88,321,294   16238
Ruby      4141      39,449,251   9526
Python    10322     77,720,446   7529

Language      Projects  LOC / Project
Perl          7679      7318
Scala         140       3072
Visual Basic  1201      4536
Erlang        244       7475
Haskell       545       4955

Language  Projects  Contributors
C         14438     47310
Java      19803     40897
C++       11347     34768
Python    10322     21477
PHP       8559      19565
C#        5439      9605
Ruby      4141      7267
Source for these tables : http://www.ohloh.net
Some existing C/C++ systems were rewritten in Python
  Original C/C++ versions : ~20,000 LOC in total
  In Python : 4-6x less code!
The previous numbers do not seem to lie
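A quick back-of-the-envelope check of that claim, using only the LOC / FP figures quoted earlier and the ~20,000 LOC total. The variable names are illustrative.

```python
# LOC per function point, as quoted in the table from the Prechelt reference
loc_per_fp = {"C": 91, "C++": 53, "Python": 21}

# Expected code-size ratio when moving to Python, according to the table
ratio_from_c = loc_per_fp["C"] / float(loc_per_fp["Python"])
ratio_from_cpp = loc_per_fp["C++"] / float(loc_per_fp["Python"])

# Python size implied by the observed 4-6x reduction on ~20,000 LOC
original_loc = 20000
python_loc_range = (original_loc // 6, original_loc // 4)

print("C -> Python   : %.1fx less code" % ratio_from_c)
print("C++ -> Python : %.1fx less code" % ratio_from_cpp)
print("Observed      : %d to %d Python LOC" % python_loc_range)
```

The table predicts roughly 2.5x (from C++) to 4.3x (from C); the observed 4-6x sits at or above the upper end of that range.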
Oh, parsing an email? Getting the content types of its parts? Getting headers?
Any idea how to do that in C/C++?
In Python :
    from email import message_from_file

    def get_mail(filename):
        # filename : path to a file containing one raw email
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)

    # content types of the parts
    for part in mail.walk():
        print(part.get_content_type())

    # headers
    print(mail["From"])
    print(mail["Subject"])
PYTHON LIBS ARE JUST THAT SIMPLE! … and there are a lot!
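To try the same calls without an email file at hand, the variant below builds a small multipart message in memory and parses it back with `message_from_string`. The addresses and bodies are invented sample data.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample():
    """Build a raw two-part email (sample data, not a real message)."""
    msg = MIMEMultipart("alternative")
    msg["From"] = "alice@example.com"
    msg["Subject"] = "Hello"
    msg.attach(MIMEText("plain body", "plain"))
    msg.attach(MIMEText("<p>html body</p>", "html"))
    return msg.as_string()


mail = message_from_string(build_sample())

# walk() yields the container first, then each part
content_types = [part.get_content_type() for part in mail.walk()]
sender = mail["From"]
subject = mail["Subject"]
```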
STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)
For an equivalent architecture (libs, algorithm, infrastructure), C performs better than Python!
Python Is Not C, Stupid!
Bottleneck discovered!
  PINCS! : think about the architecture first!
  ctypes / swig : Python bindings (ctypes makes them absurdly easy)
    Write your bottleneck in C / C++, use it in your Python app
  Cython : write Python, obtain a gcc-compiled lib
  Psyco : JIT for Python
    Just an additional module import in your code
    2-100x faster than normal Python
    Requires a bit more memory
  Unladen Swallow : Google project
    Aims to produce a version of Python at least 5x faster
    Every patch goes to Python (no fork!)
STEP 6 : EVANGELIZE
Once you have stopped to look at what has been accomplished …
Show it, Evangelize!
Because introducing a “new technology” is not just about teaching something to users.
You’ve got to play the role of evangelist!
  Innovators (3.5%) : New stuff? They’re in!
  Early adopters (12.5%) : Open to new ideas, but they check first
  Early majority (35%) : First, they must see the idea working
  Late majority (35%) : Accept after a lot of pressure, or when it is imposed
  Laggards (14%) : Never accept (why would I want to change?)
STEP 6 : EVANGELIZE
During work, I constantly spoke (a lot) to others
Presentation on Python, open to all
  Presented to a large audience what had been done
  Open discussion
Poster summarizing what had been done
Wiki page documenting Python material
Specific mailing list related to Python
A lot of work and a slow process, but I won some allies
  Some technical people are convinced that Python is useful
  Some managers are convinced that Python could be a good thing for Terra
  Evaluation is starting in some specific cases
STEP 7 : NEXT STEPS
Proven that Python can be useful in some cases.
Don’t forget my Grail! The road has not ended …
I’m lobbying to start using Python for web development.
And again, I made a prototype
Django = THE Python MVC web framework :
  Model : describe the data, no code written (Django's own ORM, similar in spirit to SQLAlchemy)
    Automatic creation of tables (if needed)
    Data accessed through objects
    No SQL needed!
  View : accesses the models to get the data, renders the output through templates
    Loose coupling between interface and code!
  Controller : REST through URL parsing
STEP 7 : NEXT STEPS
Login : the auth module already exists. Easy to tell Django that authentication is required :
    @login_required
    def list_abook(request, username):
        …
login_required is a Python decorator
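For readers new to decorators, here is what such a guard does in plain Python. This is not Django's implementation: `Request`, `require_login` and the returned strings are hypothetical and only illustrate the wrapping mechanism.

```python
import functools


class Request(object):
    """Hypothetical minimal request: user is None when nobody is logged in."""

    def __init__(self, user=None):
        self.user = user


def require_login(view):
    """Wrap a view so it only runs for authenticated requests."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if request.user is None:
            # A real framework would return a redirect response here.
            return "redirect to /login"
        return view(request, *args, **kwargs)
    return wrapper


@require_login
def list_abook(request, username):
    return "address book of %s" % username


anonymous = list_abook(Request(), "standel")
logged_in = list_abook(Request(user="standel"), "standel")
```

The view itself stays free of authentication code; the check lives in one reusable place.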
STEP 7 : NEXT STEPS
Caching information (memcache, database, file, …) at 4 levels :
  Per site : one config line
  Per view : one Python decorator
    @cache_page(60 * 15)
    def list_abook(request, username):
        …
  In templates : maybe better to leave this one out!
  Low-level cache access :
    cache.get(id)
    cache.set(id, value, timeout)
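The idea behind a per-view cache decorator can be sketched in plain Python as a get / set pair with a timeout around the function call. This is not Django's `cache_page`: `cache_for` and its in-memory dictionary are assumptions made for illustration, and the `clock` parameter only exists to make the expiry testable.

```python
import functools
import time


def cache_for(seconds, clock=time.time):
    """Cache a function's results per argument tuple for `seconds`."""
    def decorator(func):
        store = {}  # args -> (time stored, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)               # like cache.get(id)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]
            value = func(*args)
            store[args] = (now, value)          # like cache.set(id, value, timeout)
            return value
        return wrapper
    return decorator
```

Decorating a function with `@cache_for(60 * 15)` would then serve repeated calls with the same arguments from memory for 15 minutes.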
STEP 7 : NEXT STEPS
Address book web service :
  Retrieve the address book of one user
  Add an account
  Add an entry to the address book of a user
  View all the address book entries
  Output in HTML, JSON and CSV
< 100 LOC
2 hours (w/o knowing the framework)
Not one line of SQL, just useful code
CONCLUSIONS
One year and a half … and evangelization is not done yet!
Email team :
  Several systems have been written in Python and work really well … even with the high Terra load!
  A web project should start right now
People are starting to use / learn it inside the company
  Some teams are starting to evaluate Python
  Some Terra employees are here at this conference!
THANKS!
Any Questions?