PYTHON IN LARGE COMPANIES? Sébastien Tandel

Python For Large Company?

DESCRIPTION

Companies that are adopting a language evaluate several aspects, such as :
* performance
* integration with the existing ecosystem
* productivity
* use cases for the language
In this presentation, we'll think through these points by sharing the experience of integrating Python at Terra.

Page 2: Python For Large Company?

PLAN

About Terra
The 7 steps
  1. Prototype
  2. Define the Goals
  3. Integration
  4. Some Libs
  5. Prove It Works
  6. Evangelize
  7. Next Steps

Conclusions

Page 3: Python For Large Company?

ABOUT TERRA : WEB PORTAL

Largest Latin American web portal

Located in 18 countries
1000s of servers

Brazil :
  ~7M unique visitors / day
  ~70M pageviews / day

Page 4: Python For Large Company?

ABOUT TERRA

Rank  Parent                Unique Audience  Active Reach  Time Per Person
1     Google                188,670          0.88          2:37:49
2     Microsoft             170,009          0.79          4:10:36
3     Yahoo!                94,934           0.44          1:13:49
4     Ebay                  89,823           0.42          1:51:53
5     Wikimedia Foundation  78,881           0.37          0:12:00
6     Facebook              72,854           0.34          4:09:46
7     AOL LLC               48,722           0.23          1:53:23
8     Amazon                47,819           0.22          0:17:42
9     Apple Computer        47,512           0.22          0:51:03
10    Telefonica / Terra    39,250           0.18          0:41:32

Source: Nielsen NetView (June 2009)

Page 5: Python For Large Company?

ABOUT TERRA : EMAIL PLATFORM

I’m part of the email team.

Some stats :
  +10M mailboxes
  +30M inbound emails per day
  +30M outbound emails per day
  avg : 300 mail/s, peak : 600 mail/s

Systems :
  Main systems : SMTP, LMTP, POP, IMAP, Webmail
  Total of +30 systems to design/develop/maintain
  Main languages used : C / C++

Page 6: Python For Large Company?

ABOUT TERRA

Several “official” languages at Terra :
  PHP, C, C++, Java, C#, Erlang
  On average, one language per team!

No official “scripting” language (Python, Perl or other)
Why? From what I hear :
  Performance
  Integration with other systems (& legacy)
  Costs / benefits?
  Buzzword fear
  Labor market

Page 7: Python For Large Company?

STEP 1 : PROTOTYPE

Page 8: Python For Large Company?

STEP 1 : PROTOTYPE

Buggy system rewritten as a prototype in Python
  Surprise! It worked a lot better than its C cousin
  The prototype is now in production!

Spread the word about this rewrite around me
  Some technical people liked the idea
  One person was not so enthusiastic … my manager

Cons :
  No integration with homemade systems
  Just one example

Page 9: Python For Large Company?

STEP 1 : PROTOTYPE

Introducing new ideas is a long and tough road

Page 10: Python For Large Company?

STEP 2 : DEFINE THE GOALS

Performance critical systems : postfix, lmtp, imap / pop

Page 11: Python For Large Company?

STEP 2 : DEFINE THE GOALS

Performance critical systems

Web-based systems : Webmail, ECP

Page 12: Python For Large Company?

STEP 2 : DEFINE THE GOALS

Performance critical systems
Web-based systems

Backend systems : spamreporter, cleaner, clx trainer, base trainer, mfsbuilder, migrador, nnentrega, smigol, …

Page 13: Python For Large Company?

STEP 2 : DEFINE THE GOALS

Performance critical systems
Web-based systems
Backend systems

Almost nonexistent systems (though interesting ones) : mailbox stats, log analysis (stats and user behavior characterization)

Page 14: Python For Large Company?

STEP 2 : DEFINE THE GOALS

Performance critical systems
Web-based systems
Backend systems
Stats / User behavior characterization, …

System / Integration tests scripts

Page 15: Python For Large Company?

STEP 2 : DEFINE THE GOALS

Performance critical systems
Web-based systems
Backend systems
Stats / User behavior characterization, …
System / Integration tests scripts

The Grail : Python can be used for ALL except Performance Critical Systems

Page 16: Python For Large Company?

STEP 3 : INTEGRATION

Page 17: Python For Large Company?

STEP 3 : INTEGRATION

Python could be used with every system,

but how can I interface with the homemade (legacy) systems?

Page 18: Python For Large Company?

STEP 3 : INTEGRATION

Various ways to create Python bindings :
1. Python C API : the “hard” way

Page 19: Python For Large Company?

STEP 3 : INTEGRATION

Various ways to create Python bindings :
1. Python C API : the “hard” way
2. swig : the lazy way
   It won’t create a Pythonic API for you

Page 20: Python For Large Company?

STEP 3 : INTEGRATION

Various ways to create Python bindings :
1. Python C API : the “hard” way
2. swig : the lazy way
3. ctypes : the stupidly easy way

    from ctypes import cdll

    l = cdll.LoadLibrary("libc.so.6")
    # libc's mkdir() expects a byte-string path and a permission mode
    l.mkdir(b"python-mkdir-test", 0o755)

Page 21: Python For Large Company?

STEP 3 : INTEGRATION

Various ways to create Python bindings :
1. Python C API : the “hard” way
2. swig : the lazy way
3. ctypes : the stupidly easy way
4. Cython : write python, compile with gcc

Page 22: Python For Large Company?

STEP 3 : INTEGRATION

Wrote bindings to interface with all major internal systems (thanks to ctypes)

With pythonic API!
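
A minimal sketch of what "a Pythonic API on top of ctypes" can look like. Terra's internal libraries are not shown in the slides, so libc's mkdir() stands in for them here; the wrapper name make_directory is hypothetical, and a Unix-like system with Python 3 is assumed.

```python
import ctypes
import ctypes.util
import os

# Load the C library; use_errno=True lets ctypes capture errno after a call.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Declare the C signature: int mkdir(const char *path, mode_t mode).
_libc.mkdir.argtypes = [ctypes.c_char_p, ctypes.c_uint]
_libc.mkdir.restype = ctypes.c_int


def make_directory(path, mode=0o755):
    """Hypothetical wrapper: accepts str, raises OSError instead of returning -1."""
    if _libc.mkdir(os.fsencode(path), mode) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

The caller never sees C conventions (byte strings, return codes, errno): failures surface as ordinary Python exceptions, which is the main thing a thin ctypes call does not give you for free.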

Page 23: Python For Large Company?

STEP 3 : INTEGRATION

    from trrauth import TrrAuth

    auth = TrrAuth("IMAP")
    auth.open_userpass("standel", "1q2w3e", "terra")

    auth.attributes = ["short_name", "id_perm", "antispam"]

Page 24: Python For Large Company?

STEP 3 : INTEGRATION

    from trrauth import TrrAuth

    auth = TrrAuth("IMAP")
    auth.open_userpass("standel", "1q2w3e", "terra")

    auth.attributes = ["short_name", "id_perm", "antispam"]

    # requested attributes are exposed as plain Python attributes
    print(auth.short_name, ":", auth.id_perm)

Page 25: Python For Large Company?

STEP 3 : INTEGRATION

    from trrauth import TrrAuth

    auth = TrrAuth("IMAP")
    auth.open_userpass("standel", "1q2w3e", "terra")

    auth.attributes = ["short_name", "id_perm", "antispam"]

    # iterating over the object yields (attribute, value) pairs
    for attr, value in auth:
        print(attr, ":", value)
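
The trrauth module is internal to Terra and its source is not in the slides, so the following is only a sketch of how attribute access and iteration like the above could be provided. The class name AttributeRecord and the dictionary backend are assumptions; a real binding would call into the C library instead.

```python
class AttributeRecord:
    """Hypothetical stand-in for a binding object such as TrrAuth."""

    def __init__(self, backend):
        # `backend` is any mapping; a real binding would query the C library.
        self._backend = backend
        self.attributes = []

    def __getattr__(self, name):
        # Only called when normal lookup fails, so real attributes win.
        requested = self.__dict__.get("attributes", ())
        if name not in requested:
            raise AttributeError(name)
        return self._backend[name]

    def __iter__(self):
        # Yield (attribute, value) pairs for the requested attributes only.
        for name in self.attributes:
            yield name, self._backend[name]


record = AttributeRecord({"short_name": "standel", "id_perm": 42, "antispam": True})
record.attributes = ["short_name", "id_perm"]
for attr, value in record:
    print(attr, ":", value)
```

Implementing `__getattr__` and `__iter__` is what makes the object feel native: callers use dotted access and a for loop instead of getter functions.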

Page 26: Python For Large Company?

STEP 4 : SOME LIBS

Page 27: Python For Large Company?

STEP 4 : SOME LIBS
MASTER / SLAVE

Master responsible for :
  Forking the slaves
  Reading a “list” of tasks
  Distributing the tasks to the slaves

Slave responsible for :
  Executing the task
  Returning the execution status to the master

Key characteristics :
  Slave death detection
  Handling of unhandled exceptions (+ hook)
  Master <-> slave protocol allows temporary error codes
  Task timeouts

Page 28: Python For Large Company?

STEP 4 : SOME LIBS
MASTER / SLAVE

One neat characteristic : a system can have a bug in production with minimal impact

If an unhandled exception occurs :
  Only one slave dies
  The death is detected and the master forks a new slave (if needed)
  The lib handles the exception :
    Default behavior : print to the console
    User-defined (callback) : e.g. write the stack trace to a file!

Cherry on the cake : you get specific production data about the faulty task
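
The library itself is not shown in the slides; the sketch below only illustrates the ideas of death detection, an exception hook, and a temporary error code. It is simplified on purpose: it forks one short-lived slave per task instead of keeping slaves alive, it has no task timeout, and every name in it (master, TemporaryError, the exit codes) is an assumption. Unix only, Python 3.

```python
import os
import traceback

EXIT_OK, EXIT_CRASH, EXIT_TEMPFAIL = 0, 1, 75


class TemporaryError(Exception):
    """Raised by a task to signal a temporary failure (retry later)."""


def default_hook(task, exc):
    # Default behavior: print the stack trace to the console.
    traceback.print_exc()


def run_in_slave(handler, task, on_crash):
    """Slave side: run one task and report through the exit status."""
    code = EXIT_CRASH
    try:
        handler(task)
        code = EXIT_OK
    except TemporaryError:
        code = EXIT_TEMPFAIL
    except Exception as exc:
        on_crash(task, exc)  # user hook, e.g. write the trace to a file
    finally:
        os._exit(code)  # leave at once, skipping the parent's cleanup


def master(tasks, handler, slaves=2, on_crash=default_hook):
    """Master side: fork slaves, hand out tasks, collect one status per task."""
    pending, running, report = list(tasks), {}, {}
    while pending or running:
        while pending and len(running) < slaves:
            task = pending.pop(0)
            pid = os.fork()
            if pid == 0:
                run_in_slave(handler, task, on_crash)
            running[pid] = task
        pid, status = os.wait()  # returns as soon as any slave dies
        if pid not in running:
            continue
        task = running.pop(pid)
        if os.WIFEXITED(status):
            names = {EXIT_OK: "ok", EXIT_TEMPFAIL: "retry"}
            report[task] = names.get(os.WEXITSTATUS(status), "crashed")
        else:
            report[task] = "killed"  # slave died from a signal
    return report
```

Because each task runs in its own process, a crash takes down only that slave, and the master still knows exactly which task was faulty.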

Page 29: Python For Large Company?

STEP 4 : SOME LIBS
TCP SOCKETS POOL

Manage connections to a pool of servers
  Send to each server in a round-robin / priority way

Detect connection errors
  Retry to connect
  Number of retries is limited => afterwards the server is marked as dead
  Retry again later with exponential backoff
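
A sketch of the round-robin, retry-limit and exponential-backoff behavior listed above. It is not the Terra library: the class name ServerPool and its parameters are assumptions, priorities are left out, and the connect function and clock are injectable so the logic can be tried without a network.

```python
import itertools
import socket
import time


class ServerPool:
    """Round-robin over servers, skipping the ones currently marked dead."""

    def __init__(self, servers, max_retries=3, base_delay=1.0,
                 connect=socket.create_connection, clock=time.monotonic):
        self.servers = list(servers)
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.connect = connect
        self.clock = clock
        self._cycle = itertools.cycle(self.servers)
        self._failures = {s: 0 for s in self.servers}    # consecutive failures
        self._retry_at = {s: 0.0 for s in self.servers}  # earliest next attempt

    def is_dead(self, server):
        return (self._failures[server] >= self.max_retries
                and self.clock() < self._retry_at[server])

    def get_connection(self):
        """Return (server, connection) for the next usable server."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.is_dead(server):
                continue
            try:
                conn = self.connect(server)
            except OSError:
                self._failures[server] += 1
                over = self._failures[server] - self.max_retries
                if over >= 0:
                    # Dead: wait base_delay, then 2x, 4x, ... before retrying.
                    delay = self.base_delay * 2 ** over
                    self._retry_at[server] = self.clock() + delay
                continue
            self._failures[server] = 0
            return server, conn
        raise ConnectionError("no server available in the pool")
```

With the defaults, each server is an (host, port) tuple as accepted by socket.create_connection. A dead server costs nothing until its backoff expires, and a single successful connection resets its failure count.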

Page 30: Python For Large Company?

STEP 5 : PROVE IT WORKS

Page 31: Python For Large Company?

STEP 5 : PROVE IT WORKS

Prove = collect data … How?

Write integrated systems using bindings and libs of previous steps.

Show it works

Performance Productivity

Page 32: Python For Large Company?

STEP 5 : PROVE IT WORKS

Performance, one obvious thought : C/C++

PINCS
Performance Is Not C, Stupid!

Page 33: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE

Some of the rewrites work faster than their C/C++ cousins

Why?
  OS / Systems limits
  Libs (legacy)
  Algorithms
  Software Architecture
  Infrastructure
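
To make the "Algorithms" item concrete, here is an illustrative micro-benchmark (not taken from the Terra systems): the same count computed with a list scan and with a set lookup. The function names and sizes are arbitrary.

```python
import timeit


def count_hits_list(needles, haystack):
    # Membership test on a list scans it: O(len(haystack)) per lookup.
    return sum(1 for n in needles if n in haystack)


def count_hits_set(needles, haystack):
    # Building a set once makes each lookup O(1) on average.
    lookup = set(haystack)
    return sum(1 for n in needles if n in lookup)


if __name__ == "__main__":
    haystack = list(range(2000))
    needles = list(range(0, 4000, 2))
    for fn in (count_hits_list, count_hits_set):
        seconds = timeit.timeit(lambda: fn(needles, haystack), number=5)
        print(fn.__name__, round(seconds, 4))
```

Both functions return the same result; only the data structure differs. A gap of this kind does not depend on the implementation language, which is the point of PINCS.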

Page 34: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

BTW, is pure performance so important?

Time to Market is much more important
Adopt Lean Thinking and eliminate every possible waste

Writing too much code is a big waste in several ways :
1. You lose time when writing it
2. It increases the number of bugs
3. More time to maintain
4. More time to learn the code base (think of new employees)

It impacts Overall Productivity

Page 35: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Language  Level relative to C
C         1
C++       2.5
Java      2.5
Python    6
Ruby      ? Should be ~6

http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt2_advances2003.pdf

Page 36: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Language  LOC / FP
C         91
C++       53
Java      53
Python    21
Ruby      ? Should be ~21

http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt2_advances2003.pdf

Page 37: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Language  Projects  LOC          LOC / Project
C         14438     964,256,944  66786
C++       11347     444,867,946  39205
PHP       8559      241,990,111  28273
Java      19803     503,875,784  25444
C#        5439      88,321,294   16238
Ruby      4141      39,449,251   9526
Python    10322     77,720,446   7529

http://www.ohloh.net
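
The last column can be re-derived from the other two. The sketch below assumes the dots in the LOC figures are thousands separators and reads the garbled C++ total as 444,867,946; the dictionary name is arbitrary.

```python
# language: (projects, total LOC, LOC / project as printed on the slide)
ohloh = {
    "C": (14438, 964256944, 66786),
    "C++": (11347, 444867946, 39205),
    "PHP": (8559, 241990111, 28273),
    "Java": (19803, 503875784, 25444),
    "C#": (5439, 88321294, 16238),
    "Ruby": (4141, 39449251, 9526),
    "Python": (10322, 77720446, 7529),
}

for language, (projects, loc, printed) in ohloh.items():
    # Integer division reproduces the figure printed on the slide.
    print(language, loc // projects, printed)

# Average project size in C relative to Python.
c_to_python = ohloh["C"][2] / ohloh["Python"][2]
print(round(c_to_python, 1))
```

On these figures the average C project is almost nine times larger than the average Python project. Project scope differs across languages, so this is an indication rather than a like-for-like measurement.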

Page 38: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Language      Projects  LOC / Project
Perl          7679      7318
Scala         140       3072
Visual Basic  1201      4536
Erlang        244       7475
Haskell       545       4955

http://www.ohloh.net

Page 39: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Language  Projects  Contributors
C         14438     47310
Java      19803     40897
C++       11347     34768
Python    10322     21477
PHP       8559      19565
C#        5439      9605
Ruby      4141      7267

http://www.ohloh.net

Page 40: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Page 41: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Some existing C/C++ systems rewritten in Python

Original C/C++ versions : ~20,000 LOC in total

In Python : 4-6x less code!

The previous numbers do not seem to lie

Page 42: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

Oh, parsing an email?

Any idea in C/C++?

Page 43: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

parsing an email

    from email import message_from_file

    # filename : path to a raw message file
    with open(filename, "r") as fh:
        mail = message_from_file(fh)

Page 44: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

parsing an email
content types of parts?

Any idea in C/C++?

    from email import message_from_file

    def get_mail(filename):
        # parse the file into a Message object; the file is closed on exit
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)

Page 45: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

parsing an email
content types of parts

    from email import message_from_file

    def get_mail(filename):
        # parse the file into a Message object; the file is closed on exit
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)

    # walk() visits the message itself and then every nested part
    for part in mail.walk():
        print(part.get_content_type())

Page 46: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

parsing an email
content types of parts
getting headers?

Any idea in C/C++?

    from email import message_from_file

    def get_mail(filename):
        # parse the file into a Message object; the file is closed on exit
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)

    # walk() visits the message itself and then every nested part
    for part in mail.walk():
        print(part.get_content_type())

Page 47: Python For Large Company?

STEP 5 : PROVE IT WORKS
PRODUCTIVITY

parsing an email
content types of parts
getting headers

PYTHON LIBS ARE JUST THAT SIMPLE!
… and there are a lot!

    from email import message_from_file

    def get_mail(filename):
        # parse the file into a Message object; the file is closed on exit
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)

    # walk() visits the message itself and then every nested part
    for part in mail.walk():
        print(part.get_content_type())

    # headers are available through dictionary-style access
    print(mail["From"])
    print(mail["Subject"])
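
The slide code needs a message file on disk. As a self-contained variant of the same idea, the sketch below builds a small multipart message in memory and parses it back with the standard email package. The sample addresses, the subject and the helper names are made up for illustration.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample():
    """Build a small multipart message so that no file is needed."""
    msg = MIMEMultipart("alternative")
    msg["From"] = "alice@example.org"
    msg["Subject"] = "Hello"
    msg.attach(MIMEText("plain body", "plain"))
    msg.attach(MIMEText("<p>html body</p>", "html"))
    return msg.as_string()


def content_types(raw):
    """Parse raw message text and list the content type of every part."""
    mail = message_from_string(raw)
    return [part.get_content_type() for part in mail.walk()]


raw = build_sample()
print(content_types(raw))
print(message_from_string(raw)["Subject"])
```

Note that walk() yields the container first, so the multipart type appears before the two text parts.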

Page 48: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)

For an equivalent architecture (libs, algorithm, infrastructure)

C performs better than Python!

Python Is Not C, Stupid!

Page 49: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)

Bottleneck discovered!
PINCS! : think about architecture first!

Page 50: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)

Bottleneck discovered!
PINCS! : think about architecture first!

Ctypes / Swig : python bindings

Write your bottleneck in C / C++, use it in your python app
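
Before moving anything to C, it helps to confirm where the time actually goes. The slides do not say which tool was used at Terra; this sketch uses the standard library's cProfile and pstats on a made-up workload (slow_part, fast_part and workload are illustrative names).

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately quadratic: the hot spot the profiler should reveal.
    return sum(i * j for i in range(n) for j in range(n))


def fast_part(n):
    return sum(range(n))


def workload():
    return slow_part(200) + fast_part(200)


def profile(func):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile(workload)
print(report)
```

Only the function that dominates the report is a candidate for a C rewrite or a ctypes binding; the rest of the application can stay in Python.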

Page 51: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)

Bottleneck discovered!
PINCS! : think about architecture first!
Ctypes : absurdly easy python bindings

Cython : write python, obtain a gcc compiled lib

Page 52: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)

Bottleneck discovered!
PINCS! : think about architecture first!
Ctypes : absurdly easy python bindings
Cython : write python, obtain a gcc compiled lib

Psyco : JIT for python
  Just an additional module import in your code
  2–100x faster than normal Python
  Requires a bit more memory

Page 53: Python For Large Company?

STEP 5 : PROVE IT WORKS
PERFORMANCE (AGAIN?)

Bottleneck discovered!
PINCS! : think about architecture first!
Ctypes : absurdly easy python bindings
Cython : write python, obtain a gcc compiled lib
Psyco : JIT for python

Unladen Swallow : Google project
  Aims to produce a version of Python at least 5x faster
  Every patch goes to Python (no fork!)

Page 54: Python For Large Company?

STEP 6 : EVANGELIZE

Page 55: Python For Large Company?

STEP 6 : EVANGELIZE

Once you have stopped to look at what has been accomplished …

Show it, Evangelize!

Page 56: Python For Large Company?

STEP 6 : EVANGELIZE

Because introducing a “new technology” is not just about teaching something to users.

You’ve got to play the role of evangelist!

Innovators (3.5%)
  New stuff? They’re in!

Page 57: Python For Large Company?

STEP 6 : EVANGELIZE

Because introducing a “new technology” is not just about teaching something to users.

You’ve got to play the role of evangelist!

Innovators (3.5%)
Early adopters (12.5%)
  Open to new ideas, but check first

Page 58: Python For Large Company?

STEP 6 : EVANGELIZE

Because introducing a “new technology” is not just about teaching something to users.

You’ve got to play the role of evangelist!

Innovators (3.5%)
Early adopters (12.5%)
Early majority (35%)
  First, they must see the idea working

Page 59: Python For Large Company?

STEP 6 : EVANGELIZE

Because introducing a “new technology” is not just about teaching something to users.

You’ve got to play the role of evangelist!

Innovators (3.5%)
Early adopters (12.5%)
Early majority (35%)
Late majority (35%)
  Accept after a lot of pressure, or when it is imposed

Page 60: Python For Large Company?

STEP 6 : EVANGELIZE

Because introducing a “new technology” is not just about teaching something to users.

You’ve got to play the role of evangelist!

Innovators (3.5%)
Early adopters (12.5%)
Early majority (35%)
Late majority (35%)
Laggards (14%)
  Never accept (why would I want to change?)

Page 61: Python For Large Company?

STEP 6 : EVANGELIZE

At work, I constantly spoke (a lot) to others

Presentation on Python, open to everyone
  Presented to a large audience what had been done
  Open discussion

Poster summarizing what has been done

Wiki page documenting Python material
Specific mailing list related to Python

Page 62: Python For Large Company?

STEP 6 : EVANGELIZE

A lot of work and a slow process, but I won some allies

Some technical people are convinced that Python is useful

Some managers are convinced that Python could be a good thing for Terra
  Starting evaluation in some specific cases

Page 63: Python For Large Company?

STEP 7 : NEXT STEPS

Page 64: Python For Large Company?

STEP 7 : NEXT STEPS

Proven that Python could be useful in some cases.

Don’t forget my Grail! The journey is not over …

I’m lobbying to start using Python for web development.

And again, I made a prototype

Page 65: Python For Large Company?

STEP 7 : NEXT STEPS

Django = THE Python MVC web framework :

Model : describe the data, write no data-access code (Django's own ORM, similar in spirit to SQLAlchemy)
  Automatic creation of tables (if needed)
  Data accessed through objects
  No SQL needed!

View :
  Accesses the models to get the data
  Renders the output through templates
  Loose coupling interface <-> code!

Controller : REST through url parsing

Page 66: Python For Large Company?

STEP 7 : NEXT STEPS

Login :
  The auth module already exists
  Easy to tell django that authentication is required

    @login_required
    def list_abook(request, username):
        ...  # view body not shown on the slide

login_required is a python decorator
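
For readers new to decorators, here is a simplified, framework-free sketch of the mechanism. It is not Django's implementation: the name login_required_sketch, the PermissionDenied exception and the dictionary-shaped request are all assumptions made for illustration.

```python
import functools


class PermissionDenied(Exception):
    """Raised when an anonymous request reaches a protected view."""


def login_required_sketch(view):
    """Wrap a view so that it only runs for authenticated requests."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        # `request` is assumed to be a dict with an optional "user" key.
        if not request.get("user"):
            raise PermissionDenied("authentication required")
        return view(request, *args, **kwargs)
    return wrapper


@login_required_sketch
def list_abook(request, username):
    return "address book of %s" % username
```

The decorator replaces list_abook with the wrapper at definition time, so the authentication check runs before every call without touching the view's own code.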

Page 67: Python For Large Company?

STEP 7 : NEXT STEPS

Caching (memcached, database, file, …), 4 levels :

Per site : one config line
Per view : one python decorator

    @cache_page(60 * 15)
    def list_abook(request, username):
        ...  # view body not shown on the slide

In templates : maybe better to leave this one out!
Low-level cache access :

    cache.get(id)
    cache.set(id, value, timeout)
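
A framework-free sketch of how a per-function cache decorator can be built on top of get / set calls like the ones above. It is not Django's cache framework: SimpleCache, cache_result and the injectable clock are hypothetical names, and the in-memory dictionary stands in for a real backend such as memcached.

```python
import functools
import time

_MISSING = object()  # sentinel, so that a cached None is still a hit


class SimpleCache:
    """In-memory stand-in for a cache backend."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, timeout):
        # Store the value together with its expiry time.
        self._data[key] = (value, self._clock() + timeout)

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # expired entry
            return default
        return value


def cache_result(cache, timeout):
    """Cache a function's result per argument tuple for `timeout` seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = (func.__name__, args)
            hit = cache.get(key, _MISSING)
            if hit is _MISSING:
                hit = func(*args)
                cache.set(key, hit, timeout)
            return hit
        return wrapper
    return decorator
```

The decorator level is a thin layer over the low-level calls: it derives a key from the call, tries get, and falls back to computing and storing the value.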

Page 68: Python For Large Company?

STEP 7 : NEXT STEPS

Address book Web Service
  Retrieve the address book of one user
  Add an account
  Add an entry to the address book of a user
  View all the address book entries
  Output in HTML, JSON and CSV

< 100 LOC
2 hours (w/o knowing the framework)
Not one line of SQL, just useful code

Page 69: Python For Large Company?

CONCLUSIONS

One and a half years … and the evangelization is not done yet!

Email Team :
  Several systems have been written in Python and work really well … even under Terra's high load!
  Web project should start right now

People are starting to use / learn it inside the company
  Some teams are starting to evaluate Python
  Some Terra employees are here at this conference!

Page 70: Python For Large Company?

THANKS!
Any Questions?