Armed and Dangerous: Sex, software, politics, and firearms.
Life's simple pleasures…
63
Python speed optimization in the real world
Posted on 2013-03-24 by esr
I shipped reposurgeon 2.29 a few minutes ago. The main improvement
in this version is
speed – it now reads in and analyzes Subversion repositories at a
clip of more than
11,000 commits per minute. This is, in case you are in any doubt,
ridiculously fast – faster than the native
Subversion tools do it, and for certain far faster than any of the
rival conversion utilities can manage. It’s well
over an order of magnitude faster than when I began seriously
tuning for speed three weeks ago. I’ve learned
some interesting lessons along the way.
The impetus for this tune-up was the Battle for Wesnoth repository.
The project’s senior devs finally decided
to move from Subversion to git recently. I wasn’t actively involved
in the decision myself, since I’ve been semi-
retired from Wesnoth for a while, but I supported it and was
naturally the person they turned to to do the
conversion. Doing surgical runs on that repository rubbed my nose
in the fact that code with good enough
performance on a repository 500 or 5000 commits long won’t
necessarily cut it on a repository with over
56000 commits. Two-hour waits for the topological-analysis phase of
each load to finish were kicking my ass
– I decided that some serious optimization effort seemed like
a far better idea than twiddling my thumbs.
First I’ll talk about some things that didn’t work.
pypy, which is alleged to use fancy JIT compilation
techniques to speed up a lot of Python programs, failed
miserably on this one. My pypy runs were 20%-30%
slower than plain Python. The pypy site warns that
pypy’s optimization methods can be defeated by tricky, complex
code, and perhaps that accounts for it;
reposurgeon is nothing if not algorithmically dense.
cython didn’t emulate pypy’s comic pratfall, but didn’t
deliver any speed gains distinguishable from noise
either. I wasn’t very surprised by this; what it can compile is
mainly control structure, which I didn’t expect to be a substantial component of the runtime compared to (for
example) string-bashing during stream-file
parsing.
My grandest (and perhaps nuttiest) plan was to translate the
program into a Lisp dialect with a decent
compiler. Why Lisp? Well…I needed (a) a language with
unlimited-extent types that (b) could be compiled to
machine-code for speed, and (c) minimized the semantic distance
from Python to ease translation (that last
point is why you Haskell and ML fans should refrain from even
drawing breath to ask your obvious question;
instead, go read this). After some research I found Steel
Bank Common Lisp (SBCL) and began reading up
on what I’d need to do to translate Python to it.
The learning process was interesting. Lisp was my second language;
I loved it and was already expert in it by
1980 well before I learned C. But since 1982 the only Lisp programs
I’ve written have been Emacs modes. I’ve
done a whole hell of a lot of those, including some of the most
widely used ones like GDB and VC, but
semantically Emacs Lisp is a sort of living fossil coelacanth from
the 1970s, dynamic scoping and all.
Common Lisp, and more generally the evolution of Lisp
implementations with decent alien type bindings,
passed me by. And by the time Lisp got good enough for standalone
production use in modern environments I
already had Python in hand.
So, for me, reading the SBCL and Common Lisp documentation was a
strange mixture of learning a new
language and returning to very old roots. Yay for lexical scoping!
I recoded about 6% of reposurgeon in SBCL,
then hit a couple of walls. One of the lesser walls was a missing
feature in Common Lisp corresponding to
the __str__ special method in Python. Lisp types don’t know how to
print themselves, and as it turns out
reposurgeon relies on this capability in various and subtle ways.
Another problem was that I couldn’t easily
see how to duplicate Python’s subprocess-control interface – at
all, let alone portably across common Lisp
implementations.
But the big problem was CLOS, the Common Lisp Object System. I like
most of the rest of Common Lisp
now that I’ve studied it. OK, it’s a bit baroque and heavyweight
and I can see where it’s had a couple of
kitchen sinks pitched in – if I were choosing a language on purely
esthetic grounds I’d prefer Scheme. But I
could get comfortable with it, except for CLOS.
But me no buts about multimethods and the power of generics – I get
that, OK? I see why it was done the
way it was done, but the brute fact remains that CLOS is an ugly
pile of ugly. More to the point in this
particular context, CLOS objects are quite unlike Python objects
(which are in many ways more like CL
defstructs). It was the impedance mismatch between Python and CLOS
objects that really sank my
translation attempt, which I had originally hoped could be done
without seriously messing with the
architecture of the Python code. Alas, that was not to be. Which
refocused me on algorithmic methods of
improving the Python code.
Now I’ll talk about what did work.
What worked, ultimately, was finding operations that have
instruction costs O(n**2) in the number of commits
and squashing them. At this point a shout-out goes to Julien
“FrnchFrgg” Rivaud, a very capable hacker trying
to use reposurgeon for some work on the Blender repository. He got
interested in the speed problem (the
Blender repo is also quite large) and was substantially helpful
with both patches and advice. Working
together, we memoized some expensive operations and eliminated
others, often by incrementally computing
reverse-lookup pointers when linking objects together in order to
avoid having to traverse the entire repository
later on.
Even just finding all the O(n**2) operations isn’t necessarily easy
in a language as terse and high-level as
Python; they can hide in very innocuous-looking code and method
calls. The biggest bad boy in this case
turned out to be child-node computation. Fast import streams
express “is a child of” directly; for obvious
reasons, a repository analysis often has to look at all the
children of a given parent. This operation blows up
quite badly on very large repositories even if you memoize it; the
only way to make it fast is to precompute all the child lists and keep them synchronized with the parent lists whenever the repository is mutated.
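To make the shape of the fix concrete, here is a minimal sketch in Python; the class and attribute names are invented for illustration and this is not reposurgeon's actual code:

    class Commit:
        def __init__(self, mark):
            self.mark = mark
            self.parents = []    # links the fast-import stream gives you directly
            self.children = []   # reverse links, maintained incrementally

        def add_parent(self, parent):
            self.parents.append(parent)
            parent.children.append(self)    # keep the back-pointer in sync

        def remove_parent(self, parent):
            self.parents.remove(parent)
            parent.children.remove(self)    # every mutation must update both sides

    def children_of_slow(commits, commit):
        # The brute-force alternative: O(n) per query, O(n**2) if you ask
        # for every commit's children.
        return [c for c in commits if commit in c.parents]

With the incremental bookkeeping, asking for a commit's children is a constant-time attribute access instead of a scan over the whole event list.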
Another time sink (the last one to get solved) was
identifying all tags and resets attached to a
particular
commit. The brute-force method (look through all tags for any with
a from member matching the commit’s
mark) is expensive mainly because to look through all tags you have
to look through all the events in the
stream – and that’s expensive when there are 56K of them. Again,
the solution was to give each commit a list
of back-pointers to the tags that reference it and make sure all
the mutation operations update it properly.
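The same trick, again as an invented-names sketch rather than the real reposurgeon classes, applied to tags and resets:

    class Commit:
        def __init__(self, mark):
            self.mark = mark
            self.attached = []    # tags/resets that point at this commit

    class Tag:
        def __init__(self, name, target):
            self.name = name
            self.target = target            # the commit the tag's "from" names
            target.attached.append(self)    # back-pointer created at link time

        def retarget(self, new_target):
            # Mutation operations have to keep both sides consistent.
            self.target.attached.remove(self)
            self.target = new_target
            new_target.attached.append(self)

    def tags_for_slow(events, commit):
        # Brute force, for contrast: scan every event in the stream per query.
        return [e for e in events if isinstance(e, Tag) and e.target is commit]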
It all came good in the end. In the last benchmarking run before I
shipped 2.29 it processed 56424 commits in
303 seconds. That’s 186 commits per second, 11160 per minute.
That’s good enough that I plan to lay off
serious speed-tuning efforts; the gain probably wouldn’t be worth
the increased code complexity.
UPDATE: A week later, after more speed-tuning mainly by Julien
(because it was still slow on the very large
repo he’s working with) analysis speed is up to 282 commits/sec
(16920 per minute) and a curious thing has
occurred. pypy now produces an actual speedup, up to
around 338 commits/sec (20280 per minute).
We don’t know why, but apparently the algorithmic optimizations
somehow gave pypy’s JIT better traction.
This is particularly odd because the density of the code actually
increased.
This entry was posted in Software and tagged reposurgeon by esr. Bookmark the permalink [http://esr.ibiblio.org/?p=4861].

120 THOUGHTS ON “PYTHON SPEED OPTIMIZATION IN THE REAL WORLD”

Pingback: Python speed optimization in the real world | dropsafe
on 2013-03-24 at 19:55:07 said:
Has been a pleasure watching hackers at work on irc, and the early
warning for the blog post :-)
– Foo Quuxman
on 2013-03-24 at 20:23:45 said:
Hmmm. I learned a new word today: Memoization. I’ve had few formal
programming classes, and
none recently. I keep up with programming trends by lurking on
various lists–but that often shows me
techniques without naming them. Anyways, I regularly “memoize”
functions but never knew there was
a formal name for it.
on 2013-03-24 at 20:51:33 said:
Is it 11k commits per second or per minute? First paragraph says
second, last paragraph says
minute.
esr
on 2013-03-24 at 21:01:30 said:
>Is it 11k commits per second or per minute? First paragraph
says second, last paragraph says
minute.
Typo. I got it right the second time; I’ve fixed the incorrect
first instance.
esr
on 2013-03-24 at 21:07:03 said:
>Anyways, I regularly “memoize” functions but never knew there
was a formal name for it.
Oddly enough, my situation was opposite – I knew the word, but how
to memoize systematically was
something I’d never learned until this last three weeks. I don’t
write code that is both performance-
critical and compute-bound very often, so I haven’t before had
enough use for this technique to nail it
down.
on 2013-03-24 at 21:53:43 said:
Python is faster than a lot of people think it is.
You have to figure out how to let most of the looping happen inside
C builtins.
Usually, if it needs to go fast, someone has already made a
library.
Occasionally, I will write C or Pyrex/Cython to speed it up.
But the last time that happened was in 2003…
Joshua Kronengold
on 2013-03-25 at 00:55:29 said:
Nice writing, although I’m somewhat surprised that you haven’t
discovered what I (as someone who
frequently works with “big data” setups) have long since determined
— that when the going gets slow,
it’s time to pull out a profiler and see if some part of your
codebase is running -far- more often than
you’ve anticipated; a sure sign that something upstream of it is
suffering big O problems.
esr
on 2013-03-25 at 01:22:17 said:
>when the going gets slow, it’s time to pull out a profiler and
see if some part of your codebase is
running -far- more often than you’ve anticipated; a sure sign that
something upstream of it is suffering
big O problems.
I’m well aware of the principle. Unfortunately, my experience is
that Python profilers suck rather badly
– you generally end up having to write your own
instrumentation to gather timings, which is what I did
in this case. It helped me find the obscured O(n**2)
operations.
John Wiseman
on 2013-03-25 at 03:11:12 said:
“Once of the lesser walls was a missing feature in Common Lisp
corresponding to the __str__
special method in Python.”

You want print-object:
http://www.lispworks.com/documentation/HyperSpec/Body/f_pr_obj.htm
on 2013-03-25 at 05:06:07 said:
> you generally end up having to write your own instrumentation
to gather timings, which is what I did
in this case.
Do you deem it good enough to show the rest of the world?
Beat Bolli
on 2013-03-25 at 08:08:31 said:
Looks like a classical runtime/memory trade-off. Have you compared
the working set size before and
after the speedup?
on 2013-03-25 at 08:12:37 said:
>Looks like a classical runtime/memory trade-off. Have you
compared the working set size before
and after the speedup?
It is most certainly that. I didn’t bother measuring the working
set because the only metric of that that
mattered to me was “doesn’t trigger noticeable swapping”.
esr
on 2013-03-25 at 08:13:26 said:
>Do you deem it good enough to show the rest of the world?
Look at the implementation of the “timings” command.
>You want print-object:
http://www.lispworks.com/documentation/HyperSpec/Body/f_pr_obj.htm
“The function print-object is called by the Lisp printer; it should
not be called by the user.”
Anyway, this looks like an analogue of Python repr(), not
print – it’s supposed to print a
representation that’s invertible (can be fed back to read-eval). I
use str() for dumping the fast-import
stream representations of objects, which is not invertible by
Python itself.
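For readers who don't live in Python, a toy example of the distinction; the Blob class here is invented, not reposurgeon's:

    class Blob:
        def __init__(self, mark, data):
            self.mark, self.data = mark, data

        def __repr__(self):
            # repr(): intended to be invertible, something eval() could read back
            return "Blob(%r, %r)" % (self.mark, self.data)

        def __str__(self):
            # str(): a human- and tool-oriented dump (fast-import-like here),
            # not something Python itself can re-read
            return "blob\nmark %s\ndata %d\n%s" % (self.mark, len(self.data), self.data)

    b = Blob(":1", "hello\n")
    print(repr(b))    # Blob(':1', 'hello\n')
    print(str(b))     # a fast-import-style blob record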
JustSaying
on 2013-03-25 at 08:21:47 said:
Big O optimization trumps (or at worst equals in lucky cases) any
compiler-aware information,
because the degrees-of-freedom in the semantics not modeled by the
language (and the declared
types) is always a superset. Yet another reason why computers will
never program themselves
creatively and why I think the Singularity is nonsense.
I don’t know enough about the details of CLOS nor defstructs to
grasp the detailed reasons for the
claimed impedance mismatch between the CLOS and Python.
Programmers are rightfully proud when they achieve an
order-of-magnitude gain in performance. I
don’t see programmers run away from their babies and disappear into
thin air without ever bragging to
any one of their accomplishment. How lonely that would be
otherwise.
Nancy Lebovitz
on 2013-03-25 at 08:49:17 said:
Checking to make sure I understand: Memoization is looking up and
recording the data you’re likely
to keep needing instead of looking it up every time you need
it?
esr
>Checking to make sure I understand: Memoization is looking up
and recording the data you’re likely
to keep needing instead of looking it up every time you need
it?
Correct. It works when the results of an expensive function (a)
change slowly, and (b) are small and
cheap to store. Also there has to be a way to know when the cached
results have become invalid so
you can clear the cache.
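A minimal sketch of that pattern, with invented names (an illustration of the technique, not reposurgeon's code):

    class Repository:
        def __init__(self):
            self.commits = []
            self._children = None    # cache for the expensive computation

        def children_of(self, commit):
            if self._children is None:            # compute once, on first use
                table = {}
                for c in self.commits:
                    for p in c.parents:           # assumes commits carry a parent list
                        table.setdefault(p, []).append(c)
                self._children = table
            return self._children.get(commit, [])

        def add_commit(self, commit):
            self.commits.append(commit)
            self._children = None                 # any mutation invalidates the cache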
Since you’re not a programmer, I’ll add that big-O notation is a
way of talking about how your
computation costs scale up with the size of your input data. O(1)
is constant time, O(n) is linear in
the size of the input set, O(n**2) is as the square of the size,
O(2**n) as the number of subsets of the
data set. Also you’ll see O(log n), typically associated with the
cost of finding a specified item in a
tree or hash table. And O(n log n) which is the expected cost
function of various good sorting
algorithms. In general, O(1) < O(log n) < O(n) < O(n log
n) < O(n**2) < O(2**n). Normally anything
O(n log n) or below is tolerable, O(n**2) is pretty bad, and
O(2**n) is unusably slow.
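One quick way to get a feel for those growth rates is to print them for a few input sizes; a three-line Python illustration:

    import math

    for n in (10, 1000, 100000):
        print(n, round(math.log2(n), 1), round(n * math.log2(n)), n ** 2)
    # As n grows by a factor of 10,000, log2 n grows only about 5x,
    # n log n about 50,000x, and n**2 a hundred-million-fold.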
Rick C
on 2013-03-25 at 09:15:11 said:
Nancy, it would be more accurate to say you record the results of
complex calculations and then
reuse the stored result later, rather than recalculate it every
time.
Shenpen
on 2013-03-25 at 10:08:19 said:
> Once of the lesser walls was a missing feature in Common Lisp
corresponding to the __str__
special method in Python. Lisp types don’t know how to print
themselves, and as it turns out
reposurgeon relies on this capability in various and subtle
ways.
Does it also rely on everybody using this and doing it in a
sensible, readable way in their classes.
Also, have you checked Jython?
iajrz
Rick: all calculations are functions, aren’t they? But if you had
to do a look-up which requires
expensive/extensive/recurrent parsing, can that be called a
calculation?
It is still good for memoization…
The Monster
on 2013-03-25 at 10:29:15 said:
I’m a big believer that a data structure with one-way pointers is
vastly inferior to one that includes
back-pointers. With back-pointers, you can always traverse the
structure in any direction. Without
them, you have to do searches, which are always expensive, and
progressively more expensive as
the structure grows.
I, too, was unfamiliar with the verb “memoize”, but have made use
of the idea behind it many times.
At my last job, I wrote some utility programs that had to
know where to find some files that weren’t
stored in well-known locations (but were very unlikely to move once
they’d been put in a given place,
because that was a PITA). Since a find is a very expensive
operation, I made the utility installer
dispatch an at now job to do the find once and cache
the result in a specific location that the other
utilities knew about.
on 2013-03-25 at 10:44:42 said:
>Does it also rely on everybody using this and doing it in a
sensible, readable way in their classes.
My code doesn’t assume that every class in the universe has a
sensible __str__, but it does assume
that almost every class defined in reposurgeon has its own __str__
that is useful for
progress/debugging messages, and (this is the key point) the system
str() will recurse down through
all such methods when told to print an arbitrary structure.
>Also, have you checked Jython?
No. Is there any reason I should expect it to be faster than
c-python? I thought it was mainly aimed
at allowing programmers to use the Java library classes, rather
than at performance per se.
Mike E
on 2013-03-25 at 11:17:53 said:
“Memoization”; good to know the name for that. I found myself doing
that extensively while trying to
work through the problems at Project Euler (projecteuler.net),
which is a marvelous resource with a
series of incrementally more difficult mathematical programming
puzzles for those interested in such
a thing.
on 2013-03-25 at 11:41:19 said:
> the system str() will recurse down through all such methods
when told to print an arbitrary
structure.
I’m not sure what this means. The closest thing I can think of is
the fact that system structure types
(such as list/tuple/dict) will call str (or, looks like it’s
actually repr at least half the time) on their
children.
That’s a kind of narrow definition of “arbitrary structure” for my
taste.
esr
on 2013-03-25 at 12:01:56 said:
>That’s a kind of narrow definition of “arbitrary structure” for
my taste.
Perhaps I was unclear. I created __str__ methods for all the
classes that are parts of Repository –
the effect is that when requesting structure dumps for debugging
instrumentation I can just say str()
on whatever implicit object pointer I have and the intuitively
useful thing will happen. I don’t know
how to duplicate this effect in CL. What it would probably require
is for the system print function to
magically call a str generic whenever it reaches a CLOS
object.
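A sketch of the effect being described, with made-up classes standing in for reposurgeon's internals:

    class FileOp:
        def __init__(self, op, path):
            self.op, self.path = op, path
        def __str__(self):
            return "%s %s" % (self.op, self.path)

    class Commit:
        def __init__(self, mark, comment, fileops):
            self.mark, self.comment, self.fileops = mark, comment, fileops
        def __str__(self):
            # Delegates to the fileops' own __str__, so one str() call on a
            # commit dumps the whole subtree in readable form.
            ops = "".join(str(op) + "\n" for op in self.fileops)
            return "commit %s\n%s\n%s" % (self.mark, self.comment, ops)

    print(Commit(":2", "fix typo", [FileOp("M", "README")]))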
Patrick Maupin
@The Monster:
I, too, was unfamiliar with the verb “memoize”, but have made use of the idea behind it many times. At my last job, I wrote some utility programs that had to know where to find some files that weren’t stored in well-known locations (but were very unlikely to move once they’d been put in a given place, because that was a PITA). Since a find is a very expensive operation, I made the utility installer dispatch an at now job to do the find once and cache the result in a specific location that the other utilities knew about.
Congratulations! You reinvented bash’s command hash. :-)
But seriously, this is a great idea, and like most great ideas,
will have multiple independent
inventions by multiple clever people.
Garrett
on 2013-03-25 at 12:08:56 said:
I would just follow up on esr’s excellent overview of big-O
notation above with one point which is often
missed by developers. The impact of the algorithm is usually seen
as data sets grow larger. For
small data sets, the complexity of the operation frequently is
overtaken by other concerns.
To provide a mundane example: a car goes much faster than you can
walk, but if you are a city-
dweller it’s probably faster to walk to your neighbour’s house than
to drive.
Adam
on 2013-03-25 at 12:23:26 said:
> O(1) < O(log n) < O(n) < O(n log n) < O(n**2) <
O(2**n)
While that's theoretically true, it's interesting to note that in
practice, O(1) = O(log n). For typical
problems, you should just mentally macro expand "log2 n" to 30. The
only way you're going to get it
different enough from 30 to make any difference is to have n be so
small that the operation in
question is effectively instant. For example, to shave a mere 1/3
from that "constant" requires n to
decrease by three orders of magnitude.
Maybe you want to work on an atypical problem. For the biggest
problem most people could possibly
attempt, (log2 n) < 60. For Google, it might be 70. For the
crackpot who wants to count every
on 2013-03-25 at 12:23:51 said:
Well, looks like the Boston Lisp folks called it. At their last
meeting a couple of weeks ago, some of
them predicted that you would:
a) discover that your speed problem is better solved by algorithmic
optimization than by switching to
a faster language or compiler;
b) write a post critiquing the shortcomings of Common Lisp.
They were pretty spot on except they thought you would critique
CL’s lack of libraries, not the
ugliness of CLOS. :)
>esr’s excellent overview of big-O notation
Entertainingly, one of the downsides of being an entirely
self-taught programmer is that I didn’t learn
big-O notation or the associated reflexes until relatively late in
my career. It wasn’t intuitive for me
until, oh, probably less than five years ago.
esr
on 2013-03-25 at 12:53:53 said:
>They were pretty spot on except they thought you would critique
CL’s lack of libraries, not the
ugliness of CLOS. :)
And I might have gotten to that if I’d gotten around
CLOS.
John Wiseman
>“The function print-object is called by the Lisp printer; it
should not be called by the user.”
Correct. You define a custom print-object method on your data
types, and it is called by the
implementation whenever you cause a value of that type to be
printed–by calling print, prin1, format,
or whatever. Just like you don’t explicitly call __str__.
> Anyway, this looks like an analogue of Python repr(), not
print – it’s supposed to print a
representation that’s invertible (can be fed back to
read-eval).
It is used for both, actually. If *print-readably* is T, then it
must either print a readable (“invertible”)
representation or throw an error–”repr mode”. Otherwise, it can
print whatever it wants–”str mode”.
John Wiseman
Lispers usually use the print-unreadable-object helper macro.
See
http://clhs.lisp.se/Body/m_pr_unr.htm for an example.
on 2013-03-25 at 14:46:32 said:
> Once of the lesser walls was a missing feature in Common Lisp corresponding to the __str__ special method in Python. Lisp types don’t know how to print themselves, and as it turns out reposurgeon relies on this capability in various and subtle ways. Another problem was that I couldn’t easily see how to duplicate Python’s subprocess-control interface
what about CL’s much hyped ability to have new features added very
easily (I’m thinking of Paul
Graham’s writings): adding a macro would not have solved your
problems? not worth your time? too
tricky?
Faré
on 2013-03-25 at 15:05:02 said:
case, there is no perfect answer, but EXECUTOR does a decent job on
the major implementations
(SBCL, CCL and a few more).
CLOS is ugly but (1) it’s more expressive and powerful than any
other object system I’ve heard of
(e.g. multiple inheritance, multiple-dispatch, method combinations,
accessors, meta-object protocol,
etc.), and (2) you can hide the ugly behind suitable macros, and
many people have.
Regarding __str__ and print-method, see John Wiseman’s answer;
though in this case you might
want to define your own serialize-object method and have a mixin
defining a print-object method that
wraps a call to that in a print-unreadable-object.
esr
on 2013-03-25 at 15:23:12 said:
>what about CL’s much hyped ability to have new features added
very easily (I’m thinking of Paul
Graham’s writings): adding a macro would not have solved your
problems? not worth your time? too
tricky?
Dunno. Would have looked into it more deeply, but CLOS blocked the
translation. Now that I know
SBCL exists, though, I’ll probably do a project in it from scratch
sometime and learn these things.
esr
on 2013-03-25 at 15:25:08 said:
>though in this case you might want to define your own
serialize-object method and have a mixin
defining a print-object method that wraps a call to that in a
print-unreadable-object.
Yes, I thought the answer would be something much like that.
Good to know that UIOP:RUN-PROGRAM exists – next time I try
something like this I’ll look it up.
dtsund
on 2013-03-25 at 15:52:00 said:
> Entertainingly, one of the downsides of being an entirely self-taught programmer is that I didn’t learn big-O notation or the associated reflexes until relatively late in my career. It wasn’t intuitive for me until, oh, probably less than five years ago.
Weren’t you also a mathematician, at least briefly? My first
exposure to the notation was in Real
Analysis, after which grasping it in a CS context was almost
trivial.
Jay Maynard
on 2013-03-25 at 15:52:23 said:
Let’s go up a metalevel. I was mildly surprised you considered
switching languages at all before
attacking the algorithms’ speed issues. This seems unlike you. How
did you get there?
Jeff Read
on 2013-03-25 at 16:02:30 said:
CLOS is ugly but (1) it’s more expressive and powerful than any other object system I’ve heard of (e.g. multiple inheritance, multiple-dispatch, method combinations, accessors, meta-object protocol, etc.), and (2) you can hide the ugly behind suitable macros, and many people have.
Historically there was T’s object system: as powerful as CLOS but
actually beautiful.
The closest I can find in a modern running Scheme is RScheme’s
object system, but RScheme has
sadly been lacking in maintenance or interest and is still quite
riddled with bugs.
esr
on 2013-03-25 at 16:06:53 said:
>Let’s go up a metalevel. I was mildly surprised you considered
switching languages at all before
attacking the algorithms’ speed issues. This seems unlike you. How
did you get there?
You’re right – it was unlike me (on the evidence anyone else
has available). I’ve actually been
wondering if anyone would notice this and bring it up.
out the stuff that could be attacked that way. In my defense, I
will note that the remaining O(n**2)
code was pretty well obscured; it took a couple of weeks of
concentrated attention by two able
hackers to find it, and that was after I’d built the machinery for
gathering timings.
esr
>Weren’t you also a mathematician, at least briefly?
I was, but my concentration was in abstract algebra, logic, and
finite mathematics. I didn’t actually
learn a lot of real analysis (I had a fondness for topology that
was unrelated to my main interests, but
I approached it through set and group theory rather than
differential geometry). It may also be that
big-O notation wasn’t as prominent then (in the 1970s) as it later
became, so I’d have been less likely
to encounter it even if I had been learning more on the continuous
side.
BRM aka Brian R. Marshall
on 2013-03-25 at 16:37:46 said:
Another note for non-programmers…
A “profiler” is a tool to determine how much time is spent
running different parts of a program. As
ESR noted, sometimes it is better to add some code to the program
to get the required results.
(Such code generally isn’t used/run when not trying to speed up the
program.)
Sometimes, at least as a first try, a programmer can tell from the
code where it is worth trying to
speed things up.
In any case, this kind of analysis is very useful. A junior/lousy
programmer may attempt to speed up
a program by reworking code that obviously can be made to run
faster. But if a program takes 10
minutes to run and this code accounts for only 10 seconds of that
time, it is a waste of time trying to
speed it up. Even if it can be made to run 10 times faster, the
program run time goes from 600
(590+10) seconds to 591 (590+1) seconds.
Sometimes this kind of improvement is worse than a waste of time.
The code may be written in a
way that makes it obvious what it is supposed to do and that it is,
in fact, doing it. Reworking code that makes unimportant improvements but also makes the code obscure and subtle is bad practice.
on 2013-03-25 at 17:11:19 said:
> Also you’ll see O(log n), typically associated with the cost
of finding a specified item in a tree or
hash table.
Small correction: hash table insertion and lookup are expected
O(1), not O(lg n).
(At worst, they are O(n), but this degenerate case hardly ever happens unless you piss off a cryptographer.)
JustSaying
@Adam:
It’s interesting to note that in practice, O(1) = O(log n). For typical problems, you should just mentally macro expand “log2 n” to 30. The only way you’re going to get it different enough from 30 to make any difference is to have n be so small that the operation in question is effectively instant. For example, to shave a mere 1/3 from that “constant” requires n to decrease by three orders of magnitude.
Why are you claiming that 300% efficiency increase is irrelevant
(equivalent to constant) in the
presence of iterations that range over 3 orders-of-magnitude?
log n is still log n, not constant.
Are you claiming that no such cases occur?
JustSaying
@esr:
You’re right – it was unlike me (on the evidence anyone else has available). I’ve actually been wondering if anyone would notice this and bring it up.
I had assumed that you wanted to test out whether there was a
fundamental advantage of your long-
lost love over your new one. I have observed that you favor
continuity of code bases over other
considerations, so I should have realized I was wrong. Perhaps I
was distracted.
Jay Maynard
on 2013-03-26 at 03:37:13 said:
JustSaying: I think the point is that going from 30 to 1 is almost
never enough improvement to be
worth doing, and it’s effectively linear (when you have a 2**30
scale factor on input producing scale
factor of 30 on output, there are much bigger fish to fry).
Jay Maynard
on 2013-03-26 at 03:47:33 said:
>I will note that the remaining O(n**2) code was pretty well
obscured; it took a couple of weeks of
concentrated attention by two able hackers to find it
Are there any O(n**2) traps within Python itself we can avoid
that you found, or was this all your
algorithms’ fault?
another user
Did you consider profiling reposurgeon for performance bottlenecks,
rewriting the relevant pieces of
code in C/C++ and using bindings? I personally like
boost-python.
Maybe if you keep 90% of code written in Python and rewrite 10% of
performance-critical code in C,
you can approach the speed of a C program.
JustSaying
> almost never
That is why I asked if there are no cases. I can’t think of a case
at the moment, but I am skeptical of
saying there are none. I think log n is still log n and I should
remember it as that, while also factoring
in that it might nearly always be too low of a priority. All of us
experienced programmers, I am sure
share BRM’s experience that obfuscating code for insignificant
efficiency gains is myopic.
Winter
@JustSaying
” I think log n is still log n and I should remember it as that,
while also factoring in that it might nearly
always be too low of a priority.”
O(log n) vs O(n) corresponds to [c1 * log(n) + d1] < [c2 * n + d2] for some n > N. In practice the constants may be so large that n > N is out of your reach.
So, your implementation might indeed scale as O(log n), but it
could still run much slower for
practical n.
Adam
on 2013-03-26 at 11:12:22 said:
Of course lg n isn’t *really* a constant, but it’s often useful to
think of it that way. It’s also useful at
times to assume a spherical cow.
They say that premature optimization is the root of all evil. If
you need to sort the items in a dropdown
box, you’re probably fine to use an n^2 sort. Those are fast and
easy to code up, which means fewer
bugs. It’s a dropdown box, so your user experience will be crap if
you have more than a few dozen
items anyway. At most, you’ll add a few milliseconds, which isn’t
noticeable. When “n” is small
enough, even the difference between O(n) and O(n^2) doesn’t matter.
An extra lg n is completely
irrelevant. That’s one of Knuth’s “small efficiencies”.
However, some optimizations aren’t premature. If you have lists of
a billion items, n^2 sorts are out of
the question. Let's say you're typically sorting a billion items.
Then lg n is 30. Assume that once in a
while, you need to sort 10 billion items. Then lg n is a hair over
33. That adds 11% to your runtime.
Instead of spending 10x longer processing 10x more items, you’ll
have to spend 11x longer. The
difference is negligible: an order of magnitude change of the input size in either direction affects your total runtime by only 11% more than it would with an O(n) algorithm. Given a
gigantic three orders of magnitude
change of input size, the lg n factor results in only 66%. That’s
not nothing, but it’s also not the real
problem. You’ll need more memory before you’ll need more CPU.
In short, when “lg n” really varies, “n” is small enough that the
entire operation doesn’t matter. When
“n” is large enough to matter, “lg n” varies so little that the
variation doesn’t matter. Not much
anyway.
You could improve your spherical cow model by making it an oblate
spheroid, adding another smaller
one as a head, and adding four cylindrical legs — but that won’t
change the air resistance enough to
stop the cow from making a big mess when it hits the ground.
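The arithmetic behind Adam's 11% figure is easy to check; a tiny Python calculation:

    import math

    lg_1e9 = math.log2(10 ** 9)     # ~29.9 for a billion items
    lg_1e10 = math.log2(10 ** 10)   # ~33.2 for ten billion
    print(lg_1e10 / lg_1e9)         # ~1.11: the lg-n factor adds about 11%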
esr
on 2013-03-26 at 11:14:30 said:
>Are there any O(n**2) traps within Python itself we can avoid
that you found, or was this all your
algorithms’ fault?
I don’t know that yet. It was probably all my code, but there could
be O(n**2) traps within Python as
well.
>boost-python
*twitch*
Merciful $DEITY. Boost is bad enough. Don’t inflict it on
Python.
esr
>Did you consider profiling reposurgeon for performance
bottlenecks, rewriting the relevant pieces of
code in C/C++ and using bindings?
Yes, for about a half-second. Then I realized how ridiculous the
idea was and abandoned it.
That strategy only works when the stuff you need to do fast fits in
C’s type ontology without incurring
so much code complexity that you end up with more problems than you
started with. There was no
chance that would be true of reposurgeon’s internals – none at
all.
Garrett
@JustSaying:
I work with filesystems for a living. When you have a large on-disk
data structure you need to search,
loading another block off of disk is a big cost. OTOH, searching
that block in memory is
comparatively cheap. For some of our data structures we use binary
or hash trees to locate the block
we need, but then pack the block as an array. This avoids extra
pointers and allows us to cram a few
more entries per block. In these cases, cutting the number of block
loads from 20 to 10 can be a big
savings if the operation must occur in real-time for a client (as
opposed to a background processing
operation). Spinning rust is slow …
@esr:
O(1) is constant time [...] O(log n), typically associated with the cost of finding a specified item in a tree or hash table.
O(1) is typically associated with the cost of accessing a specified
item in an array by index.
@Winter: O(log n) vs O(n) corresponds to [c1 * log(n) + d1] < [c2 * n + d2]
@Adam:
In short, when “lg n” really varies, “n” is small enough that the entire operation doesn’t matter.
That is only if c1 is small relative to d1 and the
“universe”.
The curve for log(n) flattens faster than even sqrt.
The sacrosanct rule to not do premature optimization appears to be
deprecated under open-
extension, because profiling isn’t available.
If your caller must call you a billion times (perhaps deep in some nested function hierarchy), and you are employing a log(n) tree or hash instead of an array, then the difference in application performance can be 300% at n = 1000, 400% at n = 10,000, 500% at n = 100,000, etc.
So log(n) is never the same as constant. The cow is never spherical
except when we “touch him only
one way” — Steve Jobs.
on 2013-03-26 at 12:39:34 said:
@esr: “…I didn’t bother measuring the working set because the only
metric of that that mattered to
me was “doesn’t trigger noticeable swapping”.”
Sure, for your current number of commits. Now, using caching, the
design trades off runtime for an
upper limit based on the memory of the box.
on 2013-03-26 at 13:41:41 said:
>Now, using caching, the design trades off runtime for an upper
limit based on the memory of the
box.
Indeed so. It’s easier to buy memory than more processor speed
these days.
Winter
@JustSaying
The constants for O(log n) tend to be larger than for O(n), else
you would have tried the log n
algorithm first. And indeed, log n matters if n is in the billions.
But at that point, you are tweaking all
algorithms.
on 2013-03-26 at 18:26:07 said:
> Looks like a classical runtime/memory trade-off. Have you
compared the working set size before
and after the speedup?
TL,DR: see below
In fact, I should say that before I worked on refactoring for speed, I began searching for ways to cut a lot of the memory used by reposurgeon. Most of the gain was obtained by using __slots__ on most instantiated structures, but I did some dict eviction and copy on
write optimization on a really
memory hungry part: the filemaps.
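For readers who haven't met __slots__: it trades the per-instance attribute dictionary for a fixed attribute layout, which adds up when you hold hundreds of thousands of small objects. A toy comparison with invented field names:

    import sys

    class PlainEvent:
        def __init__(self, mark, comment):
            self.mark, self.comment = mark, comment

    class SlottedEvent:
        __slots__ = ("mark", "comment")    # no per-instance __dict__
        def __init__(self, mark, comment):
            self.mark, self.comment = mark, comment

    p, s = PlainEvent(":1", "x"), SlottedEvent(":1", "x")
    print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))   # instance plus its dict
    print(sys.getsizeof(s))                               # slotted instance alone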
Reposurgeon already was optimized in that regard (Eric had already
implemented a rather good
COW scheme for PathMaps), but the fact that PathMap’s snapshotting
required a new dictionary each time — to be able to replace later an entry by its copy — was taking its toll… So I devised a tweak to take snapshots even less often, then a further
optimization which is a real memory
usage/code complexity tradeoff.
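The general shape of a copy-on-write snapshot scheme, reduced to a few lines; this is a generic illustration (deletions and other details omitted), not the actual PathMap implementation:

    class CowMap:
        """Layered mapping: snapshots share frozen layers, writes go to the top."""
        def __init__(self, base=None):
            self._own = {}      # entries written since the last snapshot
            self._base = base   # older, frozen layer (shared, never written again)

        def snapshot(self):
            frozen = CowMap(self._base)
            frozen._own = self._own                # freeze what we have so far
            self._base, self._own = frozen, {}     # future writes go to a fresh dict
            return frozen

        def __setitem__(self, key, value):
            self._own[key] = value

        def __getitem__(self, key):
            layer = self
            while layer is not None:
                if key in layer._own:
                    return layer._own[key]
                layer = layer._base
            raise KeyError(key)

Each snapshot() costs only a couple of pointer assignments; the price is that lookups walk a chain of layers, which is the kind of memory/speed bargain described here.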
Returning to simpler structures would probably gain some speed too,
but the fact is that on my
machine, reposurgeon still tops at 75% of my 4GB of RAM when
converting the blender repository —
and I suspect Battle for Wesnoth to be such a contender too. Sure, one can trade computational
cost and even code readability for memory, but the bargain is not
the same when you can trade
200MB temporary memory for a O(n**2) to expected O(n) reduction —
e.g. store previous hits in a
set/dict instead of searching them backwards in the “already seen”
list — than when you trade 2GB
of memory used through the whole import for only a constant factor
— one of the costs of the smart
COW PathMap over a list of dicts is that built-in types don’t have
interpreter overhead, and in fact run
at C-speed, but that’s only a constant factor rather than a whole
new complexity class.
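The "already seen" change mentioned above is the classic list-versus-set membership fix; a minimal before-and-after sketch:

    def unique_slow(events):
        seen, out = [], []
        for ev in events:
            if ev not in seen:    # linear scan: O(n) per test, O(n**2) overall
                seen.append(ev)
                out.append(ev)
        return out

    def unique_fast(events):
        seen, out = set(), []
        for ev in events:
            if ev not in seen:    # hash lookup: expected O(1) per test
                seen.add(ev)
                out.append(ev)
        return out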
As for the optimization itself, it is amusing to note that
Eric and I actually started optimizing for
speed each in his corner without coordinating… At first we were doing
orthogonal changes, then as the
set of molasses reduced we began stepping on each other’s
toes^W^W^W^W^W collaborating more
;-) Also note that while Eric says he drove his optimizations
by profiling, I was less smart and
just wandered in the code searching for unpythonic or
unpretty code to my eyes — to the risk of
premature or over- optimization. I was more seeking refactors for
clarity and code compactness — and
iterators galore because I love them too much for my own good —
than real speed optimizations; it
just happens that I seem to find ugly O(n**2) code.
The last thing that can explain why Eric didn’t find the places to
optimize at first sight is that
reposurgeon is big and its internal structures have been made to
mirror the fast-import format at first.
This legacy still shows a lot. While that decision was sane at the
time when reposurgeon was less
complex and able than now, and while there still are several
tangible benefits to this similarity — like
the ability of reposurgeon to round-trip fast-import streams to the
exact character, over the course of
time — and especially in the few last weeks where Eric and I
started to optimize — internal objects
like Commits track more and more their relationships with their
surroundings, to the point that now
they collectively maintain in memory the whole DAG, in both
directions.
At first, Commits only stored the marks to their parents. To
find parents a sweep over the complete
set of events was needed, because a mark is only a string
containing a colon and a number, and
marks aren’t even necessarily consecutive… Eric made that
computation remember its results,
then swapped altogether to storing the commits objects instead,
diverging from fast-import towards a
graph representation. For children, I first memoized the function
searching for all commits whose
parents contained self, then replaced that altogether by code that
stores the children list on commits
but keeps them synchronized at all times with parent lists. And for
tags/resets, Eric and I both tried
to make commits know which tags/resets pointed to them, always kept
in sync with the information
on tags telling where they point to.
TL,DR: Some of the inefficiencies were hidden, but most of them were due to the lack of information stored. Some loops that were only O(n) were actually
called O(n) times by another
function — which in a codebase that dense is not easy to spot — and
it was not possible to make
the inner loop more efficient short of doing large refactors… All
these problems combined tend to
make a poor human’s brain automatically sweep over and search some
other more palatable
optimization. The needed refactors were difficult to do, not
because the end result isn’t known but
everywhere.
Keeping commits very small and ensuring each state was correct was
an imperative goal for me.
Kudos to Eric and his approach to writing code, documentation, and
test suites at the same time, or
else none of these refactorings could have happened for fear of
breaking everything… And I broke a
lot of things… but noticed right away. Some parts of the code were
actually relying on some
invariants that came from the fact that parent and children lists
were generated at first! Finding those
was hard and a blocker for the refactorings.
I already said far too much for a small comment, sorry for that
:-(
Sigivald
on 2013-03-26 at 18:38:48 said:
BRM said: Sometimes this kind of improvement is worse than a waste of time. The code may be written in a way that makes it obvious what it is supposed to do and that it is, in fact, doing it.
Reworking code that makes unimportant improvements but also makes
the code obscure and subtle
is bad practice.
“Premature optimization” is a related problem.
First, see if it’s slow.
Then, see what part of it’s actually making it slow,
Then fix that part.
(And if, as in the quote above, the speed improvement is minor
compared to the added complexity,
don’t fix it.)
on 2013-03-26 at 18:53:08 said:
>(Eric had already implemented a rather good COW scheme for
PathMaps)
memory footprint, and in so doing enabled me to solve a fiendishly
subtle bug in branch processing
that had stalled the completion of the Subversion reader for six
months. To invoke it, the repository
had to contain a Subversion branch creation, followed by a
deletion, followed by a move of another
branch to the deleted name.
I still don’t know what exactly was wrong with my original
implementation, but a small generalization
of Hudson’s code (from CoW filepath sets to CoW filepath maps)
enabled me to use it to remove a
particular O(n**2) ancestry computation in which I suspected the
bug was lurking. Happily that
suspicion proved correct.
on 2013-03-26 at 20:48:55 said:
By the way, Eric, what profiler did you try to use, and what you
are missing in it? What features
would you like to see in profiler?
esr
on 2013-03-26 at 21:47:41 said:
>By the way, Eric, what profiler did you try to use, and what
you are missing in it? What features
would you like to see in profiler?
The stock Python profiler. Unfortunately, it’s pretty bad about
assigning time to method calls! I’ve
always thought this was odd given that the standard style is so
OO.
Faré
on 2013-03-26 at 23:43:09 said:
BTW, SBCL has SB-SPROF for profiling, which is quite informative,
though it is not obvious at first
how to read the results.
Faré
on 2013-03-26 at 23:55:19 said:
(Also, if you do complex shell pipes or string substitutions,
INFERIOR-SHELL:RUN is a richer front-
end on top of UIOP:RUN-PROGRAM. An implementation of it on top of
EXECUTOR:RUN-PROGRAM
or IOLIB:SPAWN would be nice, but hasn’t been done yet.)
Shenpen
on 2013-03-27 at 05:22:59 said:
>Entertainingly, one of the downsides of being an entirely
self-taught programmer
Actually I think the standard schoolish way of learning
theory, then hands-on experience, then more
work experience, is not useful at all in the two fields I was
taught, programming/database design and
business administration. We memorize and barf back theoretical
definitions which we don’t care
about because we have no idea what they are good for, and are often
too formal to seem really
useful, take an exam, forget them, and later on it is hard to apply
it to practical problems, or even
realize that the problems we face have anything to do with
them.
It would be better to do hands-on practice first, try to figure out
solutions, usually fail, then being told
to do it X way without an explanation, and then learn the theory
why we were told so.
Example: I remember memorizing, not really understanding, taking an
exam of, and then promptly
forgetting database normalization: 3NF, BCNF, 4NF. Then years later
actually designing databases,
figuring out a common sense way of doing it, then realizing this
actually sounds something like
BCNF. Then I went back to the textbook, looked up 4NF and actually
my design got better. And then
– realizing it is all too slow and we have to denormalize for
speed :-)
Same with business administration, only after many years of work I
got the philosophy of accounting
and going back to the textbook they started to make sense.
What would be a world like in which every construction engineer
would first work as a mason and
carpenter?
on 2013-03-27 at 05:59:09 said:
It would be nice if you told us more about your dissatisfaction
with CLOS. I can think of lack of dot
and for situations when it does there are with-slots and
with-accessors. Maybe decorators for
classes/generic functions/methods would not harm, too (macrology
and MOP helps, but in some
situations I would indeed prefer decorators as they’re easier to
combine and using MOP may switch
off compiler optimizations for CLOS). Other than that, is the
problem reduced to the fact that CLOS is
unlike Python’s object model? If so, I’m not sure whether it’s a
problem of CLOS or one of Python. I
for one often miss CLOS when I write Python or JavaScript code.
Besides multimethods/MOP/etc.
there are other good sides to it, for instance using (func object)
instead of object.method notation
makes completion and call tips work much better; also, CLOS is very
well suited for “live editing”,
when you make modifications without restarting your program –
that’s usually hard to achieve in JS
and very hard in Python.
esr
on 2013-03-27 at 08:39:11 said:
>It would be nice if you told us more about your dissatisfaction
with CLOS.
Dot notation would have been nice, but that’s just syntax and
un-Lispy (though you should look into
e7 if any documentation for it is still on the web). I think the
“feature” that stuck most in my craw was
having to declare stub generics on penalty of a style warning from
the compiler. Bletch! I dislike the
requirement that all methods be globally exposed, too.
For this particular translation, I wanted a class system that simulated Python behavior more closely.
I’m sure this could be done with sufficiently complex macro
wrappers but that seemed like a
forbidding amount of work and possibly dangerous to
maintainability.
The Monster
on 2013-03-27 at 09:04:36 said:
> It’s easier to buy memory than more processor speed these
days.
The original driver for 64-bit architectures was people who wanted
to cache their entire database in
RAM, and the 32-bit machines couldn’t address enough memory to do
that.
Jeff Read
What would be a world like in which every construction engineer would first work as a mason and carpenter?
My dad served as a mentor to a couple of UConn mech eng students a
few years back for their
senior project. His big complaint was that while they were smart
and knew their physics, they didn’t
know how to machine at all. He thought it terribly important that
an engineer gain experience as a
machinist, since a technical drawing is basically a set of
instructions to the machinist who will
actually make the part.
on 2013-03-27 at 20:26:31 said:
> Actually I think the standard schoolish way of learning
theory, then hands-on
>experience, then more work experience, is not useful at all in
the two fields I
>was taught, programming/database design and business
administration.
This was my constant complaint about my CS classes. They taught
plenty of theory of the various
modern programming techniques, but there was so little practical
application, and what little there
was was so contrived (a square is a rectangle is a shape for class
inheritance for example) that while
I could give you the reasons why you would want to do these things
on an intellectual level, I had no
gut understanding of why you would go through the extra work.
BRM aka Brian R. Marshall
on 2013-03-27 at 22:54:01 said:
Tangential to the matter at hand, but…
Probably anyone who is into database design has heard this one,
but…
1NF, 2NF and 3NF can be described as:
“The key, the whole key and nothing but the key”
Jakub Narebski
> [...] what little there was was so contrived (a square is a
rectangle is a shape for class inheritance
for example) [...]
Particularly because the square/rectangle relationship is just a bad
fit and bad example of OOP
inheritance (where more specialized class is usually extended, not
limited).
Patrick Maupin
@Jakub:
where more specialized class is usually extended, not limited
That’s a really good observation.
LS
“where more specialized class is usually extended, not
limited”
“That’s a really good observation.”
Yes, but it just goes to show that while OOP is a good fit for many
problems, it doesn’t make things
much easier. Coming up with a really good set of classes, with the
right ‘responsibilities’ is difficult.
Finding hidden gotchas in the inheritance hierarchy is difficult.
It’s only after you’ve struggled quite a
while with these issues that you end up with a good set of classes
that make the actual program
construction easy.
This is not really an OOP thing. If you’re doing plain old
procedural programming, the hard part is
figuring out how to partition the problem. Once you do that,
everything seems to fall into place.
Either way, what you are doing is actually trying to understand the
problem you are trying to solve.
That’s the hard part.
William Newman
ESR wrote “Dot notation [for CLOS] would have been nice, but that’s
just syntax and un-Lispy ”
It’s not what you’re looking for, but you might at least be amused
that I often use a macro
DEF.STRUCT which is a fairly thin layer over stock CL:DEFSTRUCT
which, among other things,
makes accessor names use #\. instead of #\- as the separator
between structure class name and
slot name. (E.g., after (DEF.STRUCT PLANAR-POINT X Y),
PLANAR-POINT.X is the name of an
accessor function.)
More seriously, when you talked earlier about the apparent
limitations of CL for printing objects, my
impression is that the CL printer is more powerful and flexible
than in most languages. It has some
ugly misfeatures in its design (e.g., making stream printing
behavior depend too much on global
special variables instead of per-stream settings). It tends to be
slow. But it is fundamentally
functional and flexible enough that on net I’d list it as an
advantage of CL vs. most other languages in
something like Peter Norvig’s chart. The feature I’ve pushed
hardest on is *PRINT-READABLY*
coupled with complementary readmacros to allow the object to be
read back in. In CL, these hooks
are expressive enough to let me do tricks like writing out a
complex cyclic data structure at the
REPL, and later scrolling back in the REPL or even in the
transcript of a previous session, cutting out
the printed form, pasting it into my new REPL prompt, and getting
“the same” thing. (Of course the
implementor of the readmacros needs to decide how “the same” copes
with technicalities like shared
structure, e.g. by memoization or not.) I am not an expert in
Python 2.x or Ocaml or Haskell, but I’ve
read about them and written thousands of lines of each, and it’s
not clear to me that their
printer/reader configurability is powerful enough to support
this.
esr
on 2013-03-28 at 13:14:35 said:
>More seriously, when you talked earlier about the apparent
limitations of CL for printing objects, my
impression is that the CL printer is more powerful and flexible
than in most languages.
You may well be right. It wouldn’t surprise me if you were.
But you’re falling into a trap here that I find often besets Lisp
advocates (and as I criticize, remember
that I love the language myself). You’re confusing theoretical
availability with practicality to hand. As
you note, and as I had previously noticed, print behavior has ugly
dependencies on global variables.
Separately, supposing your “more powerful” is really there, it is
difficult to use, requiring arcane and
poorly documented invocations. Contrast this with Python str(),
which ten minutes after you’ve first
seen it looks natural enough that any fool can use it.
Programming languages should not be tavern puzzles. The Lispy habit
of saying “yes, you can do X
provided you’re willing to sacrifice a goat at midnight and then
dance widdershins around a flowerpot”
is one of the reasons Lisp advocates are often dismissed as
semi-crackpots. Yes, LISP has
on 2013-03-28 at 13:22:53 said:
Those of you who are concerned about software patents have reason
to celebrate: Uniloc just got
handed its ass in their patent suit against Rackspace. In the
very same East Texas district court
that patent trolls venue-shop to get patent-troll-friendly
rulings. Uniloc is a notorious, and
heretofore rather successful, patent troll; basically if you do any
sort of license verification for a piece
of proprietary software, expect to be sued by Uniloc.
The defense cited not only In re Bilski but two other,
more recent cases: Cybersource v. Retail
Decisions and Dealertrack v. Huber , which establish that
for purposes of the “machine or
transformation test” of patentability, a general-purpose computer
is not a specific enough machine,
and transformation of data is not sufficient transformation.
Given the way the sausage that is law gets made in Murka, I’m not
going to say it’s game over for
software patentholders yet. But their job just got a whole lot
harder.
John Wiseman
> You’re confusing theoretical availability with practicality to
hand
Writing the equivalent of simple __str__ and __repr__ methods in
Lisp is very easy, it’s not some
theoretically-powerful-but-practically-difficult beast. You just
have to know that print-object and *print-
readably* exist, like you have to know that __str__ and __repr__
exist.
If you want to support pretty-printing or printing cyclic data
structures in a way that they can be read
back in, then you need to learn some more Lisp, but that’s actually
not hard either (well, except
pretty-printing–that can be a beast). As far as I know neither is
even possible in Python using the
standard for printing & reading.
Programming languages should not be tavern puzzles. The Lispy habit of saying “yes, you can do X provided you’re willing to sacrifice a goat at midnight and then dance widdershins around a flowerpot” is one of the reasons Lisp advocates are often dismissed as semi-crackpots.
@John Wiseman:
If you want to support pretty-printing or printing cyclic data structures in a way that they can be read back in, then you need to learn some more Lisp, but that’s actually not hard either (well, except pretty-printing–that can be a beast). As far as I know neither is even possible in Python using the standard for printing & reading.
In the general case, there is usually no real reason to worry about printing for round-tripping in Python, because pickle handles things like circular references quite nicely.
As far as the other goes, there are several ways to pretty
print things, including leveraging the
standard str() functions by providing your own __str__.
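A minimal sketch of the round trip, using nothing but the standard library:

import pickle

a = ["spam"]
a.append(a)                 # the list now contains a reference to itself

blob = pickle.dumps(a)      # pickling copes with the cycle
b = pickle.loads(blob)
assert b[1] is b            # and the cycle survives the round trip
print(b[0])                 # spam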
Jessica Boxer
@esr
> Unfortunately, my experience is that Python profilers suck
rather badly
Whatever happened to “batteries included”?
Improving the performance of anything beyond a trivial program
without a profiler is like painting a
portrait wearing a blindfold. It is a plain observable fact that
programs don’t spend their time where
programmers think they do. It is much more fun to write a cool
optimization than an effective one.
Which reminds me of the aphorism that those who don’t use UNIX are doomed to reinvent it, badly.[*]
One more reason not to do Python, as if there weren’t enough
already.
[*] BTW, I know that isn’t actually what Henry Spencer said, but I
didn’t want to use the real one
since plainly ESR is not lacking understanding here, just
tools.
esr
on 2013-03-28 at 16:48:09 said:
>Whatever happened to “batteries included”?
It’s a question I’ve wondered about myself in this case. There
aren’t many places where Python fails
to live up to its billing; this is one. Actually, the most serious
one I can think of offhand.
>Nonetheless, it sounds like you recognize this and implemented
a custom, rube goldberg profiler.
That’s too negative. What I did is often useful in conjunction with
profilers even when they don’t suck
– I sampled a timer after each phase in my repo analysis and
reported both elapsed time and
percentages. When several different phases call (for example) the
same lookup-commit-by-mark
code, custom instrumentation of the phases can tell you things that
function timings alone will not.
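The machinery amounts to little more than this – a sketch of the idea, not the actual reposurgeon code:

import time

class PhaseTimer:
    "Accumulate per-phase elapsed times and report percentages."
    def __init__(self):
        self.phases = []            # list of (name, seconds) pairs
        self.mark = time.time()
    def end_phase(self, name):
        now = time.time()
        self.phases.append((name, now - self.mark))
        self.mark = now
    def report(self):
        total = sum(t for (_, t) in self.phases) or 1.0
        for (name, t) in self.phases:
            print("%-24s %8.2fs %5.1f%%" % (name, t, 100.0 * t / total))

# usage, with hypothetical phase functions:
#   timer = PhaseTimer()
#   read_stream();  timer.end_phase("stream parsing")
#   analyze_topo(); timer.end_phase("topological analysis")
#   timer.report()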
esr
>s/Lisp/Linux/g
Linux is not even within orders of magnitude as bad as Lisp is this
way – they’re really not
comparable. The real-world evidence for that is penetration
levels.
Jay Maynard
on 2013-03-28 at 17:17:39 said:
on 2013-03-28 at 17:33:40 said:
>Jessica, what is *your* weapon of choice for the problem space
Python occupies?
I’m curious about that myself. I would rate Ruby approximately as
good (though with weaker library support), Perl somewhat inferior due to long-term maintainability
issues, and nothing else anywhere
near as good.
on 2013-03-28 at 18:10:40 said:
I’m curious about that myself. I would rate Ruby approximately as good (though with weaker library support), Perl somewhat inferior due to long-term maintainability issues, and nothing else anywhere near as good.
Given Jessica’s putative requirements (must be statically typed and
work with the .NET framework),
Boo would be the closest thing to Python; but really, C# is a good
enough language that few people
working within those constraints have a reason to switch away from
it.
Jessica Boxer
on 2013-03-28 at 18:21:03 said:
I’m not really sure what “problem space” Python occupies. It seems
to me that “every programming
problem” is its domain, according to its advocates.
Nonetheless, as I have said here a number of times, as a general programming language I think C# is the best system I have used (“system” including all the peripheral items that make a language usable).
The problem space Eric is referring to, what I want to call “batch” tools, I find C# excellent for that kind of work.
I doubt you love that answer, but there it is.
angularjs as a helper for javascript. It is super cool, very
useful, and has probably tripled my speed
in writing browser side code.
I have never used Ruby, but I have read a little about it and know
someone who has a lot of expertise.
Anything that describes itself as “the good parts of perl” is
unlikely to be appealing to me – because I
don’t think perl has any good parts.
Jessica Boxer
@esr
> That’s too negative. What I did is often useful in conjunction
with profilers even when they don’t
suck – I sampled a timer after each phase in my repo analysis and
reported both elapsed time and
percentages.
I didn’t read your code, but FWIW, a sampling profiler is more than
adequate for 95% of profiling
needs. Seems to me that you just created the batteries required,
assuming you made it general
enough.
Certainly for optimizing what you need is “show me the top five
places my code spends most of its
time”, which is what that gives you. So Rube Goldberg be damned,
sounds like a great tool you built.
Patrick Maupin
@esr:
It’s a question I’ve wondered about myself in this case. There aren’t many places where Python fails to live up to its billing; this is one. Actually, the most serious one I can think of offhand.
In my experience, one of the stock profilers (cProfile) works quite
well. But you do have to take into
consideration the number of calls that are made to a given method
(this data is reported as well as
total time spent in each call). An attribute lookup for a call is
quite properly assigned to the calling
function.
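The stock invocation is about this simple (main() here stands in for whatever code you want to measure):

import cProfile, pstats

cProfile.run("main()", "profile.out")   # or: python -m cProfile -o profile.out yourscript.py
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(20)   # top 20 entries by cumulative time
# the ncalls and tottime columns give per-function call counts and self-time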
The problem space Eric is referring to, what I want to call “batch” tools, I find C# excellent for that kind of work.
I agree that C# is a good language. Like Python, its domain space
is huge, so it’s worthwhile honing
your abilities on a general purpose tool like either one of these
rather than remembering arcane batch
syntax.
But I’m not going to use a Microsoft OS and I’m not going to use a
non-Microsoft implementation of
C#, so I’m not using C#.
uma
esr:
rrenaud
on 2013-03-28 at 23:32:45 said:
You have a performance problem, and your first instinct is to
rewrite the code in a different language,
rather than find algorithmic bottlenecks? Maybe you should stop
hating on computer science
education, and start taking some CS classes?
esr
on 2013-03-28 at 23:56:23 said:
>You have a performance problem, and your first instinct is to
rewrite the code in a different
language, rather than find algorithmic bottlenecks? Maybe you
should stop hating on computer
science education, and start taking some CS classes?
How do you pack that many misconceptions into two sentences? It
must take both native talent and
a lot of practice.
>A combination of clojure and jython is one possibility
Intriguing thought. I may try it on a future project.
Jay Maynard
on 2013-03-29 at 01:59:31 said:
I have a ready-made C# project that I could hack on if I felt the
need…though something tells me that
diving into a 500 KLOC package as an introduction to a language may
not be that good an idea…
after all, I learned to hate C++ from diving into a then-800 KLOC
package…
Jay Maynard
on 2013-03-29 at 02:03:23 said:
Jessica, I’d say Python’s problem space is that group of programs
for which an interpreted language
is good enough, little to no bit-bashing is needed, and its I/O
capabilities are good enough. Yeah,
that’s a pretty wide domain, but by no means “every programming
problem”.
Jeff Read
on 2013-03-29 at 12:20:44 said:
A combination of clojure and jython is one possibility
Holy crap, if you thought CL had warts — wait till you get a load
of Clojure. I tried wrapping my head
around it for a “joke project”. I’d been joking around on Reddit
about L33tStart — a fictional init(1)
replacement written in ClojureScript and running on Node — and
decided that such a blasphemous
thing should really, actually exist.
It didn’t take much exposure to Clojure(Script) for me to discover
that I was allergic. That combined
with Clojure’s community of twenty-something naïfs (“holy shit,
guys, Rich Hickey is such a genius!
if you minimize side effects and code in strictly functional style,
your programs become simpler and
more tractable!”) is enough to turn me right off the language and
actively discourage other smart folks
from adopting it.
Anyway, Clojure is strictly only as powerful as JScheme or
Kawa — so if you like Scheme you can
use one of those and gain all of Clojure’s java-interop advantages,
plus the awesomeness of working
directly in (a somewhat reduced form of) Scheme.
rrenaud
on 2013-03-29 at 13:02:46 said:
So you find the algorithmic bottlenecks and fix them in Python.
Then you begin a failed translation to
Lisp for no reason?
on 2013-03-29 at 13:58:10 said:
rrenaud – you’ve messed up your reading comprehension somewhere, or
just didn’t read through the
comments from before your initial post – he _thought_ he found
them, then attempted a rewrite, then
found more.
As he said March 25 at 4:06 pm: “””At the time I began
looking at Lisp, I believed – mistakenly – that
I had already found and optimized out the stuff that could be
attacked that way. In my defense, I will
note that the remaining O(n**2) code was pretty well obscured; it
took a couple of weeks of
concentrated attention by two able hackers to find it, and that was
after I’d built the machinery for
gathering timings.”””
Jay Maynard
on 2013-03-29 at 15:26:26 said:
rrenaud: Why do you think I said that was unlike Eric? Unlike you,
apparently, I do know him
personally and have what I think is a decent grasp on his hacking
style, and the idea that he’d
on 2013-03-29 at 16:12:35 said:
>the idea that he’d commence a port for performance reasons
before making sure every last drop of
speed was wrung out of it algorithmically is something that he’d
normally ridicule with vigor.
Indeed. But to be fair, I didn’t actually give enough information
in the OP to exclude the following two
theories: (1) Eric had a momentary attack of brain-damage and
behaved in a way he would normally
ridicule, (2) Eric had a momentary attack of “oooh, look at the
shiny Lisp” and put more effort into
thinking about a port to that specific language than the
evidence justified.
Neither theory is true, mind you. But I can’t entirely blame anyone
for entertaining them, because I
didn’t convey the whole sequence of events exactly.
rrenaud’s biggest mistake was to suppose that I hate CS education;
in fact, while I have little use for
the version taught everywhere but a handful of excellent schools
like MIT/CMU/Stanford, “mild
contempt” would be a far better description than “hate”. If these
places were doing their job properly,
hackers at my skill level and above wouldn’t be rara aves – and I
wish that were the case, because
there’s lots of work to go around.
His funniest mistake was that he thought CS education would fix the
mistake he believed me to be
making. See above…
on 2013-03-29 at 19:58:24 said:
2) was the theory I’d come up with…figuring you had a sudden need
to connect with your roots or
something. Like I occasionally fire up a CP/M system.
Jay Maynard
on 2013-03-29 at 20:00:33 said:
And CS education, to me, seems to be a good way to train
people to be computing theorists, which
is almost entirely orthogonal to hacking ability. I’ve never had a
single CS course, have no plans to do
esr:
Another possibility is chicken scheme and cython, with
possibly a thin layer of “C” glue.
http://www.call-cc.org/
on 2013-03-30 at 00:01:33 said:
My experience with performance tuning is that you get the greatest
gains by starting with a really bad
algorithm. Fortunately, there are a lot of those lying
around.
janzert
on 2013-03-30 at 03:05:40 said:
It would be interesting to see the performance of pypy on the post
optimization version. The question
being, did the algorithmic optimization that was done help or hurt
the relative performance of pypy?
esr
on 2013-03-30 at 06:30:12 said:
>It would be interesting to see the performance of pypy on the
post optimization version. The
question being, did the algorithmic optimization that was done help
or hurt the relative performance of
pypy?
It’s easy enough to run that test that I’m doing it now. Timing
stock Python on the 56K-commit
benchmark repo, 270sec (208 commits/sec). Same with pypy, 178sec (315 commits/sec). That’s
interesting – actually a significant speedup this time. I wasn’t
seeing that when I wrote the OP, quite
might be worth running a bisection to find out what.
Russ Nelson
on 2013-03-31 at 02:08:21 said:
Warning: Jay and Jessica, if you fail to appreciate Python as the
transcendent language of the gods,
you will be replaced by a small Python script after the
Singularity!
esr
on 2013-03-31 at 02:35:32 said:
>Warning: Jay and Jessica, if you fail to appreciate Python as
the transcendent language of the
gods, you will be replaced by a small Python script after the
Singularity!
“There is another theory which states this has already
occurred.”
Jay Maynard
on 2013-03-31 at 08:16:48 said:
Heh. Python is *my* weapon of choice for the problems it can
handle.
Jacob Hallén
on 2013-03-31 at 19:11:20 said:
Go see the people in the PyPy channel on Freenode about why your
code is slow. Slowness is
considered to be a bug, unless your code is too short to
overcome warmup.
Jeff Read
on 2013-04-01 at 17:54:59 said:
Warning: Jay and Jessica, if you fail to appreciate Python as the transcendent language of the gods, you will be replaced by a small Python script after the Singularity!
Any singularity based on Python will itself meet a day of
reckoning with the Gods of the Copybook
Headings, who insist that bugs caught at runtime are orders of
magnitude more expensive to fix than
bugs caught at compile time.
Traceback (most recent call last):
  File "/usr/bin/singularity.py", line 8643, in run_ai
  File "/usr/lib/python2.7/dist-packages/ai/ai.py", line 137406, in get_neuron_state
  File "/usr/lib/python2.7/dist-packages/ai/neuralnet.py", line 99205, in query_neuron
  File "/usr/lib/python2.7/dist-packages/ai/neuron.py", line 20431, in query_synapse
TypeError: expected object of type 'SynapseConfiguration', got 'NoneType'
Strong static typing systems are not put into languages just to
make your lives miserable, folks.
Jakub Narebski
on 2013-04-01 at 21:34:34 said:
@Jeff Read: Strong typing does not necessarily mean static typing,
ask ML (or Haskell, I’m not sure
which), with its implied types (and correctness checking that can
discover errors in an algorithm by
type mismatch).
Patrick Maupin
@Jakub Narebski:
There are actually three different things that get conflated on
typing:
strength
static/dynamic
explicit/implicit
Python has reasonably strong typing that is dynamic and
implicit.
Typing on older languages is usually explicit. Even C#, which has
“implicit” local variables, still
requires variable declarations for those. You tell the compiler —
here’s a variable, figure it out based
on its use.
Strong is usually good. Static is usually good. Implicit is usually
good. It wasn’t until recently that
you could have all three.
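A toy illustration of how the three axes show up in Python:

x = 3                     # implicit: no declaration or annotation needed
x = "three"               # dynamic: the same name can rebind to a different type at runtime
try:
    print(3 + "three")    # strong: no silent coercion between int and str
except TypeError as e:
    print(e)              # unsupported operand type(s) for +: 'int' and 'str'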
Jeff Read
on 2013-04-04 at 18:49:36 said:
Strong typing does not necessarily mean static typing, ask ML (or Haskell, I’m not sure which), with its implied types (and correctness checking that can discover errors in an algorithm by type mismatch).
Never said it did. I chose the phrase “strong static typing”
specifically to contrast with weak static
typing (e.g., C) and strong dynamic typing (e.g., Python,
Lisp).
Also, both Haskell and ML support type inference.
Alexander Todorov
on 2013-04-07 at 17:53:02 said:
I’m well aware of the principle. Unfortunately, my experience is
that Python profilers suck rather badly
– you generally end up having to write your own
instrumentation to gather timings, which is what I did
in this case. It helped me find the obscured O(n**2)
operations.
Did you use any profiling tools at all? I’m interested to hear if
there are any ready made tools that