Make Lisp 15x faster than Python or 4x faster than Java
Hi Guys - welcome back to the Postabon blog (after our not-so-brief hiatus)!
I've been insanely busy for the last few weeks (I don't think I ever appreciated just how much actual work launching a startup is, until I actually did one). In the last month or so we've rewritten most of the front end of our website (it's much prettier!), made several trips to NYC and the west coast (most recently for Jason Calacanis's Open Angel Forum (among other things) where I got to meet lots of amazing angels and entrepreneurs), and so much more I can't talk about publicly just yet ;-)
I've had this post in mind for a few weeks now and, since I couldn't sleep last night, I finally got around to banging it out.
One of the three most common criticisims I hear when I tell people I'm using Lisp for my startup is that 'Lisp is so slow!' (the other two being 'How can you code with all those ugly parentheses?!', and the dirth of libraries - which I've addressed in other places).
This is quite funny to me, because Lisp is one of the fastest high level languages I know of (with a few obvious exceptions, in very strongly typed languages like Haskell or OCaml ;-)). One of the featuers I like most about Common Lisp is that you can optionally provide the compiler type hints, and you'll come close to the speed of raw C. Most languages provide some sort foreign function interface - but it's usually ugly, hard to use, and non-portable. Being able to take any existing Lisp code, and drop in a few '(declare (type ...))' declarations and getting the speed of a low level language is pretty great!
I thought I'd demonstrate this functionality, with a bit of real world code. Postabon recommends deals to users based on a variety of factors such as the age of the deal, other people's votes, your preferences, and (relevant for this example) your distance from the deal. Most of the other factors can be computed asyncronously and just cached, but your distance from deals is computed a lot, and can't really be pre-computed since I have no way of knowing your location a priori (well, that's not strictly true, and we do do some memoization, but it has a relatively low hit rate). The most common way to compute the distance between two latitude/longitude pairs is to assume the earth is a perfect sphere, and then do some basic trig that's known as the 'Haversine Formula' (Postabon actually does something a little different internally, but this is a reasonable simplification).
Some profiling suggested my program was spending a fair amount of time in a 'distance' function - and, since I'd already tackled a lot of the other low hanging fruit, I decided to take a few minutes to speed it up.
Without further ado, I'll present the results and the code, and talk about how I ran these tests.
None of the code is 'optimal', but they are all just literal line-by-line translations of each other (and of the formula in the Wikipedia article :-D). Wherever one uses a temporary variable, so do the others, etc. All the benchmarks were run 5x each on the same Debian machine, using the latest runtime/compiler available from apt. Also of note is that virtual machine start up and end time was disregarded since all timings were done in code. I did experiment with some changes to make things more idiomatic (e.g., using doubles in Java, or list comprehensions in Python), but I ended up using whatever was fastest.
To transform the first piece of Lisp code into the second, I told the compiler I wanted to optimize for speed by typing (declare (optimize (speed 3) (safety 0) (debug 0))) at the top of the salient functions. Then, when I tried to compile my code, it would then tell me whenever it couldn't infer the type of something (or was being forced to coerce types of something) with a warning like:
; in: DEFUN DISTANCE
; (* (THE FIXNUM *EARTH-RADIUS*) 2
; (THE SINGLE-FLOAT (ASIN (THE SINGLE-FLOAT (SQRT #)))))
; ==>
; (* (* (THE FIXNUM *EARTH-RADIUS*) 2)
; (THE SINGLE-FLOAT (ASIN (THE SINGLE-FLOAT (SQRT #)))))
;
; note: doing signed word to integer coercion (cost 20), for:
; the first argument of GENERIC-*
Translated, that just means that when SBCL is multiplying the Earth's radius by two, it doesn't know for certain that the result will fit in a 'fixnum' (analagous to an 'int' in most other languages) so it has to assume the result is an integer (which is an infinite precision integer, similar to a 'BigInt' in most other languages). This conversion is relatively expensive but, since I know the Earth's radius will always be under 7,000 km, - I can just give the compiler a 'hint' and guarantee that the result will always be small enough to fit in a 'fixnum.'
I'll admit these warnings are a little cryptic. When I first started programming with SBCL it used to me almost an hour to figure them all out properly optimize a small piece of code. But, after doing it a half dozen times, I can usually anticipate where the type conversions/ambiguities are going to be before the compiler even tells me - and I can usually speed up most small functions three or four fold with just five minutes of work.
Of course, I still do this sort of optimization fairly rarely, since the code loses some flexibility (and I'd rather work on adding new features until profiling tells me I have to go do this). I also very rarely use (safety 0) in production (if I do, I'm careful to have enough pre-assertions to prove that my hints will never be wrong). But, it's certainly nice to have the option to make the code go so fast so easily.
If I'm making some terrible mistake in my Java or Python code (which is possible, I'm a bit rusty with both), please let me know or submit a revised (but still equivalent) solution, and I'll be happy to rerun the benchmarks.
