Python gives you fast development—and slow code

When you’re trying to understand new data for the first time, Python is ideal for quick, interactive exploration. Whether you’re cleaning up messy data or prototyping different analyses, Python is easy to write and easy to experiment with.

But eventually your bottleneck shifts from how quickly you can come up with new ideas to how quickly your code can run:

  • If it takes an hour to run an experiment, you can only run a handful of experiments every day.
  • If you need results in 30 seconds, ten minutes is far too long.
  • If you burn all your budget on cloud computing, how will your employer pay your salary?

Oftentimes your existing libraries—like NumPy or Pandas—will enable your code to run quickly, or at least quickly enough. But when they don’t, Python on its own is just

way

too

slow.
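To make that concrete, here’s a minimal sketch (assuming NumPy is installed; exact timings depend on your machine) comparing a pure-Python loop to NumPy’s vectorized equivalent:

    # Sum 10 million floats: once with a Python loop, once with NumPy.
    import time
    import numpy as np

    data = np.random.rand(10_000_000)

    start = time.perf_counter()
    total = 0.0
    for value in data:  # every iteration runs in the Python interpreter
        total += value
    print("pure Python loop:", time.perf_counter() - start)

    start = time.perf_counter()
    total = data.sum()  # the loop runs inside NumPy's compiled C code
    print("NumPy sum():     ", time.perf_counter() - start)

On a typical machine the vectorized version can easily be a hundred times faster, which is why reaching for these libraries is usually the first thing to try.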

What do you do when Python isn’t fast enough?

You need code that is fast, reliable, and correct, while building on the Python code you’ve already written.

A good first step is switching some of your code—the part with the performance bottlenecks—to a compiled language like Cython or Rust, or a just-in-time compiler like Numba. This can often give you a significant speed boost, not to mention enabling more flexible coding patterns, like writing for loops over your arrays.
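Here’s a sketch of what that looks like with Numba (the same idea applies to Cython or Rust; the exponential moving average is just an illustrative example): a for loop that would crawl in pure Python gets compiled to machine code.

    # An exponential moving average: each entry depends on the previous
    # one, so it's hard to vectorize with NumPy, but easy as a for loop.
    import numpy as np
    from numba import njit

    @njit  # Numba compiles this function to machine code on first call
    def exponential_moving_average(values, alpha):
        result = np.empty_like(values)
        result[0] = values[0]
        for i in range(1, len(values)):
            result[i] = alpha * values[i] + (1 - alpha) * result[i - 1]
        return result

    data = np.random.rand(1_000_000)
    smoothed = exponential_moving_average(data, 0.1)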

If you’re lucky, that will suffice. But often, your newly rewritten low-level code won’t be any faster than the NumPy or Pandas code it was trying to replace—it might even be slower!

What do you do when your compiled code isn’t fast enough either?

Your first thought might be parallelism, using a pool of threads or processes (sketched after the list below). But that comes with its own set of problems:

  • Parallelism might speed up your code, but it doesn’t reduce costs: you’re paying for more cores to do the same total amount of work, a problem that only grows when you need to scale.
  • Some algorithms are difficult to parallelize.
  • Parallelism can add computational overhead and significant complexity.
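For reference, the process-pool version of that path looks roughly like this sketch (using the standard library’s concurrent.futures; analyze_chunk is a hypothetical stand-in for your real computation):

    # Farm a hypothetical analyze_chunk() out to multiple CPU cores.
    from concurrent.futures import ProcessPoolExecutor

    def analyze_chunk(chunk):
        # Placeholder for your actual per-chunk computation.
        return sum(chunk)

    if __name__ == "__main__":
        chunks = [range(1_000_000) for _ in range(8)]
        with ProcessPoolExecutor() as pool:
            # Each chunk runs in a separate process; starting workers and
            # shipping data between processes is part of the overhead.
            results = list(pool.map(analyze_chunk, chunks))
        print(sum(results))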

So before you start down that path, it’s worth considering another, much simpler option. With a little more understanding of how your computer works, you can make your compiled code run much faster.

Modern CPUs are fast—if you use them right

Here’s a common mental model for how CPUs work:

  1. The CPU core executes one instruction at a time; for example, first it adds one pair of numbers, and then another.
  2. Different instructions run at the same speed.
  3. The CPU only takes one branch of an if statement.
  4. Reading from and writing to memory is fast, and happens at a consistent speed.

This mental model is misleading if you need to write fast code. In reality:

  1. CPUs can execute multiple instructions at the same time, on a single core.
  2. Some CPU instructions are much faster than others.
  3. CPUs might end up executing both branches of an if statement (although you will only ever see one); see the sketch below for a demonstration.
  4. Reading from memory might be fast… or so slow your CPU effectively grinds to a halt.
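That third point is easy to see for yourself. Here’s a minimal sketch (using Numba; results vary by CPU, and sometimes the compiler emits branchless code that hides the effect): the function does identical work on both inputs, but typically runs faster on the sorted one, because the CPU learns to predict which way the if statement will go.

    # Sum the positive entries of an array; the `if` becomes a CPU branch.
    import time
    import numpy as np
    from numba import njit

    @njit
    def sum_positive(values):
        total = 0.0
        for value in values:
            if value > 0:
                total += value
        return total

    data = np.random.default_rng(42).uniform(-1, 1, 10_000_000)
    sum_positive(data)  # warm-up call, so we don't time Numba's compilation

    start = time.perf_counter()
    sum_positive(data)  # random signs: branches are hard to predict
    print("unsorted:", time.perf_counter() - start)

    data.sort()         # negatives first, then positives
    start = time.perf_counter()
    sum_positive(data)  # predictable branches: the CPU guesses right
    print("sorted:  ", time.perf_counter() - start)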

Learn how to make your compiled code 10× faster

Modern CPUs are fast, in part because of these rather surprising features. But the full power of your CPU only becomes available if your code is able to take advantage of it. Tweak your code appropriately, and you can often get an order of magnitude speedup—before you’ve even bothered thinking about multithreading or multiprocessing.

You can learn how to do this by reading the appropriate generic books (they’ll usually expect you to know how to write C), cobbling together a reading list of blog posts, and then figuring out how to apply all this in the context of Python code.

It would be much better if you could just write faster code and get back to analyzing your data. To help you do that as quickly as possible, I’m working on a book that will teach you how to:

  • Identify where switching to compiled code can be helpful.
  • Understand the benefits of compiled code, and why it might not be enough on its own.
  • Speed up your low-level compiled code significantly, by taking advantage of CPU features like instruction-level parallelism, branch prediction, SIMD, and memory caches. (It’s OK if you don’t yet know what those are! In the past, neither did I. Learning about them is what the book is for.)

More details:

  • The main focus will be speeding up numeric computing, the kind of calculations you’d do as a scientist, data scientist, or research software engineer using NumPy and similar libraries.
  • You won’t have to know C, Cython, or Rust to read the book—just knowing some Python is good enough.
  • However, your new knowledge and skills will apply to C, Cython, Rust, or any other low-level compiled language you happen to be using.

Interested? Sign up below to get updates on the book, and a weekly article on Python performance, handling larger-than-memory datasets, Docker packaging, and more.
