Speed up your code

Table of Contents

Understanding performance

  1. Speeding up software with faster hardware: tradeoffs and alternatives
    Throwing hardware at a software performance problem is often an easy solution, and sometimes the right one. Learn how to approach the decision, and some alternatives.

  2. Transgressive Programming: the magic of breaking abstractions
    Usually you want to stick to abstraction boundaries when coding. But for performance or debugging, you may need to deliberately break those boundaries.

  3. Speed is situational: two websites, two orders of magnitude
    How do you make your application fast? It depends very much on your particular case, as you’ll see in this example case study.

  4. Optimizing your code is not the same as parallelizing your code
    To make your Python code faster, start by optimizing the single-threaded version, then consider multiprocessing, and only then think about a cluster.

  5. Memory location matters for performance
    Performance is not just determined by how many CPU instructions your code runs; it’s also determined by your memory access patterns.

  6. How vectorization speeds up your Python code
    Vectorization allows you to speed up processing of homogeneous data in Python. Learn what it means, when it applies, and how to do it.
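
As a quick taste of the vectorization idea above, here's a minimal sketch comparing a pure-Python loop to its NumPy equivalent; the function names are illustrative, not taken from the article:

```python
import numpy as np

def double_loop(values):
    # Pure Python: the interpreter executes one iteration per element.
    return [v * 2 for v in values]

def double_vectorized(values):
    # Vectorized: NumPy applies the multiplication to the whole array
    # in a single C-level loop over homogeneous data.
    return np.asarray(values) * 2

data = list(range(1_000_000))
assert double_loop(data)[:3] == [0, 2, 4]
assert double_vectorized(data)[:3].tolist() == [0, 2, 4]
```

On large arrays the vectorized version is typically one to two orders of magnitude faster, because the per-element interpreter overhead disappears.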

Measuring performance

  1. Where’s your bottleneck? CPU time vs wallclock time
    Slow software performance may be due to CPU, I/O, locks, and more. Learn a quick heuristic to help you identify which it is.

  2. Faster, more memory-efficient Python JSON parsing with msgspec
    msgspec is a schema-based JSON encoder/decoder, which allows you to process large files with lower memory and CPU usage.

  3. Beyond cProfile: Choosing the right tool for performance optimization
    There are different profilers you can use to measure Python performance. Learn about cProfile, sampling profilers, and logging, and when to use each.

  4. Not just CPU: writing custom profilers for Python
    Sometimes existing Python profilers aren’t enough: you need to measure something unusual. Learn how to write your own cProfile-based custom profiler.

  5. Logging for scientific computing: debugging, performance, trust
    Logging can help you understand and speed up your scientific computing code, and convince yourself and others that you can trust the results.
    (Originally a PyCon 2019 talk—you can also watch a video)

  6. CI for performance: Reliable benchmarking in noisy environments
    Running performance benchmarks can result in noise drowning out the signal. Learn how to get reliable performance benchmarks with Cachegrind.

  7. Docker can slow down your code and distort your benchmarks
    In theory, Docker containers have no performance overhead. In practice, they can actually slow down your code and distort performance measurements.

  8. Creating a better flamegraph visualization
    Flamegraphs are a great way to visualize performance and memory bottlenecks, but with a little tweaking, you can make them even more useful.
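
The CPU-time-vs-wallclock heuristic from the first article above can be sketched in a few lines; the 50% threshold here is an arbitrary illustration, not a rule from the article:

```python
import time

def classify_bottleneck(work):
    """Compare wallclock time to CPU time for a callable."""
    wall_start = time.time()
    cpu_start = time.process_time()
    work()
    wall = time.time() - wall_start
    cpu = time.process_time() - cpu_start
    # If the process accumulated little CPU time relative to elapsed
    # time, it was mostly waiting: on I/O, locks, or sleeps.
    return "waiting" if cpu < wall * 0.5 else "cpu-bound"

# Sleeping burns almost no CPU, so this is classified as waiting:
assert classify_bottleneck(lambda: time.sleep(0.2)) == "waiting"
```

A real profiler gives you much more detail, but this quick check tells you which kind of tool to reach for next.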

Parallelism and multiprocessing

  1. When Python can’t thread: a deep-dive into the GIL’s impact
    Python’s Global Interpreter Lock (GIL) stops threads from running Python code in parallel. Learn how to determine the impact of the GIL on your code.

  2. Why your multiprocessing Pool is stuck (it’s full of sharks!)
    On Linux, the default configuration of Python’s multiprocessing library can lead to deadlocks and brokenness. Learn why, and how to fix it.

  3. The Parallelism Blues: when faster code is slower
    By default NumPy uses multiple CPUs for certain operations. But sometimes parallelism can actually slow down your code.
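
To illustrate the fix discussed in the "full of sharks" article: on Linux you can opt out of the default fork() start method, the source of the deadlocks, by explicitly requesting "spawn". A minimal sketch:

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter for each worker instead of
    # fork()ing the (possibly thread- and lock-holding) parent, which
    # avoids the Linux deadlocks; it's already the default on macOS
    # and Windows.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Note that with "spawn", worker functions must be importable from your main module, which is why the `if __name__ == "__main__":` guard is required.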

Libraries and applications

  1. Choosing a faster JSON library for Python
    There are multiple JSON encoding/decoding libraries available for Python. Learn how you can choose the fastest for your particular use case.

  2. All Pythons are slow, but some are faster than others
    Python on Ubuntu is not always the same speed as Python in the python Docker image, so I ran some benchmarks to help you pick the fastest.

  3. The hidden performance overhead of Python C extensions
    A compiled language like Rust or C is a lot faster than Python, with some caveats. Learn about the hidden overhead you’ll need to overcome.

  4. Cython, Rust, and more: choosing a language for Python extensions
    You can write Python extensions with Cython, Rust, and many other tools. Learn which one you should use, depending on your particular needs.

  5. Faster Python calculations with Numba: 2 lines of code, 13× speed-up
    Python-based calculations, especially those that use NumPy, can run much faster by using the Numba library.

  6. The fastest way to read a CSV in Pandas
    Learn the fastest way to read a CSV into a Pandas DataFrame.
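
When choosing between libraries like the ones above, benchmark on your own data, since document shape changes the rankings. Here's a minimal timing harness using only the standard library (swap in orjson, msgspec, etc. if installed); the sample document is an arbitrary illustration:

```python
import json
import timeit

# Use a representative sample of *your* data: JSON library rankings
# shift between many-small-strings and big-number-array documents.
document = {"ids": list(range(1000)), "name": "example",
            "nested": {"scores": [1.5] * 100}}
encoded = json.dumps(document)

def bench(name, func, number=200):
    # Average seconds per call, reported in microseconds.
    seconds = timeit.timeit(func, number=number)
    print(f"{name}: {seconds / number * 1e6:.1f} µs/call")

bench("stdlib dumps", lambda: json.dumps(document))
bench("stdlib loads", lambda: json.loads(encoded))
```

To compare another library, add a pair of `bench()` calls with its `dumps`/`loads` equivalents and run against the same document.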

Speed up your testing

  1. Fast tests for slow services: why you should use verified fakes
    Sometimes your Python tests need to talk to something slow and not under your control. Learn how to write fast and realistic tests, without resorting to mocks.

  2. Why Pylint is both useful and unusable, and how you can use it
    You want to find bugs in your Python code as you write it. Pylint is a great tool for this, but it has some problems you’ll need to work around.

  3. Stuck with slow tests? Speed up your feedback loop
    Sometimes you can’t speed up your Python test suite. What you can do, however, is find failures faster with linters, partial testing, and more.

  4. When your CI is taking forever on AWS, it might be EBS
    When running tests or builds on AWS, a bad EBS configuration can slow everything down; learn how to identify the problem and speed up your build.

  5. Realistic, easy, and fast enough: database tests with Docker
    Realistic tests require a real database, but that can be difficult and slow. Docker makes it simple, and some tweaks can make it faster.

  6. When C extensions crash: easier debugging for your Python application
    If your Python test suite segfaults in C code, debugging is difficult. But an easy configuration tweak can help you pinpoint the responsible code.
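
One such configuration tweak (assuming this is what the last article covers) is Python's built-in faulthandler module, which prints a Python-level traceback when C code crashes the process:

```python
import faulthandler

# With faulthandler enabled, a segfault in C code dumps the Python
# traceback of every thread to stderr before the process dies,
# pointing you at the Python call that triggered the crash.
# Equivalent: run with `python -X faulthandler` or set the
# PYTHONFAULTHANDLER=1 environment variable.
faulthandler.enable()
assert faulthandler.is_enabled()
```

For pytest suites, setting the environment variable in your CI configuration enables it for every test run without code changes.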

Learn practical Python software engineering skills you can use at your job

Too much to learn? Don't know where to start?

Sign up for my newsletter, and join over 5500 Python developers and data scientists learning practical tools and techniques, from Docker packaging to testing to Python best practices, with a free new article in your inbox every week.


Sciagraph™: Always-on performance and memory profiling for production batch jobs ($)

If your production data processing batch jobs are running too slowly, using too much memory, or costing too much, you need to understand why. Sciagraph is an always-on, production-grade profiler you can use to get immediate insights into your code’s bottlenecks.

Speed up your production batch jobs with Sciagraph