Articles: Speed up your data science and scientific computing code

Reduce memory usage

Lacking CPU, your program runs slower; lacking memory, your program crashes. But you can process larger-than-RAM datasets in Python, as you’ll learn in the following series of articles.

Code structure

  1. Copying data is wasteful, mutating data is dangerous
    Copying data wastes memory, and modifying/mutating data can lead to bugs. Learn how to implement a compromise between the two in Python: hidden mutability.

  2. Clinging to memory: how Python function calls can increase memory use
    Python will automatically free objects that aren’t being used. Sometimes function calls can unexpectedly keep objects in memory; learn why, and how to fix it.

  3. Massive memory overhead: Numbers in Python and how NumPy helps
    Storing integers or floats in Python has a huge overhead in memory. Learn why, and how NumPy makes things better.

  4. Too many objects: Reducing memory overhead from Python instances
    Objects in Python have large memory overhead. Learn why, and what to do about it: avoiding dicts, fewer objects, and more.
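
To make the "avoiding dicts" idea concrete, here is a minimal sketch using __slots__, which is one standard way to drop the per-instance attribute dict. The class names and the shallow_size helper are made up for illustration, and the sizes are shallow estimates that vary between Python versions; the linked article may use different techniques.

```python
import sys


class PointWithDict:
    """Ordinary class: each instance carries its own attribute dict."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots:
    """Same data, but __slots__ removes the per-instance dict."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def shallow_size(obj):
    """Size of the instance plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


regular = PointWithDict(1.0, 2.0)
slotted = PointWithSlots(1.0, 2.0)
print("with __dict__:", shallow_size(regular), "bytes")
print("with __slots__:", shallow_size(slotted), "bytes")
```

The saving per object is small, but it adds up when a program holds millions of instances.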

Data management techniques

  1. Estimating and modeling memory requirements for data processing
    Learn how to measure and model memory usage for Python data processing batch jobs based on input size.

  2. When your data doesn’t fit in memory: the basic techniques
    You can process data that doesn’t fit in memory by using four basic techniques: spending money, compression, chunking, and indexing.

  3. Processing large JSON files in Python without running out of memory
    Loading complete JSON files into Python can use too much memory, leading to slowness or crashes. The solution: process JSON data one chunk at a time.
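
As a rough illustration of chunking, one of the basic techniques listed above, the sketch below sums a CSV column while keeping only one chunk of rows in memory at a time. The function names, the column name, and the in-memory sample file are assumptions for the example, not code from the articles.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def total_of_column(csv_file, column, chunk_size=1000):
    """Sum a numeric column while holding only one chunk in memory."""
    reader = csv.DictReader(csv_file)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


# Stand-in for a large file on disk; a real run would use open(path, newline="").
sample = io.StringIO("id,amount\n1,2.5\n2,4.0\n3,3.5\n")
print(total_of_column(sample, "amount", chunk_size=2))
```

Peak memory then depends on chunk_size rather than on the size of the file.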

Pandas

  1. Measuring the memory usage of a Pandas DataFrame
    Learn how to accurately measure memory usage of your Pandas DataFrame or Series.

  2. Reducing Pandas memory usage #1: lossless compression
    Load a large CSV or other data into Pandas using less memory with techniques like dropping columns, smaller numeric dtypes, categoricals, and sparse columns.

  3. Reducing Pandas memory usage #2: lossy compression
    Reduce Pandas memory usage by dropping details or data that aren’t as important.

  4. Reducing Pandas memory usage #3: Reading in chunks
    Reduce Pandas memory usage by loading and then processing a file in chunks rather than all at once, using Pandas’ chunksize option.

  5. Fast subsets of large datasets with Pandas and SQLite
    You have a large amount of data, and you want to load only part into memory as a Pandas dataframe. One easy way to do it: indexing via a SQLite database.

  6. Loading SQL data into Pandas without running out of memory
    Pandas can load data from a SQL query, but the result may use too much memory. Learn how to process data in batches, and reduce memory usage even further.

  7. Saving memory with Pandas 1.3’s new string dtype
    Storing strings in Pandas can use a lot of memory, but with Pandas 1.3 you have access to a newer, more efficient option.

  8. From chunking to parallelism: faster Pandas with Dask
    Learn how Dask can both speed up your Pandas data processing with parallelization, and reduce memory usage with transparent chunking.

  9. Why Polars uses less memory than Pandas
    Polars is an alternative to Pandas that can often run faster, and use less memory!

  10. Don’t bother trying to estimate Pandas memory usage
    Estimating Pandas memory usage from the data file size is surprisingly difficult. Learn why, and some alternative approaches that don’t require estimation.
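
The SQLite indexing idea from the list above can be sketched with only the standard library. The table, column names, and rows_for_street helper here are hypothetical; in a Pandas workflow the same query and connection could be handed to pandas.read_sql_query so that only the matching rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real dataset would use a file path
conn.execute("CREATE TABLE voters (street TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO voters VALUES (?, ?)",
    [("MAIN ST", "A. Smith"), ("ELM ST", "B. Jones"), ("MAIN ST", "C. Lee")],
)
# The index lets SQLite find matching rows without scanning the whole table.
conn.execute("CREATE INDEX idx_street ON voters (street)")
conn.commit()


def rows_for_street(connection, street):
    """Fetch only the rows for one street, via the index."""
    cursor = connection.execute(
        "SELECT street, name FROM voters WHERE street = ? ORDER BY name",
        (street,),
    )
    return cursor.fetchall()


print(rows_for_street(conn, "MAIN ST"))
```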

NumPy

  1. Reducing NumPy memory usage with lossless compression
    Reduce NumPy memory usage by choosing smaller dtypes, and using sparse arrays.

  2. NumPy views: saving memory, leaking memory, and subtle bugs
    NumPy uses memory views transparently, as a way to save memory. But you need to understand how they work, so you don’t leak memory, or modify data by mistake.

  3. The problem with float32: you only get 16 million values
    Switching from float64 (double-precision) to float32 (single-precision) can cut memory usage in half. But how do you deal with data that doesn’t fit?

  4. Loading NumPy arrays from disk: mmap() vs. Zarr/HDF5
    Learn how to load larger-than-memory NumPy arrays from disk using either mmap() (using numpy.memmap), or the very similar Zarr and HDF5 file formats.

  5. The mmap() copy-on-write trick: reducing memory usage of array copies
    Copying a NumPy array and modifying it doubles the memory usage. But by utilizing the operating system’s mmap() call, you pay only for what you modify.

Measuring memory usage

  1. Measuring memory usage in Python: it’s tricky!
    Measuring your Python program’s memory usage is not as straightforward as you might think. Learn two techniques, and the tradeoffs between them.

  2. Fil: a Python memory profiler for data scientists and scientists
    Fil is a Python memory profiler designed specifically for the needs of data scientists and scientists running data processing pipelines.

  3. Debugging Python out-of-memory crashes with the Fil profiler
    Debugging Python out-of-memory crashes can be tricky. Learn how the Fil memory profiler can help you find where your memory use is happening.

  4. Dying, fast and slow: out-of-memory crashes in Python
    There are many ways Python out-of-memory problems can manifest: slowness due to swapping, crashes, MemoryError, segfaults, kill -9.

  5. Debugging Python server memory leaks with the Fil profiler
    When your Python server is leaking memory, the Fil memory profiler can help you spot the buggy code.
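
One way to get a first measurement without installing anything is the standard library's tracemalloc module, sketched below. It only sees allocations made through Python's allocator while tracing is on, so it is not necessarily one of the techniques the articles above describe, and it will not match what the operating system reports. The helper and workload names are made up for the example.

```python
import tracemalloc


def peak_allocated_bytes(func, *args, **kwargs):
    """Run func; return (result, peak bytes Python allocated during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_buffer(size):
    # Hypothetical workload: allocate one large zero-filled buffer.
    return bytearray(size)


buffer, peak = peak_allocated_bytes(build_buffer, 10_000_000)
print(f"peak traced allocation: {peak / 1e6:.1f} MB")
```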


Speed up your Python code and learn skills you can use at your job

Join over 7600 Python developers and data scientists learning practical tools and techniques every week, from Python performance to Docker packaging, by signing up for my newsletter.


Speed up your code

Understanding performance

  1. Faster hardware is a bad first solution to slow software
    If your software is slow, throwing hardware at the problem is often a bad first solution.

  2. Transgressive Programming: the magic of breaking abstractions
    Usually you want to stick to abstraction boundaries when coding. But for performance or debugging, you may need to deliberately break those boundaries.

  3. Early speed optimizations aren’t premature
    Making your code faster from the start is, in fact, an excellent idea.

  4. Speed is situational: two websites, two orders of magnitude
    How do you make your application fast? It depends very much on your particular case, as you’ll see in this example case study.

  5. Optimizing your code is not the same as parallelizing your code
    To make your Python code faster, start with optimizing single-threaded versions, then consider multiprocessing, and only then think about a cluster.

  6. Memory location matters for performance
    Performance is not just determined by how many CPU instructions your code runs; it’s also determined by your memory access patterns.

  7. How vectorization speeds up your Python code
    Vectorization allows you to speed up processing of homogeneous data in Python. Learn what it means, when it applies, and how to do it.

  8. CPUs, cloud VMs, and noisy neighbors: the limits of parallelism
    Learn how your computer’s (or virtual machine’s) CPU cores, and how they’re configured, limit the parallelism of your computations.

  9. Pandas vectorization: faster code, slower code, bloated memory
    Vectorization in Pandas can make your code faster—except when it will make your code slower.

  10. The limits of Python vectorization as a performance technique
    Vectorization is a great way to speed up your Python code, but you’re limited to specific operations on bulk data. Learn how to get past these limitations.

  11. Good old-fashioned code optimization never goes out of style
    Sometimes the way to speed up your application is better data structures and more efficient code.

  12. Speeding up your code when multiple cores aren’t an option
    Parallelism isn’t the only answer: often you can optimize low-level code to get significant performance improvements.

Measuring performance

  1. Where’s your bottleneck? CPU time vs wallclock time
    Slow software performance may be due to CPU, I/O, locks, and more. Learn a quick heuristic to help you identify which it is.

  2. Faster, more memory-efficient Python JSON parsing with msgspec
    msgspec is a schema-based JSON encoder/decoder, which allows you to process large files with lower memory and CPU usage.

  3. Beyond cProfile: Choosing the right tool for performance optimization
    There are different profilers you can use to measure Python performance. Learn about cProfile, sampling profilers, and logging, and when to use each.

  4. Invasive procedures: Python affordances for performance measurement
    Learn a variety of—sometimes horrible—ways to instrument and measure performance in Python.

  5. Not just CPU: writing custom profilers for Python
    Sometimes existing Python profilers aren’t enough: you need to measure something unusual. Learn how to write your own cProfile-based custom profiler.

  6. Logging for scientific computing: debugging, performance, trust
    Logging can help you understand and speed up your scientific computing code, and convince yourself and others that you can trust the results.
    (Originally a PyCon 2019 talk—you can also watch a video)

  7. CI for performance: Reliable benchmarking in noisy environments
    Running performance benchmarks can result in noise drowning out the signal. Learn how to get reliable performance benchmarks with Cachegrind.

  8. Finding performance bottlenecks in Celery tasks
    Learn how to speed up Celery tasks by figuring out what’s causing them to be slow.

  9. Docker can slow down your code and distort your benchmarks
    In theory, Docker containers have no performance overhead. In practice, they can actually slow down your code and distort performance measurements.

  10. The best way to find performance bottlenecks: observing production
    The causes of performance bottlenecks vary widely, from network latency to software bugs. Observation in production may therefore be the only way to find them.

  11. Finding performance problems: profiling or logging?
    If your software is running slowly in production, do you need profiling or logging?

  12. Find slow data processing tasks (before your customers do)
    Your data processing jobs are fast… most of the time. Next, find the slow runs so you can speed them up.

  13. Creating a better flamegraph visualization
    Flamegraphs are a great way to visualize performance and memory bottlenecks, but with a little tweaking, you can make them even more useful.
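
The CPU time versus wallclock time comparison from the first item above can be tried with two standard-library clocks. This is a sketch under the assumption that CPU time close to wallclock time suggests CPU-bound work, while CPU time far below it suggests waiting on I/O, locks, or sleep; the article's own heuristic may be more detailed. The function names are invented for the example.

```python
import time


def cpu_and_wall(func, *args, **kwargs):
    """Return (cpu_seconds, wall_seconds) spent running func in this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


def busy():
    # CPU-bound stand-in: pure computation.
    sum(i * i for i in range(500_000))


def waiting():
    # Stand-in for I/O or lock waits: the process is idle.
    time.sleep(0.2)


for task in (busy, waiting):
    cpu, wall = cpu_and_wall(task)
    print(f"{task.__name__}: cpu={cpu:.3f}s wall={wall:.3f}s")
```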

Parallelism and multiprocessing

  1. When Python can’t thread: a deep-dive into the GIL’s impact
    Python’s Global Interpreter Lock (GIL) stops threads from running in parallel or concurrently. Learn how to determine the impact of the GIL on your code.

  2. Why your multiprocessing Pool is stuck (it’s full of sharks!)
    On Linux, the default configuration of Python’s multiprocessing library can lead to deadlocks and brokenness. Learn why, and how to fix it.

  3. The Parallelism Blues: when faster code is slower
    By default NumPy uses multiple CPUs for certain operations. But sometimes parallelism can actually slow down your code.

  4. Who controls parallelism? A disagreement that leads to slower code
    The libraries you’re using might be running more threads than you realize—and that can mean slower execution.

  5. Python’s multiprocessing performance problem
    While multiprocessing allows Python to scale to multiple CPUs, it has some performance overhead compared to threading.

  6. Two kinds of threads pools, and why you need both
    How big should your thread pool be? It depends on your use case.

  7. How many CPU cores can you actually use in parallel?
    Figuring out how much parallelism your program can use is surprisingly tricky.
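
As a starting point for the question of how many cores a program can use, the sketch below compares the number of logical cores with the number this process is allowed to run on. It is deliberately incomplete: it ignores container CPU quotas and does not distinguish physical cores from hyperthreads, which are among the things that make the real answer tricky. The usable_cpu_count name is made up for the example.

```python
import os


def usable_cpu_count():
    """Best-effort count of CPU cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity masks (for example, set via taskset).
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the number of logical cores, if known.
    return os.cpu_count() or 1


print("logical cores reported:", os.cpu_count())
print("cores usable by this process:", usable_cpu_count())
```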

Libraries and applications

  1. Choosing a faster JSON library for Python
    There are multiple JSON encoding/decoding libraries available for Python. Learn how you can choose the fastest for your particular use case.

  2. All Pythons are slow, but some are faster than others
    Python on Ubuntu is not always the same speed as Python in the python Docker image. I ran some benchmarks so you can pick the fastest.

  3. The hidden performance overhead of Python C extensions
    A compiled language like Rust or C is a lot faster than Python, with some caveats. Learn about the hidden overhead you’ll need to overcome.

  4. Cython, Rust, and more: choosing a language for Python extensions
    You can write Python extensions with Cython, Rust, and many other tools. Learn which one you should use, depending on your particular needs.

  5. Faster Python calculations with Numba: 2 lines of code, 13× speed-up
    Python-based calculations, especially those that use NumPy, can run much faster by using the Numba library.

  6. The fastest way to read a CSV in Pandas
    Learn the fastest way to read a CSV into Pandas.

  7. Some reasons to avoid Cython
    Cython is an easy way to speed up your Python code—but it doesn’t scale well to large projects.

  8. Understanding CPUs can help speed up Numba and NumPy code
    With a little understanding of how CPUs and compilers work, you can speed up NumPy with faster Numba code.

  9. The easiest way to speed up Python with Rust
    Rust can make your Python code much faster; here’s how to start using it as quickly as possible.

  10. NumPy 2 is coming: preventing breakage, updating your code
    NumPy 2 is coming, and it’s backwards incompatible. Learn how to keep your code from breaking, and how to upgrade.

  11. Profiling your Numba code
    Learn how to use the Profila profiler to find performance bottlenecks in your Numba code.

  12. Speeding up text processing in Python (is hard)
    How do you speed up Python string parsing and formatting? We’ll consider Cython, mypyc, Rust, and PyPy.

  13. Polars for initial data analysis, Polars for production
    Initial and exploratory data analysis have different requirements than production data processing; Polars supports both.

  14. When NumPy is too slow
    What do you do when your NumPy code isn’t fast enough? We’ll discuss the options, from Numba to JAX to manual optimizations.

  15. Speeding up Cython with SIMD
    SIMD is a CPU feature that lets you speed up numeric processing; learn how to use it with Cython.

  16. Choosing a good file format for Pandas
    CSV, JSON, Parquet—which data format should you use for your Pandas data?

  17. Using Polars in a Pandas world
    Pandas has far more third-party integrations than Polars. Learn how to use those libraries with Polars dataframes.

GPUs

  1. Beware of misleading GPU vs CPU benchmarks
    Are GPU replacements for CPU-based libraries really that much faster?

  2. Not just NVIDIA: GPU programming that runs everywhere
    If you want to run GPU programs in CI, on Macs, and more, wgpu-py is a good option.

Speed up your testing

  1. Fast tests for slow services: why you should use verified fakes
    Sometimes your Python tests need to talk to something slow and not under your control. Learn how to write fast and realistic tests, without resorting to mocks.

  2. Why Pylint is both useful and unusable, and how you can use it
    You want to find bugs in your Python code as you write it. PyLint is a great tool for this, but it has some problems you’ll need to work around.

  3. Stuck with slow tests? Speed up your feedback loop
    Sometimes you can’t speed up your Python test suite. What you can do, however, is find failures faster with linters, partial testing, and more.

  4. When your CI is taking forever on AWS, it might be EBS
    When running tests or builds on AWS, a bad EBS configuration can slow everything down; learn how to identify the problem and speed up your build.

  5. Realistic, easy, and fast enough: database tests with Docker
    Realistic tests require a real database, but that can be difficult and slow. Docker makes it simple, and some tweaks can make it faster.

  6. When C extensions crash: easier debugging for your Python application
    If your Python test suite segfaults in C code, debugging is difficult. But an easy configuration tweak can help you pinpoint the responsible code.

  7. Goodbye to Flake8 and PyLint: faster linting with Ruff
    Ruff is a new linter that is vastly faster than PyLint and flake8—with many of the same checks.

