Scaling data processing—without a cluster!

Table of Contents

As described in Alex Voss, Ilia Lvov, and Jon Lewis’s Small Big Data manifesto, you don’t need a Big Data cluster to process large amounts of data; a single computer is often sufficient.

Process large datasets without running out of memory

Lacking CPU, your program runs slower; lacking memory, your program crashes. But you can process larger-than-RAM datasets in Python, as you’ll learn in the following series of articles.

Code structure

  1. Copying data is wasteful, mutating data is dangerous
    Copying data wastes memory, and mutating data in place can lead to bugs. A compromise between the two is “hidden mutability” (sketched after this list).

  2. Clinging to memory: how Python function calls can increase your memory usage
    Python will automatically release memory for objects that aren’t being used. But sometimes function calls can unexpectedly keep objects in memory. Learn about Python memory management, how it interacts with function calls, and what you can do about it.

  3. Massive memory overhead: Numbers in Python and how NumPy helps
    Storing integers or floats in Python has a huge overhead in memory. Learn why, and how NumPy makes things better.

  4. Too many objects: Reducing memory overhead from Python instances
    Objects in Python have large memory overhead; create too many objects, and you’ll use far more memory than you expect. Learn why, and what to do about it.
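
As a taste of the “hidden mutability” compromise from item 1: the function below makes one private copy of its input and then mutates only that copy, so callers never observe mutation, but repeated copying is avoided. The normalize() example is mine, not taken from the article.

    def normalize(values):
        # One defensive copy; everything after this mutates only the copy.
        result = list(values)
        total = sum(result)
        for i, value in enumerate(result):
            result[i] = value / total   # in-place update of the private copy
        return result

    original = [2, 3, 5]
    print(normalize(original))  # [0.2, 0.3, 0.5]
    print(original)             # [2, 3, 5] -- the caller's list is untouched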

Data management techniques

  1. Estimating and modeling memory requirements for data processing
    How much memory does your process actually need? How much will it need for different inputs? Learn how to measure and model memory usage for data processing batch jobs.

  2. When your data doesn’t fit in memory: the basic techniques
    You can still process data that doesn’t fit in memory by using four basic techniques: spending money, compression, chunking, and indexing.
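
As a minimal illustration of the chunking technique from item 2 (the file name and column layout here are hypothetical): read a large CSV a fixed number of rows at a time, reduce each chunk to a small running total, and then discard it.

    def total_of_first_column(path, chunk_size=100_000):
        total = 0.0
        chunk = []
        with open(path) as f:
            next(f)  # skip the header row
            for line in f:
                chunk.append(float(line.split(",")[0]))
                if len(chunk) >= chunk_size:
                    total += sum(chunk)  # reduce the chunk...
                    chunk.clear()        # ...then free its memory
            total += sum(chunk)          # any leftover rows
        return total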

Pandas

  1. Reducing Pandas memory usage #1: lossless compression
    How do you load a large CSV into Pandas without using as much memory? Learn the basic techniques: dropping columns, lower-range numeric dtypes, categoricals, and sparse columns.

  2. Reducing Pandas memory usage #2: lossy compression
    In this article you’ll learn techniques that lose some details in return for reducing memory usage.

  3. Reducing Pandas memory usage #3: Reading in chunks
    By loading and processing a file in chunks, you only need to hold part of it in memory at any given time (see the combined sketch after this list).

  4. Fast subsets of large datasets with Pandas and SQLite
    You have a large amount of data, and you want to load only part into memory as a Pandas dataframe. CSVs won’t cut it: you need a database, and the easiest way to do that is with SQLite.

  5. From chunking to parallelism: faster Pandas with Dask
    Processing your data in chunks lets you reduce memory usage, but it can also speed up your code. Because each chunk can be processed independently, you can process them in parallel, utilizing multiple CPUs. For Pandas (and NumPy), Dask is a great way to do this.
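
Here’s a rough sketch combining the lossless-compression and chunking articles above; the file and column names are made up, but the pandas parameters (usecols, dtype, chunksize) are the standard ones:

    import pandas as pd

    total = 0.0
    for chunk in pd.read_csv(
        "transactions.csv",                 # hypothetical input file
        usecols=["store", "amount"],        # drop columns you don't need
        dtype={"store": "category",         # categorical instead of strings
               "amount": "float32"},        # lower-range numeric dtype
        chunksize=100_000,                  # only one chunk in memory at a time
    ):
        total += chunk["amount"].sum()
    print(total)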

NumPy

  1. Reducing NumPy memory usage with lossless compression
    By changing how you represent your NumPy arrays, you can significantly reduce memory usage: by choosing smaller dtypes and using sparse arrays (see the sketch after this list). You’ll also learn about cases where this won’t help.

  2. Loading NumPy arrays from disk: mmap() vs. Zarr/HDF5
    If your NumPy array doesn’t fit in memory, you can load it transparently from disk using either mmap() or the very similar Zarr and HDF5 file formats. Here’s what they do, and why you’d choose one over the other.

  3. The mmap() copy-on-write trick: reducing memory usage of array copies
    Usually, copying an array and modifying it doubles the memory usage. But by utilizing the operating system’s mmap() call, you only pay the cost for the parts of the copy that you changed.
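
A small sketch combining items 1 and 3, smaller dtypes plus a copy-on-write memory map; the file name is just for the demo:

    import numpy as np

    arr64 = np.ones((1_000, 1_000))       # float64: 8 bytes per element
    arr32 = arr64.astype(np.float32)      # half the memory, if the precision suffices

    np.save("demo.npy", arr32)
    # mmap_mode="c" is copy-on-write: data is read lazily from disk, and only
    # the pages you actually modify get private in-memory copies.
    view = np.load("demo.npy", mmap_mode="c")
    view[:10] *= 2                        # only these touched pages cost extra RAM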

Tools

  1. Fil: a new Python memory profiler for data scientists and scientists
    Fil is a new memory profiler that shows you peak memory usage and where that memory was allocated. It’s designed specifically for the needs of data scientists and scientists running data processing pipelines (a small usage sketch follows this list).

  2. Debugging out-of-memory crashes in Python
    Debugging out-of-memory crashes can be tricky. Learn how the Fil memory profiler can help you pinpoint where the memory is being used.

  3. Debugging Python server memory leaks with the Fil profiler
    When your server is leaking memory, the Fil memory profiler can help you spot the buggy code.
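
For a sense of how you’d use Fil in practice, here is a tiny allocation-heavy script you might profile; as far as I recall, the profiler is invoked from the command line roughly as fil-profile run example.py, but check the Fil documentation for the exact invocation.

    # Save as example.py and run it under Fil to see the peak allocation
    # inside build_features() and the line it came from.
    import numpy as np

    def build_features():
        raw = np.random.rand(2_000, 2_000)   # temporary ~32 MB peak allocation
        return raw.mean(axis=0)

    if __name__ == "__main__":
        build_features()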



How do you process large datasets with limited memory?

Get a free cheatsheet summarizing how to process large amounts of data with limited memory using Python, NumPy, and Pandas.

Plus, every week or so you’ll get new articles showing you how to process large data and, more generally, improve your software engineering skills, from testing to packaging to performance:



Speed up your code

What does speed mean?

  1. Optimizing your code is not the same as parallelizing your code
    To make your Python code faster, you should usually start by optimizing the single-threaded version, then consider multiprocessing, and only then think about a cluster (see the toy example after this list).

  2. Speed is situational: two websites, two orders of magnitude
    How do you make your application fast? It depends very much on your particular case, as you’ll see in this case study.
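
As a toy illustration of the point in item 1 (the example is mine): an algorithmic improvement to the single-threaded code can easily beat the at-most-4× speedup you’d get from throwing four cores at the slow version.

    import time

    N = 10_000_000

    def slow_sum_of_squares():
        total = 0
        for i in range(N):
            total += i * i
        return total

    def fast_sum_of_squares():
        # Closed-form formula for 0^2 + 1^2 + ... + (N-1)^2: no loop at all.
        return (N - 1) * N * (2 * N - 1) // 6

    for fn in (slow_sum_of_squares, fast_sum_of_squares):
        start = time.perf_counter()
        result = fn()
        print(fn.__name__, result, f"{time.perf_counter() - start:.3f}s")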

Measuring performance

  1. Where’s your bottleneck? CPU time vs wallclock time
    A slow runtime isn’t necessarily the CPU’s fault: other causes include I/O, locks, and more. Learn a quick heuristic to help you identify which it is (sketched after this list).

  2. Beyond cProfile: Choosing the right tool for performance optimization
    There are different profilers you can use to measure Python performance. Learn about cProfile, sampling profilers, and logging, and when to use each.
    (Originally a PyGotham 2019 talk—you can also watch a video)

  3. Not just CPU: writing custom profilers for Python
    Sometimes existing profilers aren’t enough: you need to measure something unusual. With Python’s built-in profiling library, you can write your own.

  4. Logging for scientific computing: debugging, performance, trust
    Logging is a great way to understand your code, make it faster, and help you convince yourself and others that you can trust the results. Learn why, with examples built on the Eliot logging library.
    (Originally a PyCon 2019 talk—you can also watch a video)
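
A minimal version of the CPU-time-vs-wallclock heuristic from item 1 (the measure() helper is mine): if CPU time roughly equals wallclock time, the code is CPU-bound; if CPU time is much smaller, the time is going to I/O, sleeping, or waiting on locks.

    import time

    def measure(label, fn):
        wall_start, cpu_start = time.perf_counter(), time.process_time()
        fn()
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        print(f"{label}: wallclock={wall:.2f}s cpu={cpu:.2f}s")

    measure("cpu-bound", lambda: sum(i * i for i in range(5_000_000)))
    measure("waiting", lambda: time.sleep(1))   # wallclock ~1s, CPU ~0s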

Multiprocessing

  1. Why your multiprocessing Pool is stuck (it’s full of sharks!)
    Python’s multiprocessing library is an easy way to use multiple CPUs, but its default configuration can lead to deadlocks and brokenness. Learn why, and how to fix it (a minimal fix is sketched after this list).

  2. The Parallelism Blues: when faster code is slower
    By default NumPy uses multiple CPUs for certain operations. But sometimes that can actually slow down your code.
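
One way to avoid the deadlocks described in the first article is to use the “spawn” start method, which launches fresh worker processes instead of fork()ing the parent; a minimal sketch:

    from multiprocessing import get_context

    def square(x):
        return x * x

    if __name__ == "__main__":
        # "spawn" starts clean worker processes rather than fork()ing, so
        # workers don't inherit locks (e.g. logging locks) in a held state.
        with get_context("spawn").Pool(processes=4) as pool:
            print(pool.map(square, range(10)))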

Libraries

  1. Choosing a faster JSON library for Python
    There are multiple JSON encoding/decoding libraries available for Python. Learn how you can choose the fastest for your particular use case.
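
For example, a quick (and deliberately simplistic) benchmark of the stdlib json module against a third-party library such as orjson, on a document shaped like your own data:

    import json
    import timeit

    import orjson  # third-party: pip install orjson

    doc = {"ids": list(range(1_000)), "name": "example", "scores": [1.5] * 100}

    print("json:  ", timeit.timeit(lambda: json.dumps(doc), number=2_000))
    print("orjson:", timeit.timeit(lambda: orjson.dumps(doc), number=2_000))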

