Faster pip installs: caching, bytecode compilation, and uv
Installing your Python application’s dependencies can be surprisingly slow. Whether you’re running tests in CI, building a Docker image, or installing an application, downloading and installing dependencies can take a while.
So how do you speed up installation with `pip`?
In this article I’ll cover:
- Avoiding the slow path of installing from source.
- The package cache.
- Bytecode compilation and how it interacts with installation and startup speed.
- Using `uv`, a faster replacement for `pip`, and why it’s not always as fast as it might initially seem.
Avoiding installs from source
When you install a Python package, there are typically two ways it can be installed:
- The packaged-up source code, often a `.tar.gz` with a `pyproject.toml` or (for old packages) a `setup.py`. In this case, installing will often require running Python code (a little slow), and sometimes compiling large amounts of C/C++/Rust code (potentially extremely slow).
- A wheel (a `.whl` file) that can just be unpacked straight onto the filesystem, with no need to run code or compile native extensions.
If at all possible, you want to install wheels, because installing from source will be slower. If you need to compile significant amounts of C code, installing from source will be much slower; instead of relying on precompiled binaries, you’ll need to compile it all yourself.
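Whether a particular wheel can be used depends on your interpreter and platform. As a quick check, `pip` can print the full list of wheel compatibility tags it will accept in your environment:

```shell
# Print the wheel compatibility tags (Python version, ABI, platform)
# that pip will accept in this environment; a wheel whose filename
# matches one of these tags can be installed directly.
python -m pip debug --verbose
```

If a package on PyPI has no wheel matching any of these tags, `pip` will fall back to building from source.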
To ensure you’re installing wheels in as many cases as possible:
- Make sure you’re using the latest version of `pip` before installing dependencies. Binary wheels sometimes require newer versions of `pip` than the one packaged by default with your current Python. Or better yet, as we’ll discuss below, use `uv`.
- Don’t use Alpine Linux; stick to Linux distributions that use `glibc`, e.g. Debian/Ubuntu/RedHat. Standard Linux wheels require `glibc`, but Alpine uses the `musl` C library. Wheels for `musl`-based distributions like Alpine are available for many projects, but they’re less common.
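If you’d rather have installs fail loudly than silently fall back to a slow source build, `pip`’s `--only-binary` option enforces wheels-only installation (`requirements.txt` below is a stand-in for your own requirements file):

```shell
# Upgrade pip first, so it recognizes the newest wheel tags
python -m pip install --upgrade pip

# Refuse to install anything that isn't available as a wheel
pip install --only-binary=:all: -r requirements.txt
```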
Note: If you maintain open source or otherwise installable Python packages: since wheels install faster, make sure to provide wheels for your package, even if it’s pure Python.
Keeping the package cache warm
Once you’ve dealt with that, the next question is whether you can avoid downloading the packages. Installing Python packages involves two steps:
- Downloading the package, if necessary.
- Installing the already downloaded package.
To speed up the first step, when a package manager like `pip` downloads a package, it will typically store a copy locally in a cache: some tool-specific directory on the filesystem.
That means the next time you install that package, the package manager can check if it already has a cached copy, and if so just install that.
This saves download time.
If the package is already in the cache, we say the cache is “warm”. Here’s how this impacts performance, measuring both wallclock and CPU time:
| Tool | Cache | Wallclock time | CPU time |
|---|---|---|---|
| `pip install` | Cold | 8.5s | 6.1s |
| `pip install` | Warm | 6.3s | 5.6s |
This difference in speed is tied to the latency and bandwidth of my Internet connection, so it could be better or worse in other locations. In general, however, the cold-cache version will always be slower.
Benchmarking methodology: I made sure to create the virtualenvs in advance, and I used hashes in the `requirements.txt` since that really should be the default for security reasons. I used the transitive dependencies for installing `pandas` and `matplotlib`, resulting in the installation of 14 different packages in total. I used Python 3.13, on a CPU with ~20 cores, `pip` version 24.3.1, and `uv` version 0.5.22.
One problem with the package cache is that in most CI services, your cache will start out empty, since you’re starting with a new blank virtual machine or container. To work around that, most CI systems will have some way to store a cache directory at the end of the run, and then load it at the beginning of the next run. If you’re using GitHub Actions, you can use the built-in caching support in the action used to setup Python (you’re going to be caching the cache!).
Of course, storing and loading the cache also takes time, so if you have many or large dependencies try it both ways and see which is faster.
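For example, with GitHub Actions the `actions/setup-python` action can save and restore `pip`’s download cache for you; a minimal sketch (pin versions as appropriate for your workflow):

```yaml
- uses: actions/setup-python@v5
  with:
    python-version: "3.13"
    cache: "pip"  # save/restore pip's download cache between runs
- run: pip install -r requirements.txt
```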
Reallocating slowness by disabling bytecode compilation
Another task that can slow down package installation is bytecode compilation.
After packages are unpacked onto the filesystem, package managers sometimes do one final step: they compile the `.py` source files into `.pyc` bytecode files, and store them in `__pycache__` directories.
This is not the same as compiling a C extension; it’s just an optimization to make loading Python code faster at startup.
Instead of having to compile a module to a `.pyc` at import time, the `.pyc` is already there.
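You can see this mechanism directly using the standard library’s `compileall` module, which is essentially what package managers invoke after unpacking (the `demo_pkg` directory here is made up for illustration):

```shell
# Create a tiny package with one module
mkdir -p demo_pkg
echo "x = 1" > demo_pkg/mod.py

# Pre-compile .py files to .pyc bytecode, as pip does at install time
python -m compileall -q demo_pkg

# The bytecode is stored next to the source, in __pycache__/
ls demo_pkg/__pycache__/
```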
It turns out that bytecode compilation, which is on by default, takes a significant fraction of the time spent by `pip install`. We can see the performance impact of this step by using `pip install --no-compile`.
Here’s a comparison of how long it takes to install packages both with and without `.pyc` compilation, with a warm cache:
| Installation method | Cache | Wallclock time | CPU time |
|---|---|---|---|
| `pip install` | Warm | 6.3s | 5.6s |
| `pip install --no-compile` | Warm | 2.5s | 1.8s |
Importantly, just because disabling bytecode compilation speeds up installation doesn’t mean you’ve saved time overall.
Any module you import will still need to be compiled into a `.pyc`; it’s just that the work will happen when your program runs, instead of at package installation time.
So if you’re importing all or most modules, overall you might not save any time at all; you’ve just moved the work to a different place.
In other cases, however, disabling bytecode compilation will save you time. For example, in your testing setup you might be installing many third-party packages for integration testing, but only using a small amount of those libraries’ code. As such, there’s no point in compiling lots of modules you won’t be using.
Switching to `uv`, a faster reimplementation of `pip`
`uv` is a mostly compatible re-implementation of `pip` and other related tools.
Out of the box, `uv` is much faster, because it:
- Is written in Rust, a faster language than Python.
- Downloads packages in parallel.
- Takes advantage of multiple CPUs.
- Disables bytecode compilation by default, making it opt-in as opposed to `pip`’s opt-out.
In their default configurations, `uv pip install` is much faster than `pip install`:
| Tool | Cache | Wallclock time | CPU time |
|---|---|---|---|
| `pip install` | Cold | 8.5s | 6.1s |
| `pip install` | Warm | 6.3s | 5.6s |
| `uv pip install` | Cold | 1.7s | 1.0s |
| `uv pip install` | Warm | 0.0s | 0.1s |
However, this is somewhat misleading.
With matching settings, `uv`’s performance lead declines
By default, as mentioned above, `pip` will do bytecode compilation and `uv` will disable it.
The above table therefore isn’t a fair comparison.
What happens if both tools have bytecode compilation enabled, by running `uv pip install --compile-bytecode`?
| Tool | Cache | Wallclock time | CPU time |
|---|---|---|---|
| `pip install` | Cold | 8.5s | 6.1s |
| `pip install` | Warm | 6.3s | 5.6s |
| `uv pip install --compile-bytecode` | Cold | 2.4s | 11.2s |
| `uv pip install --compile-bytecode` | Warm | 0.5s | 9.8s |
Wallclock time is still much faster, though less so, but the measured CPU time suggests `uv` is actually slower than `pip` when bytecode compilation is enabled.
This combination of faster wallclock time but higher CPU time is possible because `uv` uses multiple threads, taking advantage of multiple cores, and my CPU has 20 of them.
In CI the number of cores is likely much smaller: default x86-64 GitHub Actions Linux runners have four “cores”. It wouldn’t surprise me if these were vCPUs, effectively just two physical CPU cores. In any case, 4 cores is rather less than the 20 on the computer I used to test this.
The slower CPU time is not as bad as it looks, however.
In order to compile bytecode, `uv` launches Python worker processes, which have a fixed startup overhead.
My guess was that with fewer cores, `uv` would use fewer threads by default and therefore launch fewer worker processes.
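On Linux, this kind of single-core restriction can be reproduced with `taskset` (a sketch of the approach, not necessarily the exact setup I used; `requirements.txt` is a placeholder):

```shell
# Pin the whole install to CPU core 0, simulating a single-core runner;
# taskset is part of util-linux
taskset -c 0 uv pip install --compile-bytecode -r requirements.txt
```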
And in fact when using just a single CPU core:
| Tool run with single CPU core | Cache | Wallclock time | CPU time |
|---|---|---|---|
| `pip install` | Cold | 9.3s | 6.2s |
| `pip install` | Warm | 6.2s | 5.6s |
| `uv pip install --compile-bytecode` | Cold | 6.0s | 4.9s |
| `uv pip install --compile-bytecode` | Warm | 4.1s | 4.1s |
In this scenario, `pip`’s performance is unchanged other than some noise due to download speed; it’s single-threaded, after all.
Meanwhile `uv` is still faster than `pip`… but a lot less so.
Make your package installation faster
Some takeaways:
- Test preserving your package download cache in CI to reduce the need for downloads; it might or might not help.
- `uv` is faster than `pip`, though how much faster depends on configuration.
- Decide whether or not bytecode compilation makes sense in your case.
- Once you’ve switched to `uv`, you’ll likely benefit from more CPU cores. That being said, when you have a warm cache and no compilation, `uv` is so fast that it doesn’t really matter how many cores you have!