# Making pip installs a little less slow
Installing your Python application’s dependencies can be surprisingly slow. Whether you’re running tests in CI, building a Docker image, or installing an application, downloading and installing dependencies can take a while.
So how do you speed up installation with `pip`?
In this article I’ll cover:
- Avoiding the slow path of installing from source.
- `pip` download speed, and the alternatives: Pipenv and Poetry.
- A useful `pip` option that can, sometimes, speed up installation significantly.
## Avoiding installs from source
When you install a Python package, it typically comes in one of two forms:

- The packaged-up source, often a `.tar.gz` with a `setup.py`. In this case, installing will often require running Python code (a little slow), and sometimes compiling large amounts of C/C++/Rust code (potentially extremely slow).
- A wheel (a `.whl` file) that can just be unpacked straight onto the filesystem, with no need to run code or compile native extensions.
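If you’re not sure which form you’ll get for a given package, one quick way to check (a sketch; `pandas` and the destination directory are just examples) is to ask `pip` to download the package without installing it and look at the resulting file extension:

```bash
# Download a package (without installing it) to see which artifact pip picks.
# A .whl file means a prebuilt wheel; a .tar.gz means a build from source.
pip download pandas --no-deps -d /tmp/pip-check
ls /tmp/pip-check
```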
If at all possible, you want to install wheels, because installing from source will be slower. If you need to compile significant amounts of C code, installing from source will be much slower; instead of relying on precompiled binaries, you’ll need to compile it all yourself.
To ensure you’re installing wheels as much as possible:
- Make sure you’re using the latest version of `pip` before installing dependencies. Binary wheels sometimes require newer versions of `pip` than the one packaged by default with your current Python.
- Don’t use Alpine Linux; stick to Linux distributions that use `glibc`, e.g. Debian, Ubuntu, or RedHat. Standard Linux wheels require `glibc`, but Alpine uses the `musl` C library. Wheels for `musl`-based distributions like Alpine are starting to become available, but they’re still not as common.
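As a concrete sketch (the requirements file name is illustrative), you can upgrade `pip` first and then refuse source builds outright, so a missing wheel fails loudly instead of silently compiling:

```bash
# Get the latest pip, which knows about newer wheel formats and tags:
python -m pip install --upgrade pip

# Fail the install instead of falling back to building from source
# if no wheel is available:
pip install --only-binary :all: -r requirements.txt
```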
## Comparing installation speed between `pip`, Pipenv, and Poetry
The default Python package manager is `pip`, but you can also use Pipenv and Poetry, both of which add additional functionality like virtualenv management.
I compared the speed of all three.
### Methodology
Installing Python packages involves two steps:
- Downloading the package.
- Installing the already downloaded package.
By default, Python package managers will cache downloaded packages on disk, so if you install them a second time in a different virtualenv the package won’t need to be re-downloaded. I therefore measured both variants: a cold cache where the package had to be downloaded, and a warm cache where the package was already available locally.
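If you want to reproduce the cold-cache case yourself, recent versions of `pip` (20.1 and later) let you inspect and empty the download cache:

```bash
# Show where pip keeps its download cache:
pip cache dir

# Empty it, so the next install starts from a cold cache:
pip cache purge
```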
In all cases I made sure to create the virtualenvs in advance, and for `pip` I made sure to use hashes in the `requirements.txt`, to match the hash validation that the other two package managers do by default.
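One way to produce such a hashed `requirements.txt` is with the third-party pip-tools project; a minimal sketch, assuming your top-level dependencies live in a `requirements.in` file:

```bash
# Generate a fully pinned requirements.txt with --hash entries:
pip install pip-tools
pip-compile --generate-hashes requirements.in -o requirements.txt

# pip then verifies every downloaded file against its recorded hash:
pip install -r requirements.txt
```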
I used the transitive dependencies for installing `pandas` and `matplotlib`, resulting in the installation of 12 different packages in total.
### Results
Here’s how long each installation took, measuring both wallclock and CPU time:
| Tool | Cache | Wallclock time | CPU time |
|---|---|---|---|
| pip 22.1.1 | Cold | 16.2s | 10.7s |
| pip 22.1.1 | Warm | 10.5s | 9.4s |
| Pipenv 2022.5.2 | Cold | 12.5s | 26.0s |
| Pipenv 2022.5.2 | Warm | 9.7s | 25.2s |
| Poetry 1.1.13 | Cold | 12.1s | 18.0s |
| Poetry 1.1.13 | Warm | 10.2s | 17.8s |
Some things to notice:

- `pip` is the slowest by wallclock time when the cache is cold.
- Wallclock time isn’t really that different between any of them when the cache is warm, i.e. when the packages are already downloaded.
- Both Pipenv and Poetry use parallelism, as we can see from CPU time that is higher than wallclock time; `pip` is currently single-threaded.
- Pipenv uses quite a lot of CPU compared to the other two; Poetry is a bit better, but still higher than `pip`.
This example was run with 12 packages being installed; with a larger number of dependencies, it’s possible that Poetry’s parallel installation would have more of an impact.
## Keeping the cache warm
Notice that in all cases you get a speedup from having a warm cache, i.e. reusing already downloaded packages. On your local machine, that happens automatically. In most CI services, your cache will start out empty.
To work around that, most CI systems will have some way to store a cache directory at the end of the run, and then load it at the beginning of the next run. If you’re using GitHub Actions, you can use the built-in caching support in the action used to setup Python.
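For example, with GitHub Actions the setup step might look like this (the action version and Python release are illustrative):

```yaml
- uses: actions/setup-python@v4
  with:
    python-version: "3.11"
    cache: "pip"  # persist pip's download cache between runs
```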
This is still not as fast as running on a dedicated machine, however: storing and loading the cache also takes time.
## Going (very slightly) faster by disabling the version check
On startup, `pip` may check whether you’re running the latest version, and print a warning if you’re not.
You can disable this check like so:
```bash
pip --disable-pip-version-check install ...
```
This saves me about 0.2-0.3s, not a very significant improvement; the actual improvement probably depends on your network speed and other factors.
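If you’d rather not add the flag to every invocation, `pip` also reads its options from environment variables, which is convenient in CI scripts or Dockerfiles:

```bash
# Equivalent to passing --disable-pip-version-check on every call:
export PIP_DISABLE_PIP_VERSION_CHECK=1
pip install -r requirements.txt
```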
## Going faster (sometimes) by disabling bytecode compilation
Can we do better? In some cases, yes.
After packages are downloaded (if they’re not cached locally) and installed onto the filesystem, package managers do one final step: they compile the `.py` source files into `.pyc` bytecode files, and store them in `__pycache__` directories.

This is not the same as compiling a C extension; it’s just an optimization that makes loading Python code faster at startup. Instead of having to compile to `.pyc` at import time, the `.pyc` is already there.
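You can trigger the same compilation step manually with Python’s built-in `compileall` module, which is essentially what package managers do at install time (the path below is illustrative):

```bash
# Compile all .py files under a directory tree into __pycache__/*.pyc:
python -m compileall /path/to/venv/lib/python3.11/site-packages
```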
It turns out that bytecode compilation takes a significant amount of the time spent by `pip install`. But you can disable this step with `pip`’s `--no-compile` option.
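For example (the requirements file name is illustrative):

```bash
# Install packages but skip the .py -> .pyc compilation step:
pip install --no-compile -r requirements.txt
```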
Here’s a comparison of how long it takes to install packages both with and without `.pyc` compilation, in both cases with a warm cache so no downloads are needed:
| Installation method | Cache | Wallclock time | CPU time |
|---|---|---|---|
| pip install | Warm | 10.5s | 9.4s |
| pip install --no-compile | Warm | 4.8s | 4.0s |
So should you always use this option?
Not necessarily.
Just because `pip install` is faster doesn’t mean you’ve saved time overall. Any module you import will still need to be compiled into a `.pyc`; it’s just that the work will happen at Python run time instead of at package installation time. So if you’re importing all or most of the installed modules, you might not save any time at all overall; you’ve just moved the work to a different place.
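If you want to see where that moved work shows up, Python’s built-in import profiler gives a rough per-module breakdown of import cost, including any bytecode compilation on first import (`pandas` is just an example here):

```bash
# Print a cumulative timing breakdown of everything the import pulls in:
python -X importtime -c "import pandas"
```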
In other cases, however, `--no-compile` will save you time. For example, in your testing setup you might be installing many third-party packages for integration testing, but only using a small fraction of those libraries’ code. In that case, there’s no point in compiling lots of modules you won’t be using.
Neither Pipenv nor Poetry seems to support this option at the time of writing.
## Package installation could be much faster
Given how many people use Python, slow package installations add up.
It’s difficult to estimate how many `pip install`s happen in the world, but `pip` itself was downloaded 100 million times in the month before this article was written, so we can take that as a lower bound. If you could shave just 1 second off every one of those 100 million installs, that would save 100,000,000 seconds each month, or about 3.17 years of cumulative waiting.
There is clearly a lot of room for improvement in package installation in the Python world:
- Poetry already implements parallelism to some extent, but it doesn’t seem to be as efficient as one might hope, given its higher CPU usage than `pip`. It may already be faster on a wallclock basis for larger numbers of dependencies, though.
- Pipenv’s CPU usage is even worse.

As for `pip`:

- In a world where multiple CPUs are the default and single-core speed increases have stalled, pretty much every CPU-bound task `pip` does could benefit from parallelism.
- Parallel downloads and version verification would also be helpful; for small package sizes, network latency is the likely bottleneck, something parallelism can help with.
If you’re interested in helping, the `pip` repository has a number of issues and in-progress PRs covering various aspects.
Finally, if you maintain open source Python packages: since wheels install faster, make sure to provide wheels for your package, even if it’s pure Python.
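A minimal sketch of building a wheel, using the PyPA `build` tool (assuming your project already has standard packaging metadata):

```bash
# Build both an sdist and a wheel for the project in the current directory;
# the results land in dist/:
pip install build
python -m build
```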