Options for Python packaging: Wheels, Conda, Docker, and more
You’ve written your Python application—a server, CLI tool, or batch process—and now you need to distribute it to the machines where it will be running. In order to run your application, you will need:
- Your code.
- Various Python libraries your code depends on, like Flask or NumPy.
- The shared libraries in other languages (C, C++, etc.) that your code and its Python dependencies depend on.
- The Python interpreter.
- The shared libraries Python depends on, typically C libraries like libc and OpenSSL.
How exactly do you package up your application so all of these are available? There’s Docker, of course, but there are actually many more options, from wheels to system packages to Conda to PEX to self-contained executables, each with their own tradeoffs. And that’s just a partial list!
Given the large range of choices, I can’t cover each one in detail. Instead, this article will give a sense of the different categories, their pros and cons, and provide links to specific implementations within each category. For simplicity’s sake I will only cover running on Linux.
1. Python package installed with pip, e.g. Wheel
One option for distributing your application is to package it as something you can install with pip. That means a binary wheel, a tarball with the source code, or even a git repository: pip can install any of these.
Basically, the only thing you are packaging is your code. Python dependencies can be specified as dependencies in your packaging, and automatically installed by pip. You can include third-party C libraries in wheels, but for sufficiently complex dependencies that won’t work. Thus some additional C libraries or programs, and definitely the Python executable, have to be installed using some other mechanism, e.g. system packages (RPM/DEB).
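To give a rough idea of what this looks like, here is a minimal sketch of a pyproject.toml for a hypothetical application called myapp; the name, version, dependencies, and entry point are all placeholders:

```toml
# Minimal packaging metadata for a hypothetical "myapp" (setuptools backend).
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "myapp"
version = "1.0.0"
# Python dependencies pip will install automatically alongside your code:
dependencies = ["flask", "numpy"]

[project.scripts]
# Assumes your code defines a main() function in myapp/cli.py
myapp = "myapp.cli:main"
```

Running python -m build --wheel (after pip install build) then produces a wheel you can pip-install on the target machines, or upload to PyPI or a private package index.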
This is therefore a good option in the following situations:
- Your system libraries and Python version are standardized across applications, so you just preinstall all the necessary ones on your server farm.
- Or, you are packaging a simple Python application that doesn’t need much beyond what’s available on PyPI or your private PyPI-equivalent server.
This is a bad, or at least insufficient, option if:
- You have a heterogeneous collection of applications requiring different versions of Python.
- Your application requires custom C/C++/etc. libraries that can’t be installed via pip, and that you can’t assume are pre-installed.
In particular, while multiple applications with conflicting requirements can be supported using virtualenvs, this starts getting trickier if you’re installing different versions of C libraries via system packages.
2. PEX and friends
PEX, Subpar, Shiv, and zipapp are all ways to package up your Python code and its Python dependencies into a single, executable file. PEX seems to be the most popular and best supported of these, so if you go with this category it’s the one I would use. These tools don’t package up external C libraries or the Python interpreter.
They’re useful if you want to distribute a single executable file with all of your Python code.
Beyond that, they have the same issues as just installing with pip: you may additionally need to distribute dependencies as system packages (RPM/DEB). Unlike pip, there’s no distribution mechanism: you will need to distribute the pex file somehow.
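As a sketch of what building one looks like, assuming your project already has pip-installable packaging metadata and defines a myapp console script (both placeholders here):

```console
$ pip install pex
# Bundle the current project and its Python dependencies into one executable file.
$ pex . --console-script myapp --output-file myapp.pex
# The result is a single file you can copy to another machine and run,
# as long as a compatible Python interpreter is already installed there.
$ ./myapp.pex
```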
3. System package, e.g. RPM or DEB
Another option is to package your code as an RPM (for RHEL and similar systems, including Docker images based on them) or a DEB package (for Debian/Ubuntu). Python dependencies are typically included within the package—you might package a whole virtualenv, for example.
In theory, system packages might provide some of your Python dependencies, but it’s a bad idea to rely on them, since you’re tying your dependency upgrade cycle to the distribution release cycle. Much better to include all Python dependencies in your RPM/DEB.
C libraries can again be either dependencies on other system packages or included in the package itself. The interpreter will typically be installed by depending on—you guessed it—another system package.
Basically you end up installing one RPM/DEB, which then depends on a whole bunch of other RPM/DEBs.
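One common pattern, sketched below, is to install the application into a virtualenv under /opt and then wrap that directory into a package with a tool like fpm; the paths, name, and version are placeholders, and fpm is only one of several ways to build RPMs or DEBs:

```console
# Build a self-contained virtualenv with your code and its Python dependencies.
$ python3 -m venv /opt/myapp
$ /opt/myapp/bin/pip install .
# Wrap the directory in a .deb that depends on the distribution's python3 package.
$ fpm -s dir -t deb --name myapp --version 1.0.0 --depends python3 /opt/myapp
```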
This is a good option if you’re running a virtual machine and want to run a single application.
This is a bad option if:
- The specific packages you need are the wrong version on the particular OS version you’re using.
- You want to run multiple applications with multiple dependencies on the same machine or virtual machine. This is possible in theory, but can end up being difficult due to conflicts or unavailable packages.
4. Conda packaging
With pip-based packaging, system packages and Python packages come from two distinct package managers. The Conda packaging system combines the two: your dependencies are all provided as Conda packages too, including Python libraries, C shared libraries, and even the Python interpreter itself.
The only thing it uses from the base operating system is the standard C library.
If you’re using the Conda-Forge package channel, you also have access to a huge number of existing open source packages. From personal experience, adding packages to Conda-Forge is a surprisingly pleasant experience.
Often your actual application isn’t a Conda package; it’s just your code plus a list of Conda packages to install. So you still need to distribute your code itself somehow.
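That list of packages typically lives in an environment.yml file; here is a sketch for a hypothetical application, with the environment name, channel, and packages purely illustrative:

```yaml
# environment.yml: declares the Conda packages the application needs,
# including the Python interpreter itself.
name: myapp
channels:
  - conda-forge
dependencies:
  - python=3.9
  - flask
  - numpy
```

Running conda env create -f environment.yml builds the environment; your own code still has to be copied onto the machine and run inside that environment.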
Conda is quite good at supporting multiple applications with different dependencies (including Python version, and C library version) on the same machine or virtual machine.
5. Self-contained executable
Tools like PyInstaller and PyOxidizer let you create a self-contained executable that includes the Python interpreter, your code, and your Python dependencies. External shared C libraries are typically not included.
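For example, with PyInstaller the build might look roughly like this (myapp.py is a placeholder for your entry-point script):

```console
$ pip install pyinstaller
# --onefile bundles the interpreter, your code, and its Python dependencies
# into a single executable, written to the dist/ directory.
$ pyinstaller --onefile myapp.py
$ ./dist/myapp
```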
This is a good option if you don’t use any special C shared libraries beyond the minimum you’d expect on any Linux distribution.
You will however have to distribute the executable somehow.
6. Container image (Docker, Singularity)
Container systems like Docker and Singularity let you distribute a complete, isolated filesystem containing everything you need to run the application. As such, everything is included, from C libraries to the Python interpreter to your code and its dependencies.
This is a great solution if you need to run lots of different combinations and variants, and you need some level of isolation.
The downside is that you have multiple layers of packaging: you will likely need to combine Docker packaging with one of the previously covered mechanisms, for example pip-installing your code inside the image (as sketched below).
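As a sketch, a Dockerfile for a pip-installable application might look roughly like this; it assumes your packaging defines a myapp console script, and the base image and names are placeholders:

```dockerfile
# The base image supplies the OS libraries and the Python interpreter;
# pip (one of the earlier mechanisms) installs your code and its dependencies.
FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install .
CMD ["myapp"]
```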
Overall comparison
Here’s a comparison of all the above options, in terms of what they include.
I means that particular requirement is included in the package itself, D means it can be specified as a dependency to be installed separately, and ❌ means the packaging doesn’t handle it at all.
So for example a wheel includes your code, and you can specify that it depends on matplotlib or flask.
Requirement | Wheel | PEX | RPM/DEB | Conda | Executable | Docker |
---|---|---|---|---|---|---|
Your code | I | I | I | I/❌ | I | I |
Python deps | D | I | I/D | I/D | I | I |
C libraries | I/❌ | ❌ | I/D | I/D | ❌ | I |
Python exe | ❌ | ❌ | D | D | I | I |
We can also compare the installation mechanism for your code, i.e. whether there’s some infrastructure for downloading it automatically:
Packaging type | Installation mechanism |
---|---|
Wheel | pip / poetry / pipenv |
PEX | None |
RPM/DEB | dnf or apt |
Conda | None for your code; conda installs the dependencies listed in environment.yml |
Executable | None |
Docker | docker image pull |
And finally, support for multiple applications on the same machine:
Packaging type | Multiple applications support mechanism |
---|---|
Wheel | virtualenv |
PEX | Run different PEX files in parallel |
RPM/DEB | Install multiple packages |
Conda | Conda environments |
Executable | Run different exes in parallel |
Docker | Isolated containers |
Which one should you use? That depends on your particular situation.