Introduction to Dockerizing for Production

Improve your DevOps skills: learn an iterative process for Dockerizing your code.

Options for Python packaging: Wheels, Conda, Docker, and more

by Itamar Turner-Trauring
Last updated 04 Jan 2022, originally created 10 Aug 2020

You’ve written your Python application—a server, CLI tool, or batch process—and now you need to distribute it to the machines where it will be running. In order to run your application, you will need:

Your code.
Various Python libraries your code depends on, like Flask or NumPy.
The shared libraries in other languages (C, C++, etc.) that your code and its Python dependencies depend on.
The Python interpreter.
The shared libraries Python depends on, typically C libraries like libc and OpenSSL.

How exactly do you package up your application so all of these are available? There’s Docker, of course, but there are actually many more options, from wheels to system packages to Conda to PEX to self-contained executables, each with their own tradeoffs. And that’s just a partial list!

Given the large range of choices, there are too many to cover each in detail. Instead, this article will give a sense of the different categories, the pros and cons, and provide links to specific implementations within each category. For simplicity’s sake I will only cover running on Linux.

1. Python package installed with `pip`, e.g. Wheel

One option for distributing your application is to package it as something you can install with pip. That means a binary wheel, a tarball with the source code, or even a git repository: pip can install any of these.

Basically, the only thing you are packaging is your code. Python dependencies can be specified as dependencies in your packaging, and automatically installed by pip. You can include third party C libraries in wheels, but for sufficiently complex dependencies that won’t work.

Thus some additional C libraries or programs, and definitely the Python executable, have to be installed using some other mechanism, e.g. system packages (RPM/DEB).

This is therefore a good option in the following situations:

Your system libraries and Python version are standardized across applications, so you just preinstall all the necessary ones on your server farm.
Or, you are packaging a simple Python application that doesn’t need much beyond what’s available on PyPI or your private PyPI-equivalent server.

This is a bad, or at least insufficient, option if:

You have a heterogeneous collection of applications requiring different versions of Python.
Your application requires custom C/C++/etc. libraries that can’t be installed via pip, and that you can’t assume are pre-installed.

In particular, while multiple applications with conflicting requirements can be supported using virtualenvs, this starts getting trickier if you’re installing different versions of C libraries via system packages.

2. PEX and friends

PEX, Subpar, and Shiv, and zipapp are all ways to package up your Python code and its Python dependencies into a single, executable file. PEX seems to be the most popular and best supported one, so if you choose to use this category that’s what I would use. These tools don’t package up external C libraries, or the interpreter.

They’re useful if you want to distribute a single executable file with all of your Python code. Beyond that, they have the same issues as just installing with pip: you may additionally need to distribute dependencies as system packages (RPM/DEB).

Unlike pip, there’s no distribution mechanism: you will need to distribute the pex file somehow.

3. System package, e.g. RPM or DEB

Another option is to package your code using a RPM (for RHEL or similar Docker images) or DEB package (for Debian/Ubuntu). Python dependencies are typically included within the package—you might package a whole virtualenv, for example.

In theory, the system packages might include some Python packages, but it’s a bad idea to depend on them, since you’re tying your dependency upgrade cycle to the distribution release cycle. Much better to to include all Python dependencies in your RPM/DEB.

C libraries can again be either dependencies on other system packages or included in the package itself. The interpreter will typically be installed by depending on—you guessed it—another system package.

Basically you end up installing one RPM/DEB, which then depends on a whole bunch of other RPM/DEBs.

This is a good option if you’re running a virtual machine and want to run a single application.

This is a bad option if:

The specific packages you need are the wrong version on the particular OS version you’re using.
You want to run multiple applications with multiple dependencies on the same machine or virtual machine. This is possible in theory, but can end up being difficult due to conflicts or unavailable packages.

4. Conda packaging

With pip-based packages, system packages and Python packages are two distinct package managers. The Conda package system combines the two into a single package system: your dependencies are all provided as Conda packages too. That includes Python dependencies, C shared libraries, and even the Python interpreter into Conda packages.

The only thing it uses from the base operating system is the standard C library.

If you’re using the Conda-Forge package channel, you also have access to a huge number of existing open source packages. From personal experience, adding packages to Conda-Forge is a surprisingly pleasant experience.

Often your actual application isn’t a Conda package, it’s just your code plus a list of Conda packages to install. So you still need to distribute your application somehow.

Conda is quite good at supporting multiple applications with different dependencies (including Python version, and C library version) on the same machine or virtual machine.

5. Self-contained executable

Tools like PyInstaller and PyOxidizer let you create a self-contained executable that includes the Python interpreter, your code, and your Python dependencies. External shared C libraries are typically not included.

This is a good option if you don’t use you any special C shared libraries beyond the minimum you’d expect on any Linux distribution.

You will however have to distribute the executable somehow.

6. Container image (Docker, Singularity)

Container systems like Docker and Singularity let you distribute a complete, isolated filesystem, everything you need to run the application. As such, everything is included, from C libraries to Python interpreter to your code to your dependencies.

This is a great solution if you need to run lots of different combinations and variants, and you need some level of isolation.

The downside is that you have multiple layers of packaging: you are likely going to have to do both Docker packaging and rely on one of the previously covered mechanisms.

Overall comparison

Here’s a comparison of all the above options, in terms of what they include. I means that particular requirement is included in the package itself, D means it can be specified as a dependency. So for example a wheel includes your code, and you can specify it depends on matplotlib or flask.

Requirement	Wheel	PEX	RPM/Deb	Conda	Executable	Docker
Your code	I	I	I	I/❌	I	I
Python deps	D	I	I/D	I/D	I	I
C libraries	I/❌	❌	I/D	I/D	❌	I
Python exe	❌	❌	D	D	I	I

We can also compare the installation mechanism for your code, i.e. whether there’s some infrastructure for downloading it automatically:

Packaging type	Installation mechanism
Wheel	`pip`/`poetry`/`pipenv`
PEX	None
RPM/DEB	`dnf` or `apt`
Conda	None if you use `environment.yml`, `conda` for dependencies
Executable	None
Docker	`docker image pull`

And finally, support for multiple applications on the same machine:

Packaging type	Multiple applications support mechanism
Wheel	virtualenv
PEX	Run different PEX in parallel
RPM/DEB	Install multiple packages
Conda	Conda environments
Executable	Run different exes in parallel
Docker	Isolated containers

Which one should you use? That depends on your particular situation.

The concise and action-oriented guide to Docker packaging for production

Docker packaging for production is complicated, with as many as 70+ best practices to get right. And you want small images, fast builds, and your Python application running securely.

Take the fast path to learning best practices, by using the Python on Docker Production Handbook.

Free ebook: "Introduction to Dockerizing for Production"

Learn a step-by-step iterative DevOps packaging process in this free mini-ebook. You'll learn what to prioritize, the decisions you need to make, and the ongoing organizational processes you need to start.

Plus, you'll join over 8000 people getting weekly emails covering practical tools and techniques, from Docker packaging to Python best practices.