The best Docker base image for your Python application (July 2019)
When you’re building a Docker image for your Python application, you’re building on top of an existing image—and there are many possible choices.
There are OS images like Ubuntu and CentOS, and there are the many different variants of the python base image.
Which one should you use? Which one is better? It may not be obvious which is the best for your situation.
So to help you make a choice that fits your needs, in this article I’ll go through some of the relevant criteria, and suggest some reasonable defaults that will work for most people.
What do you want from a base image?
There are a number of common criteria for choosing a base image, though your particular situation might emphasize, add, or remove some of these:
- Stability: You want a build today to give you the same basic set of libraries, directory structure, and infrastructure as a build tomorrow; otherwise your application will randomly break.
- Security updates: You want the base image to be well-maintained, so that you get security updates for the base operating system in a timely manner.
- Up-to-date dependencies: Unless you’re building a very simple application, you will likely depend on operating system-installed libraries and applications (e.g. a compiler). You’d like them not to be too old.
- Extensive dependencies: Some applications require less popular dependencies, and a base image with access to a large number of libraries makes this easier.
- Up-to-date Python: While this can be worked around by installing Python yourself, having an up-to-date Python available saves you some effort.
- Small images: All things being equal, it’s better to have a smaller Docker image than a bigger Docker image.
The need for stability suggests not using operating systems with a limited support lifetime, like Fedora or non-LTS Ubuntu releases.
Why you shouldn’t use Alpine Linux
A common suggestion for people who want small images is to use Alpine Linux, but using it has some costs. For one thing, Alpine has far fewer libraries than the other Linux distributions I mention above, so you might find that a library you need is missing.
There’s also a major difference between Alpine and other Linux distributions: Alpine uses a different C library, musl, instead of the more common glibc.
As a result, binary wheels won’t work on Alpine Linux, so many packages that Just Work on other Linux distributions will need to be compiled from scratch. This can mean long build times the first time you install a package.
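If you're not sure which C library a given image uses, you can ask Python itself. Here's a minimal sketch using the standard library's platform.libc_ver(); the libc_summary name is my own, and the "possibly musl" fallback is an assumption, since an empty result only tells you that glibc wasn't detected:

```python
import platform


def libc_summary():
    """Return a short description of the C library this Python is linked against."""
    # platform.libc_ver() inspects the interpreter binary. It recognises glibc
    # and returns empty strings when it cannot identify the library.
    name, version = platform.libc_ver()
    if name:
        return f"{name} {version}".strip()
    # Assumption: an unidentified libc on Linux may well be musl (e.g. Alpine),
    # but this check alone cannot confirm that.
    return "unknown (possibly musl, as on Alpine)"


if __name__ == "__main__":
    print(libc_summary())
```

On a glibc-based image this should report glibc and its version, which is a reasonable hint that binary wheels built for mainstream Linux distributions will install without compiling.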
In addition, while in theory musl and glibc are mostly compatible, in practice the differences can cause problems, and when problems do occur they are going to be strange and unexpected.
- Alpine has a smaller default stack size for threads, which can lead to Python crashes.
- One Alpine user discovered that their Python application was much slower because of the way musl allocates memory vs. glibc.
- I once couldn’t do DNS lookups in Alpine images running on minikube (Kubernetes in a VM) when using the WeWork coworking space’s WiFi. The cause was a combination of a bad DNS setup by WeWork, the way Kubernetes and minikube do DNS, and musl’s handling of this edge case vs. what glibc does. musl wasn’t wrong (it matched the RFC), but I had to waste time figuring out the problem and then switching to a glibc-based image.
- Another user discovered issues with time formatting and parsing.
Most of these problems have already been fixed, but no doubt there are more problems to discover. Random breakage of this sort is just one more thing to worry about—and the corresponding benefit of a slightly smaller image isn’t worth your time.
Because of both the wheels issue and the potential for incompatibility, I recommend against using Alpine.
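To make the thread stack size issue mentioned above more concrete: Python lets you request a specific stack size for new threads via threading.stack_size(). The following is only a sketch of that kind of workaround, not a recommendation to stay on Alpine. The helper names are hypothetical, and the 8 MiB value is an assumption chosen to resemble a typical default on glibc-based systems:

```python
import threading


def run_with_stack_size(target, stack_bytes=8 * 1024 * 1024):
    """Run target() in a new thread with an explicit stack size; return its result."""
    result = {}

    def wrapper():
        result["value"] = target()

    # stack_size() affects threads created afterwards and returns the old setting.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=wrapper)
        worker.start()
        worker.join()
    finally:
        # Restore the earlier setting (0 means "use the platform default").
        threading.stack_size(previous)
    return result.get("value")


def deep_recursion(depth=500):
    """A stand-in for code that uses a lot of stack inside a thread."""
    return 0 if depth == 0 else 1 + deep_recursion(depth - 1)


if __name__ == "__main__":
    print(run_with_stack_size(deep_recursion))
```

Having to think about this at all is exactly the kind of extra worry that a glibc-based image spares you.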
Option #1: Ubuntu LTS, CentOS, Debian
There are three major operating systems that roughly meet the above criteria (dates and release versions are accurate at time of writing; the passage of time may require slightly different choices).
- Ubuntu 18.04 (the ubuntu:18.04 image) was released in April 2018, and since it’s a Long Term Support release it will get security updates until 2023.
- CentOS 7.6 (centos:7.6.1810) was released in October 2018, and will have full updates until Q4 2020 and maintenance updates until 2024. CentOS 8 is currently being worked on, based on RHEL 8 which was released May 2019.
- Debian 10 (“Buster”) was released on July 6th 2019, and will be supported until 2024.
Other than Buster, which includes Python 3.7, if you want the latest version of Python you’ll have to install it yourself. And once 3.8 is out Buster will have the same problem.
Option #2: The Python Docker image
Another alternative is Docker’s own “official” python image, which comes pre-installed with multiple versions of Python (3.8 beta, etc.), and has multiple variants:
- Alpine Linux, which as I explained above I don’t recommend using.
- Debian Buster, with many common packages installed. The image itself is large, but the theory is that these packages are installed via common image layers that other official Docker images will use, so overall disk usage will be low.
- Debian Buster slim variant. This lacks the common packages’ layers, and so the image itself is much smaller, but if you use many other Docker images based off Buster the overall disk usage will be somewhat higher.
The size benefit for Alpine isn’t even particularly compelling: the download size of python:3.7-slim-buster is 60MB versus 34MB for python:3.7-alpine, and their uncompressed on-disk sizes are 180MB and 100MB respectively.
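To put numbers on that difference, here's the arithmetic as a small sketch. The sizes are the July 2019 figures quoted above; the SIZES_MB table and the savings helper are just illustrative names:

```python
# Image sizes in MB, as quoted above for July 2019.
SIZES_MB = {
    "python:3.7-slim-buster": {"download": 60, "on_disk": 180},
    "python:3.7-alpine": {"download": 34, "on_disk": 100},
}


def savings(bigger, smaller, kind):
    """Return how many MB the smaller image saves for the given kind of size."""
    return SIZES_MB[bigger][kind] - SIZES_MB[smaller][kind]


download_saved = savings("python:3.7-slim-buster", "python:3.7-alpine", "download")
disk_saved = savings("python:3.7-slim-buster", "python:3.7-alpine", "on_disk")

if __name__ == "__main__":
    print(f"Alpine saves {download_saved}MB of download and {disk_saved}MB on disk")
```

That works out to 26MB less to download and 80MB less on disk, which you then have to weigh against longer builds and the risk of musl-related surprises.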
So what should you use?
So as of July 2019, Debian Buster is a good operating system base:
- It’s more up-to-date than ubuntu:18.04, though ubuntu:20.04 will take the lead when it’s released in April 2020.
- It’s stable, and won’t have significant library changes.
- There’s less chance of weird production bugs than with Alpine.
And the official Python Docker images based off of Debian Buster also give you the full range of Python releases.
The official Docker Python image in its slim variant (e.g. python:3.7-slim-buster) is a good base image for most use cases. It’s 60MB when downloaded and 180MB when uncompressed to disk, it gives you the latest Python releases, and it has all the benefits of Debian Buster.