A deep dive into the official Docker image for Python

The official Python image for Docker is quite popular, and in fact I recommend one of its variations as a base image. But many people don’t quite understand what it does, which can lead to confusion and brokenness.

In this post I will therefore go over how it’s constructed, why it’s useful, how to use it correctly, as well as its limitations. In particular, I’ll be reading through the python:3.8-slim-buster variant, as of August 19, 2020, and explaining it as I go along.

Reading the Dockerfile

The base image

We start with the base image:

FROM debian:buster-slim

That is, the base image is Debian GNU/Linux 10, the current stable release of the Debian distribution, also known as Buster because Debian names all their releases after characters from Toy Story. In case you’re wondering, Buster is Andy’s pet dog.

So to begin with, this is a Linux distribution that guarantees stability over time, while providing bug fixes. The slim variant has fewer packages installed, so there are no compilers, for example.

Environment variables

Next, some environment variables. The first makes sure /usr/local/bin is early in the $PATH:

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

Basically, the Python image works by installing Python into /usr/local, so this ensures the executables it installs are the default ones used.
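The effect of that ordering can be sketched in Python. This is a minimal illustration, not part of the image: the helper name first_on_path and the throwaway directory tree are made up, and the sketch assumes a POSIX system. shutil.which is the standard library function that searches a PATH-style string from left to right and returns the first match.

```python
import os
import shutil
import tempfile


def first_on_path(command, path):
    """Return the first executable named `command` in the PATH string `path`."""
    return shutil.which(command, path=path)


def demo():
    """Resolve python3 in a fake tree that has it in both usr/local/bin and usr/bin."""
    with tempfile.TemporaryDirectory() as root:
        local_bin = os.path.join(root, "usr", "local", "bin")
        system_bin = os.path.join(root, "usr", "bin")
        for directory in (local_bin, system_bin):
            os.makedirs(directory)
            fake = os.path.join(directory, "python3")
            with open(fake, "w") as f:
                f.write("#!/bin/sh\n")
            os.chmod(fake, 0o755)  # mark the placeholder as executable
        # Same order as the image's PATH: the local directory comes first.
        search_path = os.pathsep.join([local_bin, system_bin])
        found = first_on_path("python3", search_path)
        # Report the match relative to the fake root so the result is stable.
        return os.path.relpath(found, root)
```

Calling demo() returns the copy under usr/local/bin, because that directory appears earlier in the search path than usr/bin.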

Next, the locale is set:


# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

As far as I can tell, modern Python 3 will default to UTF-8 even without this, so I’m not sure it’s necessary these days.
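If you want to check what your own container ends up with, a small sketch like the following reports the encodings the interpreter has chosen. The function name encoding_report and the dictionary keys are my own. It only shows the current state and does not prove whether the LANG setting was needed to get there.

```python
import locale
import sys


def encoding_report():
    """Collect the text encodings the running interpreter has settled on."""
    return {
        # Encoding derived from the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output, if the stream exposes one.
        "stdout": getattr(sys.stdout, "encoding", None),
    }
```

Running this once with the LANG variable set and once with it unset (for example via docker run -e) is one way to compare the two cases.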

There’s also an environment variable that tells you the current Python version:

ENV PYTHON_VERSION 3.8.5

And an environment variable, GPG_KEY, identifying the GPG key used to verify the Python source code when it’s downloaded.

Runtime dependencies

In order to run, Python needs some additional packages:

RUN apt-get update && apt-get install -y --no-install-recommends \
		ca-certificates \
		netbase \
	&& rm -rf /var/lib/apt/lists/*

The first, ca-certificates, is the list of standard certificate authorities’ certificates, comparable to what your browser uses to validate https:// URLs. This allows Python, wget, and other tools to validate certificates provided by servers.

The second, netbase, installs a few files in /etc that are needed to map certain names to corresponding ports or protocols. For example, /etc/services maps service names like https to corresponding port numbers, in this case 443/tcp.
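From Python, that mapping is reachable through the socket module. The sketch below is only an illustration. The helper name lookup_port is made up, and it returns None rather than raising when the name cannot be resolved, which is what I would expect on a system where the services database is missing.

```python
import socket


def lookup_port(service, protocol="tcp"):
    """Return the port registered for `service`, or None if it is unknown."""
    try:
        # Consults the system services database (/etc/services on Debian).
        return socket.getservbyname(service, protocol)
    except OSError:
        return None
```

On an image with netbase installed, lookup_port("https") should return the 443 mentioned above.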

Installing Python

Next, a compiler toolchain is installed, Python source code is downloaded, Python is compiled, and then the unneeded Debian packages are uninstalled:

RUN set -ex \
	\
	&& savedAptMark="$(apt-mark showmanual)" \
	&& apt-get update && apt-get install -y --no-install-recommends \
		dpkg-dev \
		gcc \
		libbluetooth-dev \
		libbz2-dev \
		libc6-dev \
		libexpat1-dev \
		libffi-dev \
		libgdbm-dev \
		liblzma-dev \
		libncursesw5-dev \
		libreadline-dev \
		libsqlite3-dev \
		libssl-dev \
		make \
		tk-dev \
		uuid-dev \
		wget \
		xz-utils \
		zlib1g-dev \
# as of Stretch, "gpg" is no longer included by default
		$(command -v gpg > /dev/null || echo 'gnupg dirmngr') \
	\
	&& wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" \
	&& wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" \
	&& export GNUPGHOME="$(mktemp -d)" \
	&& gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" \
	&& gpg --batch --verify python.tar.xz.asc python.tar.xz \
	&& { command -v gpgconf > /dev/null && gpgconf --kill all || :; } \
	&& rm -rf "$GNUPGHOME" python.tar.xz.asc \
	&& mkdir -p /usr/src/python \
	&& tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz \
	&& rm python.tar.xz \
	\
	&& cd /usr/src/python \
	&& gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
	&& ./configure \
		--build="$gnuArch" \
		--enable-loadable-sqlite-extensions \
		--enable-optimizations \
		--enable-option-checking=fatal \
		--enable-shared \
		--with-system-expat \
		--with-system-ffi \
		--without-ensurepip \
	&& make -j "$(nproc)" \
		LDFLAGS="-Wl,--strip-all" \
	&& make install \
	&& rm -rf /usr/src/python \
	\
	&& find /usr/local -depth \
		\( \
			\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
			-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name '*.a' \) \) \
			-o \( -type f -a -name 'wininst-*.exe' \) \
		\) -exec rm -rf '{}' + \
	\
	&& ldconfig \
	\
	&& apt-mark auto '.*' > /dev/null \
	&& apt-mark manual $savedAptMark \
	&& find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' \
		| awk '/=>/ { print $(NF-1) }' \
		| sort -u \
		| xargs -r dpkg-query --search \
		| cut -d: -f1 \
		| sort -u \
		| xargs -r apt-mark manual \
	&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
	&& rm -rf /var/lib/apt/lists/* \
	\
	&& python3 --version

There’s a lot in there, but the basic outcome is:

  1. Python is installed into /usr/local.
  2. All .pyc files are deleted.
  3. The packages—gcc and so on—needed to compile Python are removed once they are no longer needed.

Because this all happens in a single RUN command, the image does not end up storing the compiler in any of its layers, keeping it smaller.
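The reasoning about layers can be made concrete with a toy model. This is a deliberate simplification with invented sizes, not a measurement of the real image: it treats the image size as the sum of what each layer adds, and treats a deletion in a later layer as adding nothing while leaving the earlier layer untouched.

```python
def image_size(layers):
    """Sum the megabytes each layer adds.

    Files deleted by a later layer are hidden from the final filesystem
    but are still stored in the earlier layer that added them.
    """
    return sum(sum(added.values()) for added in layers)


# Hypothetical sizes in MB, chosen only to make the arithmetic easy to follow.
TOOLCHAIN_MB = 150
PYTHON_MB = 100

# Compile in one RUN, clean up in a second RUN: the first layer keeps the toolchain.
two_steps = [
    {"toolchain": TOOLCHAIN_MB, "python": PYTHON_MB},
    {},  # removing packages adds no data, but cannot shrink the layer above
]

# Compile and clean up in the same RUN: only what survives is stored.
one_step = [{"python": PYTHON_MB}]
```

With these made-up numbers, image_size(two_steps) is 250 while image_size(one_step) is 100, which is the saving the single RUN command is after.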

One thing you might notice is that Python requires libbluetooth-dev to compile. I found this surprising, so I asked, and apparently Python can create Bluetooth sockets, but only if compiled with this package installed.

Setting up aliases

Next, /usr/local/bin/python3 gets an alias /usr/local/bin/python, so you can call it either way:

# make some useful symlinks that are expected to exist
RUN cd /usr/local/bin \
	&& ln -s idle3 idle \
	&& ln -s pydoc3 pydoc \
	&& ln -s python3 python \
	&& ln -s python3-config python-config

Installing pip

The pip package download tool has its own release schedule, distinct from Python’s. For example, this Dockerfile installs Python 3.8.5, which was released in July 2020. pip 20.2.2 came out afterwards, in August, but the Dockerfile makes sure to include that newer pip:

# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
ENV PYTHON_PIP_VERSION 20.2.2
# https://github.com/pypa/get-pip
ENV PYTHON_GET_PIP_URL https://github.com/pypa/get-pip/raw/5578af97f8b2b466f4cdbebe18a3ba2d48ad1434/get-pip.py
ENV PYTHON_GET_PIP_SHA256 d4d62a0850fe0c2e6325b2cc20d818c580563de5a2038f917e3cb0e25280b4d1

RUN set -ex; \
	\
	savedAptMark="$(apt-mark showmanual)"; \
	apt-get update; \
	apt-get install -y --no-install-recommends wget; \
	\
	wget -O get-pip.py "$PYTHON_GET_PIP_URL"; \
	echo "$PYTHON_GET_PIP_SHA256 *get-pip.py" | sha256sum --check --strict -; \
	\
	apt-mark auto '.*' > /dev/null; \
	[ -z "$savedAptMark" ] || apt-mark manual $savedAptMark; \
	apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
	rm -rf /var/lib/apt/lists/*; \
	\
	python get-pip.py \
		--disable-pip-version-check \
		--no-cache-dir \
		"pip==$PYTHON_PIP_VERSION" \
	; \
	pip --version; \
	\
	find /usr/local -depth \
		\( \
			\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
			-o \
			\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
		\) -exec rm -rf '{}' +; \
	rm -f get-pip.py

Again, all .pyc files are deleted.
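The download step in that RUN command also pins get-pip.py to a SHA-256 digest and fails the build if the file does not match. Here is a minimal Python sketch of the same kind of check. The function name sha256_matches is my own, and this is an equivalent of the idea, not what the Dockerfile runs (it uses sha256sum).

```python
import hashlib
import hmac


def sha256_matches(data, expected_hex):
    """Return True if the SHA-256 digest of `data` equals `expected_hex`."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

In a real check, data would be the bytes of the downloaded file and expected_hex the pinned value, such as the one in PYTHON_GET_PIP_SHA256.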

The entrypoint

Finally, the Dockerfile specifies the entrypoint:

CMD ["python3"]

By using CMD with an empty ENTRYPOINT, you get python by default when you run the image:

$ docker run -it python:3.8-slim-buster
Python 3.8.5 (default, Aug  4 2020, 16:24:08)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

But you can also specify other executables if you want:

$ docker run -it python:3.8-slim-buster bash
root@280c9b73e8f9:/# 
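The rule behind those two runs can be modelled in a few lines of Python. This is a simplified sketch of the exec-form behaviour only, and the function name effective_command is hypothetical: arguments given after the image name replace CMD, and whatever remains is appended to ENTRYPOINT.

```python
def effective_command(entrypoint, cmd, run_args):
    """Model which command a container starts.

    `entrypoint` and `cmd` are the exec-form lists from the Dockerfile;
    `run_args` is whatever follows the image name in `docker run`.
    """
    # Arguments on the command line replace CMD entirely.
    tail = run_args if run_args else cmd
    return list(entrypoint) + list(tail)
```

With the empty ENTRYPOINT and the CMD shown above, effective_command([], ["python3"], []) gives ["python3"], and passing ["bash"] as run_args gives ["bash"], matching the two sessions.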

What have we learned?

Again, focusing specifically on the slim-buster variant, here are some takeaways.

The python official image includes Python

While this point may seem obvious, it’s worth noticing how it’s included: it’s a custom install in /usr/local.

A common mistake for people using this base image is to install Python again, by using Debian’s version of Python:

FROM python:3.8-slim-buster

# THIS IS NOT NECESSARY:
RUN apt-get update && apt-get install python3-dev

That adds a second Python installation in /usr, rather than /usr/local, and it will typically be a different version of Python. You probably don’t want two different versions of Python in the same image; mostly it just leads to confusion.

If you really want to use the Debian version of Python, use debian:buster-slim as the base image instead.
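To find out whether an image has ended up with more than one interpreter, you can list every match on the PATH instead of just the first. The helper below is a sketch of my own (the name all_on_path is made up), written for a POSIX system.

```python
import os


def all_on_path(command, path=None):
    """List every executable named `command` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, command)
        if (
            os.path.isfile(candidate)
            and os.access(candidate, os.X_OK)
            and candidate not in matches
        ):
            matches.append(candidate)
    return matches
```

In an image built with the mistaken Dockerfile above, all_on_path("python3") would be expected to list both /usr/local/bin/python3 and /usr/bin/python3. The first entry is the one that runs by default.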

The python official image includes the latest pip

For example, the last release of Python 3.5 was in November 2019, but the Docker image for python:3.5-slim-buster includes pip from August 2020. This is (usually) a good thing: it means you get the latest bug fixes, performance improvements, and support for newer wheel variants.

The python official image deletes all .pyc files

If you want to speed up startup very slightly, you may wish to compile the standard library source code to .pyc in your own image with the compileall module.
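As a rough sketch of what that could look like, the following uses the standard compileall and sysconfig modules. The helper names stdlib_dir, compile_tree and demo are my own, and demo only exercises the idea on a throwaway directory instead of the real standard library.

```python
import compileall
import importlib.util
import os
import sysconfig
import tempfile


def stdlib_dir():
    """Directory holding the standard library of the running interpreter."""
    return sysconfig.get_paths()["stdlib"]


def compile_tree(directory):
    """Byte-compile every .py file under `directory`; True if all succeeded."""
    return bool(compileall.compile_dir(directory, quiet=1))


def demo():
    """Compile a one-file tree and report whether the cached .pyc now exists."""
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "hello.py")
        with open(source, "w") as f:
            f.write("print('hello')\n")
        ok = compile_tree(root)
        # cache_from_source gives the __pycache__ path Python would use.
        return ok, os.path.exists(importlib.util.cache_from_source(source))
```

In your own image, calling compile_tree(stdlib_dir()) at build time, or running python -m compileall on that directory from a RUN command, is one way to recreate the files the official image deletes. Whether the startup gain is worth the extra image size is something to measure for your application.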

The python official image does not install Debian security updates

While the base debian:buster-slim and python images do get regenerated often, there are windows where a new Debian security fix has been released, but the images have not been regenerated. You should therefore install security updates for the base Linux distribution when you build your own image.

