A deep dive into the “official” Docker image for Python
The “official” Python image for Docker is quite popular, and in fact I recommend one of its variations as a base image. It’s official because it’s the one distributed by Docker, and it’s “official” because it’s not distributed by the Python developers. While the image is quite useful, many people don’t quite understand what it does, which can lead to confusion and brokenness.
In this post I will therefore go over how it’s constructed, why it’s useful, how to use it correctly, as well as its limitations.
In particular, I’ll be reading through the python:3.8-slim-buster variant, as of August 19, 2020, and explaining it as I go along.
Reading the Dockerfile
The base image
We start with the base image:
FROM debian:buster-slim
That is, the base image is Debian GNU/Linux 10, the stable release of the Debian distribution at the time I originally wrote this article, also known as Buster because Debian names all their releases after characters from Toy Story. In case you’re wondering, Buster is Andy’s pet dog.
So to begin with, this is a Linux distribution that guarantees stability over time, while providing bug fixes.
The slim variant has fewer packages installed, so there are no compilers, for example.
Environment variables
Next, some environment variables.
The first makes sure /usr/local/bin is early in the $PATH:
# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH
Basically, the Python image works by installing Python into /usr/local, so this ensures the executables it installs are the default ones used.
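One way to check this from inside a running container is to ask the interpreter where it lives. This is a minimal sketch; the helper name describe_interpreter is my own, not something the image provides.

```python
import shutil
import sys


def describe_interpreter():
    """Report which interpreter is running and which python3 the PATH resolves to."""
    return {
        "executable": sys.executable,                # path of the running interpreter
        "prefix": sys.prefix,                        # installation root of that interpreter
        "python3_on_path": shutil.which("python3"),  # None if python3 is not on PATH
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Inside the official image I would expect the prefix to be /usr/local and the PATH lookup to land in /usr/local/bin, but it is worth running it yourself rather than taking my word for it.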
Next, the locale is set:
# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8
As far as I can tell, modern Python 3 (3.7 and later, which coerce the legacy C locale to a UTF-8 one as described in PEP 538) will default to UTF-8 even without this, so I’m not sure it’s necessary these days.
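If you want to see what your interpreter actually settled on, the following sketch collects the relevant encodings. It assumes Python 3.7 or later (for sys.flags.utf8_mode), and encoding_report is a name I made up for illustration.

```python
import locale
import sys


def encoding_report():
    """Collect the text encodings this interpreter is using."""
    return {
        "filesystem": sys.getfilesystemencoding(),        # used for file names
        "preferred": locale.getpreferredencoding(False),  # default for open() in text mode
        "stdout": getattr(sys.stdout, "encoding", None),  # encoding of standard output
        "utf8_mode": bool(sys.flags.utf8_mode),           # True if UTF-8 mode is enabled
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

Running it once with the image defaults and once with LANG=C set on the docker run command line should show whether the ENV line makes any difference for your Python version.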
There’s also an environment variable that tells you the current Python version:
ENV PYTHON_VERSION 3.8.5
And an environment variable with a GPG key, used to verify the Python source code when it’s downloaded.
Runtime dependencies
In order to run, Python needs some additional packages:
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
netbase \
&& rm -rf /var/lib/apt/lists/*
The first, ca-certificates, is the list of standard certificate authorities’ certificates, comparable to what your browser uses to validate https:// URLs.
This allows Python, wget, and other tools to validate certificates provided by servers.
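To see where Python’s ssl module looks for those certificates, you can ask it directly. This is only a sketch: ca_locations is a hypothetical helper, and the paths it reports depend on how OpenSSL was built on your system.

```python
import ssl


def ca_locations():
    """Return where this Python's OpenSSL looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,                  # bundle file, or None if it does not exist
        "capath": paths.capath,                  # certificate directory, or None if missing
        "openssl_cafile": paths.openssl_cafile,  # compiled-in default bundle file
        "openssl_capath": paths.openssl_capath,  # compiled-in default directory
    }


if __name__ == "__main__":
    for key, value in ca_locations().items():
        print(f"{key}: {value}")
```

If both cafile and capath come back as None, there are no certificates where OpenSSL expects them, which is roughly the situation you would be in without ca-certificates.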
The second, netbase, installs a few files in /etc that are needed to map certain names to corresponding ports or protocols.
For example, /etc/services maps service names like https to corresponding port numbers, in this case 443/tcp.
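Python exposes this lookup through the socket module, so you can try it from the interpreter. A minimal sketch, where lookup_port is my own wrapper around socket.getservbyname:

```python
import socket


def lookup_port(service, protocol="tcp"):
    """Map a service name to its port number, or None if the lookup fails."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        # Raised when the service database has no such entry,
        # for example because /etc/services is missing.
        return None


if __name__ == "__main__":
    print("https ->", lookup_port("https"))
```

Returning None instead of letting the exception propagate is just a convenience for this demonstration; real code may prefer to let the error surface.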
Installing Python
Next, a compiler toolchain is installed, Python source code is downloaded, Python is compiled, and then the unneeded Debian packages are uninstalled:
RUN set -ex \
\
&& savedAptMark="$(apt-mark showmanual)" \
&& apt-get update && apt-get install -y --no-install-recommends \
dpkg-dev \
gcc \
libbluetooth-dev \
libbz2-dev \
libc6-dev \
libexpat1-dev \
libffi-dev \
libgdbm-dev \
liblzma-dev \
libncursesw5-dev \
libreadline-dev \
libsqlite3-dev \
libssl-dev \
make \
tk-dev \
uuid-dev \
wget \
xz-utils \
zlib1g-dev \
# as of Stretch, "gpg" is no longer included by default
$(command -v gpg > /dev/null || echo 'gnupg dirmngr') \
\
&& wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" \
&& wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" \
&& export GNUPGHOME="$(mktemp -d)" \
&& gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" \
&& gpg --batch --verify python.tar.xz.asc python.tar.xz \
&& { command -v gpgconf > /dev/null && gpgconf --kill all || :; } \
&& rm -rf "$GNUPGHOME" python.tar.xz.asc \
&& mkdir -p /usr/src/python \
&& tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz \
&& rm python.tar.xz \
\
&& cd /usr/src/python \
&& gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
&& ./configure \
--build="$gnuArch" \
--enable-loadable-sqlite-extensions \
--enable-optimizations \
--enable-option-checking=fatal \
--enable-shared \
--with-system-expat \
--with-system-ffi \
--without-ensurepip \
&& make -j "$(nproc)" \
LDFLAGS="-Wl,--strip-all" \
&& make install \
&& rm -rf /usr/src/python \
\
&& find /usr/local -depth \
\( \
\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name '*.a' \) \) \
-o \( -type f -a -name 'wininst-*.exe' \) \
\) -exec rm -rf '{}' + \
\
&& ldconfig \
\
&& apt-mark auto '.*' > /dev/null \
&& apt-mark manual $savedAptMark \
&& find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' \
| awk '/=>/ { print $(NF-1) }' \
| sort -u \
| xargs -r dpkg-query --search \
| cut -d: -f1 \
| sort -u \
| xargs -r apt-mark manual \
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
&& rm -rf /var/lib/apt/lists/* \
\
&& python3 --version
There’s a lot in there, but the basic outcome is:
- Python is installed into /usr/local.
- All .pyc files are deleted.
- The packages needed to compile Python (gcc and so on) are removed once they are no longer needed.
Because this all happens in a single RUN command, the image does not end up storing the compiler in any of its layers, keeping it smaller.
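Even though the build tools are gone, the interpreter remembers how it was configured. The sketch below reads the recorded ./configure arguments via sysconfig; configure_flags is a hypothetical helper name, and the CONFIG_ARGS variable is only available on builds that ship a Makefile (so not on Windows).

```python
import shlex
import sysconfig


def configure_flags():
    """Return the ./configure arguments recorded when this Python was built."""
    raw = sysconfig.get_config_var("CONFIG_ARGS")  # None when no Makefile data exists
    if not raw:
        return []
    try:
        return shlex.split(raw)  # the value is shell-quoted
    except ValueError:
        return raw.split()       # fall back if the quoting is unbalanced


if __name__ == "__main__":
    for flag in configure_flags():
        print(flag)
```

In the image I would expect the output to include flags like --enable-shared and --without-ensurepip from the Dockerfile above.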
One thing you might notice is that libbluetooth-dev is installed in order to compile Python.
I found this surprising, so I asked, and apparently Python can create Bluetooth sockets, but only if compiled with this package installed.
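You can check whether a given interpreter was built with Bluetooth support by looking for the corresponding constants in the socket module. A small sketch, with bluetooth_support being a name I invented:

```python
import socket


def bluetooth_support():
    """Report whether the socket module exposes Bluetooth constants."""
    return {
        "AF_BLUETOOTH": hasattr(socket, "AF_BLUETOOTH"),      # address family
        "BTPROTO_RFCOMM": hasattr(socket, "BTPROTO_RFCOMM"),  # one of the protocols
    }


if __name__ == "__main__":
    print(bluetooth_support())
```

This only tells you what was compiled in; actually opening a Bluetooth socket inside a container additionally requires access to the host's Bluetooth hardware.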
Setting up aliases
Next, /usr/local/bin/python3 gets an alias /usr/local/bin/python, so you can call it either way:
# make some useful symlinks that are expected to exist
RUN cd /usr/local/bin \
&& ln -s idle3 idle \
&& ln -s pydoc3 pydoc \
&& ln -s python3 python \
&& ln -s python3-config python-config
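To confirm that two command names really end up at the same file, you can follow the symlinks. This is a sketch under the assumption that the commands are on the PATH; resolve_command is a hypothetical helper.

```python
import os
import shutil


def resolve_command(name, path=None):
    """Find `name` on the search path and follow any symlinks to the real file.

    Returns a (found, real) tuple, or None if the command is not found.
    """
    found = shutil.which(name, path=path)
    if found is None:
        return None
    return found, os.path.realpath(found)


if __name__ == "__main__":
    for command in ("python", "python3"):
        print(command, "->", resolve_command(command))
```

If both names print the same real path, they are the same interpreter.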
Installing pip
The pip package download tool has its own release schedule, distinct from Python’s.
For example, this Dockerfile is installing Python 3.8.5, released in July 2020.
pip 20.2.2 was released in August, after that, but the Dockerfile makes sure to include that newer pip:
# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
ENV PYTHON_PIP_VERSION 20.2.2
# https://github.com/pypa/get-pip
ENV PYTHON_GET_PIP_URL https://github.com/pypa/get-pip/raw/5578af97f8b2b466f4cdbebe18a3ba2d48ad1434/get-pip.py
ENV PYTHON_GET_PIP_SHA256 d4d62a0850fe0c2e6325b2cc20d818c580563de5a2038f917e3cb0e25280b4d1
RUN set -ex; \
\
savedAptMark="$(apt-mark showmanual)"; \
apt-get update; \
apt-get install -y --no-install-recommends wget; \
\
wget -O get-pip.py "$PYTHON_GET_PIP_URL"; \
echo "$PYTHON_GET_PIP_SHA256 *get-pip.py" | sha256sum --check --strict -; \
\
apt-mark auto '.*' > /dev/null; \
[ -z "$savedAptMark" ] || apt-mark manual $savedAptMark; \
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
rm -rf /var/lib/apt/lists/*; \
\
python get-pip.py \
--disable-pip-version-check \
--no-cache-dir \
"pip==$PYTHON_PIP_VERSION" \
; \
pip --version; \
\
find /usr/local -depth \
\( \
\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
-o \
\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
\) -exec rm -rf '{}' +; \
rm -f get-pip.py
Again, all .pyc files are deleted.
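To see which pip a particular image ended up with, you can query the installed package metadata from Python. A minimal sketch, assuming Python 3.8 or later (for importlib.metadata); installed_version is my own helper name.

```python
from importlib import metadata  # available in the standard library since Python 3.8


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pip", installed_version("pip"))
```

Running pip --version gives you the same information; the Python version is handy when you want to check it from a script.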
The entrypoint
Finally, the Dockerfile specifies the entrypoint:
CMD ["python3"]
By using CMD with an empty ENTRYPOINT, you get python by default when you run the image:
$ docker run -it python:3.8-slim-buster
Python 3.8.5 (default, Aug 4 2020, 16:24:08)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
But you can also specify other executables if you want:
$ docker run -it python:3.8-slim-buster bash
root@280c9b73e8f9:/#
What have we learned?
Again, focusing specifically on the slim-buster variant, here are some takeaways.
The python official image includes Python
While this point may seem obvious, it’s worth noticing how it’s included: it’s a custom install in /usr/local.
A common mistake for people using this base image is to install Python again, by using Debian’s version of Python:
FROM python:3.8-slim-buster
# THIS IS NOT NECESSARY:
RUN apt-get update && apt-get install -y python3-dev
Note: Outside any specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
The apt-get install above puts a second copy of Python in /usr, rather than /usr/local, and it will typically be a different version of Python.
You probably don’t want two different versions of Python in the same image; mostly it just leads to confusion.
If you really want to use the Debian version of Python, use the equivalent debian:buster-slim as the base image instead, or (these days) the more up-to-date debian:bullseye-slim.
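If you suspect an image has ended up with more than one Python, a quick scan of the PATH will tell you. This is a sketch, not a thorough audit: find_interpreters is a hypothetical helper, and it only looks for one executable name in the directories on the search path.

```python
import os


def find_interpreters(name="python3", path=None):
    """List every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in matches:
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    for interpreter in find_interpreters():
        print(interpreter)
```

The first entry is the one that runs when you type python3. More than one entry means there are multiple installs, and I would expect that to be the result in an image that adds Debian’s Python on top of the official one.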
The python official image includes the latest pip
At the time of writing, the last release of Python 3.5 was in November 2019, but the Docker image for python:3.5-slim-buster includes pip from August 2020.
This is (usually) a good thing: it means you get the latest bug fixes, performance improvements, and support for newer wheel variants.
The python official image deletes all .pyc files
If you want to speed up startup very slightly, you may wish to compile the standard library source code to .pyc in your own image with the compileall module.
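Here is a minimal sketch of what that could look like as a script you run during your image build. The function names compile_tree and compile_stdlib are my own; the only library calls are compileall.compile_dir and sysconfig.get_paths.

```python
import compileall
import sysconfig


def compile_tree(directory, quiet=1):
    """Byte-compile every .py file under `directory`; True if all of them compiled."""
    return bool(compileall.compile_dir(directory, quiet=quiet))


def compile_stdlib():
    """Byte-compile the standard library of the running interpreter.

    Needs write access to the standard library directory, which you
    normally have when running as root during a Docker build.
    """
    return compile_tree(sysconfig.get_paths()["stdlib"])
```

Calling compile_stdlib() from a RUN step in your own Dockerfile should have much the same effect as running python -m compileall on the standard library directory, at the cost of a somewhat larger image.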
The python official image does not install Debian security updates
While the base debian:buster-slim and python images do get regenerated often, there are windows where a new Debian security fix has been released, but the images have not been regenerated.
You should install security updates to the base Linux distribution.