Installing system packages in Docker with minimal bloat

When you’re building a Docker image for your Python application, you will need to:

  1. Upgrade system packages in order to get the latest security updates and critical bug fixes.
  2. Sometimes, install additional system packages as dependencies for your Python libraries or application, for debugging, or to otherwise help build your image.

Unfortunately, the default options for system package installation with Debian, Ubuntu, and Red Hat Enterprise Linux (RHEL) can result in much bigger images than you actually need.

So let’s see how you can install those security updates and dependencies—and still keep your image relatively small.

Why you shouldn’t just install some packages

Let’s see what happens if we just naively do security updates and install one extra package. Here’s our Dockerfile:

FROM python:3.8-slim-buster

# Download latest listing of available packages:
RUN apt-get -y update
# Upgrade already installed packages:
RUN apt-get -y upgrade
# Install a new package:
RUN apt-get -y install syslog-ng

Note: Apart from the specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.

We’ll build this image and compare its size to that of the base image:

$ docker build -t python-with-syslog .
...
$ docker image ls --format "{{ .Size }}" python:3.8-slim-buster
193MB
$ docker image ls --format "{{ .Size }}" python-with-syslog
327MB

Just installing syslog-ng increased our image by 134MB—but why?

Installing less, and cleaning up

Installing packages adds unnecessary size by:

  1. Installing recommended packages that you may not actually need.
  2. Keeping around cached copies of the package index and downloaded packages, which you don’t need once the installation is done.

To prevent these problems you need to install only the packages you really need, and to clean up unnecessary files once installation is done.

Because Docker images are structured as a series of additive layers, cleanup needs to happen in the same RUN command that installed the packages. Otherwise, the deleted files will be gone in the latest layer, but not from the previous layer, much like deleting a file in your latest Git commit doesn’t delete it from previous commits.
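
The effect of additive layers can be illustrated with a toy model in plain shell. This is only a sketch of the idea, not how Docker actually stores layers on disk: the directory names, the file names, and the 1 MiB file size are all made up for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Toy model: each "layer" is a directory holding only what one build
# step added. Hypothetical layout, purely for illustration.
work="$(mktemp -d)"
mkdir "$work/layer1" "$work/layer2"

# Step 1 (like a first RUN): a 1 MiB "package cache" file is created.
head -c 1048576 /dev/zero > "$work/layer1/cache.bin"

# Step 2 (like a later RUN that deletes the file): the deletion is only
# recorded as an empty marker in the new layer; layer1 is left untouched.
touch "$work/layer2/cache.bin.deleted"

# The image ships every layer, so its size is the sum of all of them.
layer1_bytes="$(wc -c < "$work/layer1/cache.bin")"
layer2_bytes="$(wc -c < "$work/layer2/cache.bin.deleted")"
total_bytes=$((layer1_bytes + layer2_bytes))
echo "total bytes shipped: $total_bytes"

rm -rf "$work"
```

In this model the deleted file still costs its full size, because the earlier layer is shipped unchanged. Creating and deleting the file within one step would leave nothing behind in any layer.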

Let’s see how we do that for the two packaging variants we’re considering here, Debian/Ubuntu and RHEL.

Debian, Ubuntu, and the Debian-based Python base image

The debian, ubuntu, and default python official base images all use the apt-get tool to install system packages. So the following will apply to all three.

Unlike before, when we had different RUN commands for each step, we’re going to have a single RUN command that runs a shell script called install-packages.sh:

FROM python:3.8-slim-buster

COPY install-packages.sh .
RUN ./install-packages.sh

Because it’s a single RUN, deleting files inside that script will ensure they never make it into any layer of the image, so they won’t waste any space. Here’s what the script looks like:

#!/bin/bash

# Bash "strict mode", to help catch problems and bugs in the shell
# script. Every bash script you write should include this. See
# http://redsymbol.net/articles/unofficial-bash-strict-mode/ for
# details.
set -euo pipefail

# Tell apt-get we're never going to be able to give manual
# feedback:
export DEBIAN_FRONTEND=noninteractive

# Update the package listing, so we know which packages exist:
apt-get update

# Install security updates:
apt-get -y upgrade

# Install a new package, without unnecessary recommended packages:
apt-get -y install --no-install-recommends syslog-ng

# Delete cached files we don't need anymore (note that the official
# Docker images for Debian and Ubuntu do this automatically, so with
# those you don't need to do it yourself):
apt-get clean
# Delete index files we don't need anymore:
rm -rf /var/lib/apt/lists/*
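
To check that the cleanup did its job, you can measure the directories where apt keeps downloaded packages and index files. The following is a minimal sketch and not part of the script above: dir_size_kb is a hypothetical helper name, and the two paths are assumed to be apt's default locations on Debian and Ubuntu.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of apt): print the size in KiB of a
# directory tree, or 0 if the directory does not exist.
dir_size_kb() {
    local dir="$1"
    if [ -d "$dir" ]; then
        du -sk "$dir" | cut -f1
    else
        echo 0
    fi
}

# Assumed default locations of apt's package cache and index files.
for dir in /var/cache/apt/archives /var/lib/apt/lists; do
    echo "$dir: $(dir_size_kb "$dir") KiB"
done
```

Run inside a container built from the image, both numbers should be close to zero if the cleanup happened in the same RUN as the installation.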

With these changes, the resulting image is much smaller:

$ docker build -t python-with-syslog-2 .
...
$ docker image ls --format "{{ .Size }}" python-with-syslog-2
238MB

Instead of adding 134MB as it did before, installing the package only took 45MB.
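
Both figures come from subtracting the base image size from the size of each built image. As a small sketch of that arithmetic, assuming sizes are printed as whole megabytes as in the output above (size_delta_mb is a hypothetical helper, not a Docker command):

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: given two sizes in whole megabytes (e.g. "193MB"),
# print how many MB larger the second is than the first.
size_delta_mb() {
    local base="${1%MB}"
    local image="${2%MB}"
    echo $((image - base))
}

echo "naive install:   $(size_delta_mb 193MB 327MB)MB added"
echo "minimal install: $(size_delta_mb 193MB 238MB)MB added"
```

The difference between the two images is 327MB - 238MB = 89MB, all of it recommended packages and cached files that the application never needed.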

Red Hat Enterprise Linux

With RHEL or compatible Docker images you want to follow a similar procedure: install only the packages you specifically need, and then clean up.

Here’s an example Dockerfile:

FROM redhat/ubi8

COPY install-packages.sh .
RUN ./install-packages.sh

And the corresponding install-packages.sh:

#!/bin/bash

# Bash "strict mode", to help catch problems and bugs in the shell
# script. Every bash script you write should include this. See
# http://redsymbol.net/articles/unofficial-bash-strict-mode/ for
# details.
set -euo pipefail

# Install security updates, bug fixes and enhancements only.
# --nodocs skips documentation, which we don't need in production
# Docker images.
dnf --nodocs -y upgrade-minimal

# Install a new package, without unnecessary recommended packages:
dnf --nodocs -y install --setopt=install_weak_deps=False python3

# Delete cached files we don't need anymore:
dnf clean all

Even smaller images

Installing only necessary packages and cleaning up after the installer are good starting points, but you can get even smaller images. In particular, if you need to install a compiler, you can use multi-stage builds to ensure the compiler toolchain doesn’t end up in your final image.

And if you don’t want to implement these techniques yourself, they are all included in my Production-Ready Python Containers template.