Reviewing the official Dockerfile best practices: good, bad, insecure

If you’re building a Docker image, it’s natural to read the official Docker documentation’s best practices for Dockerfiles on docs.docker.com.

While this document has a lot of excellent and useful information, it also has some problems: important security and performance details are left as an exercise to the reader, and some of its advice is contradictory.

To help you get faster builds and secure images, in this article I’ll go over some of the problems with the official best practices, along with references to solutions.

Problem #1: Caching can lead to insecure images

Good: The official best practices documentation encourages layer caching:

… you can minimize image layers by leveraging [the] build cache.

And layer caching is great: it allows for faster builds and in some cases for smaller images.
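As an illustration of how to leverage the cache, here’s a sketch of a Dockerfile for a Python application (the base image tag and filenames are placeholders): dependencies are copied and installed before the rest of the code, so a typical code edit doesn’t invalidate the expensive dependency-install layer.

```dockerfile
FROM python:3.11-slim-bookworm
WORKDIR /app

# Dependencies change rarely; this layer is reused from the cache
# as long as requirements.txt itself is unchanged.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often; only the layers from here on
# are rebuilt on a typical code change.
COPY . .
CMD ["python", "app.py"]
```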

Insecure: However, it also means you won’t get security updates for system packages: a cached layer is reused as-is, so a RUN instruction that upgrades system packages never reruns, no matter how stale those packages get. So you also need to regularly rebuild the image from scratch.
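For example, a scheduled job (in cron or your CI system) can rebuild while bypassing the cache; the image name here is just a placeholder:

```shell
# --pull re-downloads the base image, and --no-cache ignores all
# cached layers, so system packages get reinstalled at their
# latest versions:
docker build --pull --no-cache --tag myimage .
```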

Problem #2: Multi-stage builds can break caching

Good: The official documentation recommends using multi-stage builds:

Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files.

This is true, and for all but the simplest of builds you will benefit from using them.

Bad: However, in the most likely build scenario, where builds run in an environment that doesn’t already have the previous build’s intermediate stages available locally (a CI system, for example), naively using multi-stage builds also breaks caching. That means you’ll have much slower builds.

The official documentation doesn’t cover this at all. So to understand why multi-stage builds break caching, and how to fix it, see my article on faster multi-stage builds.
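The short version of the fix, as a sketch only (myrepo and the stage name compile-image are placeholders, and details vary between the classic builder and BuildKit): tag and push the intermediate stages too, and point later builds at those images as cache sources.

```shell
# Pull previous images so their layers are available as a cache
# (ignore failures on the very first build):
docker pull myrepo/compile-image:latest || true
docker pull myrepo/runtime-image:latest || true

# Rebuild the intermediate stage, reusing its cached layers:
docker build --target compile-image \
    --cache-from myrepo/compile-image:latest \
    --tag myrepo/compile-image:latest .

# Build the final stage, with both images as cache sources:
docker build \
    --cache-from myrepo/compile-image:latest \
    --cache-from myrepo/runtime-image:latest \
    --tag myrepo/runtime-image:latest .

# Push both so the next build can reuse them:
docker push myrepo/compile-image:latest
docker push myrepo/runtime-image:latest
```

Note that if you’re building with BuildKit, the pushed images also need inline cache metadata (built with --build-arg BUILDKIT_INLINE_CACHE=1) for --cache-from to find their layers.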

Problem #3: Contradictory advice on running as root

Good: One part of the official documentation tells you that you shouldn’t run as root:

If a service can run without privileges, use USER to change to a non-root user.

This is excellent advice. Running as root exposes you to much larger security risks; for example, the runc vulnerability from February 2019 (CVE-2019-5736), which allowed escalating to root on the host, was preventable by running as a non-root user.

Insecure: However, the official documentation also says:

… you should use the common, traditional port for your application. For example, an image containing the Apache web server would use EXPOSE 80.

In order to listen on port 80 you need to run as root (ports below 1024 are privileged). You don’t want to run as root, and since pretty much every Docker runtime can map ports, binding to ports <1024 inside the container is completely unnecessary: you can always map the host’s port 80 to whatever port the container actually listens on.

So don’t bind to ports <1024, and do run as a non-privileged user.
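For example, here’s a sketch of a Dockerfile that does both (the base image, user name, and port 8000 are arbitrary choices for illustration):

```dockerfile
FROM python:3.11-slim-bookworm

# Create a non-root user and switch to it:
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser

# Listen on an unprivileged port instead of port 80:
EXPOSE 8000
CMD ["python", "-m", "http.server", "8000"]
```

At runtime you can still serve on the traditional port by mapping it: docker run -p 80:8000 myimage forwards port 80 on the host to port 8000 in the container.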

Problem #4: You probably don’t want to use Alpine Linux

Bad: The official best practices say:

We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution.

This is fine advice for Go, but bad advice for Python: Alpine uses musl instead of glibc, so standard precompiled wheels don’t install, and many packages have to be compiled from source. That leads to slower builds, larger images, and obscure bugs.

So in my article on the best Docker base image for Python, I recommend the official Python images based on Debian Stable.
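If you want a concrete starting point (the tags here are examples; use whatever the current stable release is):

```dockerfile
# Instead of an Alpine-based image:
#   FROM python:3.11-alpine
# use an official image based on Debian Stable:
FROM python:3.11-slim-bookworm
```

The slim variants are still reasonably small, but they use glibc, so standard precompiled wheels install without compilation.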

Packaging is a process that spans environments

The problems I discuss above are in part the result of a limited perspective on packaging: packaging isn’t just about one or two configuration files. Instead, packaging is a process that spans both time (ongoing releases and updates) and space (the different environments where the code is developed, built, and run).

And the goal of this process is to reduce costs across these environments: both development costs (e.g. the bottleneck of slow builds) and operational costs (e.g. insecure images).

So you need to think about which costs really matter. I would argue, for example, that the costs of obscure runtime bugs far outweigh the cost of a slightly larger image.

And you also need to think about security updates, and about when and how to update dependencies, and about how your CI environment interacts with your build process.