A review of the official Dockerfile best practices: good, bad, and insecure
If you’re building a Docker image, it’s natural to go read the best practices in Docker’s official documentation.
While this document has a lot of excellent and useful information, it also has some problems: important security and performance details that are left as an exercise to the reader, as well as contradictory advice.
To help you get faster builds and secure images, in this article I’ll go over some of the problems with the official best practices, along with references to solutions.
Problem #1: Caching can lead to insecure images
Good: The official best practices documentation encourages layer caching:
… you can minimize image layers by leveraging [the] build cache.
And layer caching is great: it allows for faster builds and in some cases for smaller images.
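Insecure: However, it also means you won’t get security updates for system packages: a cached layer that installed or upgraded packages is reused as-is, even after fixed versions have been released. So you also need to regularly rebuild the image from scratch.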
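As a rough illustration of why a cached layer goes stale, here is a minimal sketch. The base image tag, the image name yourimage, and the rebuild schedule are assumptions made for the example, not something the official documentation prescribes.

```dockerfile
# Illustrative sketch; the base image tag is an assumption.
FROM python:3.12-slim

# Once this layer is cached, Docker reuses it on later builds as long as the
# instruction text and the base image are unchanged, so security fixes
# released after the first build are not picked up.
RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*

# A periodic (for example, nightly or weekly) rebuild that re-pulls the base
# image and bypasses the cache picks up the fixes:
#
#   docker build --pull --no-cache -t yourimage .
```

Day-to-day builds can still use the cache for speed; only the scheduled rebuild needs to skip it.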
Problem #2: Multi-stage builds can break caching
Good: The official documentation recommends using multi-stage builds:
Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files.
This is true, and for all but the simplest of builds you will benefit from using them.
Bad: However, in the most likely image build scenario, naively using multi-stage builds also breaks caching. That means you’ll have much slower builds.
The official documentation doesn’t cover this at all. So to understand why multi-stage builds break caching, and how to fix it, see my article on faster multi-stage builds.
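To make the problem concrete, here is a hedged sketch. It assumes a build machine (such as a CI runner) that starts with an empty local cache and can only pull previously pushed images. The stage names, tags, file names, and the image name yourimage are all hypothetical, and the workaround shown is one possible approach rather than necessarily the one described in the linked article.

```dockerfile
# Illustrative sketch; stage names, tags, and file names are assumptions.

# Build stage: has the compiler toolchain, and is NOT part of the final image.
FROM python:3.12-slim AS compile-stage
RUN apt-get update && apt-get install -y --no-install-recommends gcc \
    && rm -rf /var/lib/apt/lists/*
RUN python -m venv /opt/venv
COPY requirements.txt .
RUN /opt/venv/bin/pip install -r requirements.txt

# Runtime stage: only this stage ends up in the image you push.
FROM python:3.12-slim AS runtime-stage
COPY --from=compile-stage /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# On a machine with an empty cache, pulling only the final image does not
# bring back the compile-stage layers. One possible workaround is to build,
# tag, and push the compile stage separately so it can be pulled and reused:
#
#   docker build --target compile-stage -t yourimage:compile-stage .
#   docker build --target runtime-stage \
#       --cache-from yourimage:compile-stage \
#       --cache-from yourimage:latest \
#       -t yourimage:latest .
```

How --cache-from behaves depends on the builder in use, so treat these commands as a starting point to adapt rather than a drop-in recipe.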
Problem #3: Contradictory advice on running as root
Good: One part of the official documentation tells you that you shouldn’t run as root:
If a service can run without privileges, use USER to change to a non-root user.
This is excellent advice. Running as root exposes you to much larger security risks, e.g. a CVE in February 2019 that allowed escalating to root on the host was preventable by running as a non-root user.
Insecure: However, the official documentation also says:
… you should use the common, traditional port for your application. For example, an image containing the Apache web server would use EXPOSE 80.
In order to listen on port 80 you typically need to run as root. You don’t want to run as root, and given pretty much every Docker runtime can map ports, binding to ports <1024 is completely unnecessary: you can always map external port 80 to an unprivileged port inside the container.
So don’t bind to ports <1024, and do run as a non-privileged user.
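A minimal sketch of that combination might look like the following. The user name appuser, port 8080, the file server.py, the base image tag, and the image name yourimage are assumptions chosen for illustration.

```dockerfile
# Illustrative sketch; user name, port, and file names are assumptions.
FROM python:3.12-slim

# Create an unprivileged user and switch to it.
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser

COPY --chown=appuser:appuser server.py .

# Listen on an unprivileged port (>= 1024) inside the container.
EXPOSE 8080
CMD ["python", "server.py"]

# The host can still serve traffic on port 80 by mapping it at run time:
#
#   docker run -p 80:8080 yourimage
```

The application inside the container never needs root, while clients outside still connect to the traditional port.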
Problem #4: You probably don’t want to use Alpine Linux
Bad: The official best practices say:
We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution.
This is bad advice for Python, leading to slower builds, larger images, and obscure bugs.
So in my recommendations for the best Docker base image for Python I recommend using the official images based on Debian Stable.
Packaging is a process that spans environments
The problems I discuss above are in part the result of a limited perspective on packaging: packaging isn’t just about one or two configuration files. Instead, packaging is a process that spans both time (ongoing releases and updates) and space (the different environments where the code is developed, built, and run).
And the goal of this process is to reduce costs across these environments: both development bottlenecks (slow builds) and operational costs.
So you need to think about which costs really matter. I would argue the costs of obscure runtime bugs far outweigh the costs of a slightly larger image, for example.
And you also need to think about security updates, and about when and how to update dependencies, and about how your CI environment interacts with your build process.