A review of the official Dockerfile best practices: good, bad, and insecure
If you’re building a Docker image, it’s natural to go read the Dockerfile best practices in Docker’s official documentation.
While this document has a lot of excellent and useful information, it also has some problems: important security and performance details that are left as an exercise to the reader, as well as contradictory advice.
To help you get faster builds and secure images, in this article I’ll go over some of the problems with the official best practices, along with references to solutions.
Problem #1: Caching can lead to insecure images
Good: The official best practices documentation encourages layer caching:
… you can minimize image layers by leveraging [the] build cache.
And layer caching is great: it allows for faster builds and in some cases for smaller images.
Insecure: However, it also means you won’t get security updates for system packages. So you also need to regularly rebuild the image from scratch.
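In practice that can mean a scheduled job, nightly or weekly depending on your risk tolerance, that rebuilds with caching disabled. A minimal sketch (the image name is illustrative):

```shell
# Scheduled rebuild that bypasses the layer cache:
# --pull re-downloads the base image even if a copy exists locally,
# --no-cache ignores all cached layers, so system packages get
# reinstalled at their latest (patched) versions.
docker build --pull --no-cache -t yourimage:latest .
```

You would typically run this from a cron-style CI job, in addition to your normal cached builds.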
Problem #2: Multi-stage builds can break caching
Good: The official documentation recommends using multi-stage builds:
Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files.
This is true, and for all but the simplest of builds you will benefit from using them.
Bad: However, in the most likely image build scenario, naively using multi-stage builds also breaks caching. That means you’ll have much slower builds.
The official documentation doesn’t cover this at all. So to understand why multi-stage builds break caching, and how to fix it, see my article on faster multi-stage builds.
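To give a flavor of one workaround (the details are in that article): you can push the intermediate stage to a registry and seed the cache from it on later builds. The stage and image names below are hypothetical:

```shell
# Warm the local cache from previously pushed images
# (|| true so the very first build doesn't fail):
docker pull myrepo/myapp:build-stage || true
docker pull myrepo/myapp:latest || true

# Build and tag the intermediate stage explicitly...
docker build --target build-stage \
    --cache-from=myrepo/myapp:build-stage \
    -t myrepo/myapp:build-stage .

# ...then the final image, using both images as cache sources:
docker build \
    --cache-from=myrepo/myapp:build-stage \
    --cache-from=myrepo/myapp:latest \
    -t myrepo/myapp:latest .

# Push both so the next build (possibly on a fresh CI machine)
# can reuse their layers:
docker push myrepo/myapp:build-stage
docker push myrepo/myapp:latest
```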
Problem #3: Contradictory advice on running as root
Good: One part of the official documentation tells you that you shouldn’t run as root:
If a service can run without privileges, use USER to change to a non-root user.
This is excellent advice. Running as root exposes you to much larger security risks; for example, a CVE in February 2019 that allowed escalation to root on the host could be mitigated by running as a non-root user.
Insecure: However, the official documentation also says:
… you should use the common, traditional port for your application. For example, an image containing the Apache web server would use EXPOSE 80.
In order to listen on port 80 you need to run as root. You don’t want to run as root, and given that pretty much every Docker runtime can map ports, binding to ports <1024 inside the container is completely unnecessary: you can always map the container’s unprivileged port to port 80 externally.
So don’t bind to ports <1024, and do run as a non-privileged user.
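As a sketch, that might look like this in a Dockerfile (the user name, port, and application file are illustrative):

```dockerfile
FROM python:3.9-slim
# Create and switch to an unprivileged user:
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser
COPY server.py .
# Listen on an unprivileged port (>1024) inside the container:
EXPOSE 8000
CMD ["python", "server.py"]
```

At runtime you can still serve on the traditional port externally, e.g. docker run -p 80:8000 yourimage.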
Problem #4: You probably don’t want to use Alpine Linux
Bad: The official best practices say:
We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution.
I feel this is bad advice. I’ve seen a whole slew of reported problems from people using Alpine Linux, because it is built on musl libc rather than the GNU libc (glibc) that most Linux distributions use. These range from bugs in datetime formatting to crashes due to a smaller stack.
To be fair, many or most of these issues have been fixed, but debugging your own bugs in production is bad enough; why add the potential for more just to get a slightly smaller image?
So in my recommendations for the best Docker base image for Python I recommend using images based on the Ubuntu LTS or Debian Stable images.
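In a Dockerfile that just means choosing a glibc-based base image; the exact tag depends on the Python version and Debian release you want:

```dockerfile
# Debian-based "slim" variant: still reasonably small, but uses glibc:
FROM python:3.9-slim-bullseye
```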
Packaging is a process that spans environments
The problems I discuss above are in part the result of a limited perspective on packaging: packaging isn’t just about one or two configuration files. Instead, packaging is a process that spans both time (ongoing releases and updates) and space (the different environments where the code is developed, built, and run).
And the goal of this process is to reduce costs across these environments: both development bottlenecks (slow builds) and operational costs.
So you need to think about which costs really matter: I would argue the costs of obscure runtime bugs far outweigh the cost of a slightly larger image.
And you also need to think about security updates, and about when and how to update dependencies, and about how your CI environment interacts with your build process.
Learn how to build fast, production-ready Docker images—read the rest of the Docker packaging guide for Python.