Broken by default: why you should avoid most Dockerfile examples
When it’s time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.
Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so. To demonstrate just some of the ways they’re broken, I’m going to:
- Start with an example
Dockerfilethat comes up fairly high on some Google searches.
- Show how it’s broken.
- Give some suggestions on how to make it less broken.
Note: Outside the specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
Want a best-practices Dockerfile and build system? Check out my Production-Ready Python Containers product.
Broken by default
Consider the following
Dockerfile, which I found by searching for Python Dockerization examples. I’ve made some minor changes to disguise its origin, but otherwise it is the same:
# DO NOT USE THIS DOCKERFILE AS AN EXAMPLE, IT IS BROKEN FROM python:3 COPY yourscript.py / RUN pip install flask CMD [ "python", "./yourscript.py" ]
Some of the problems with this Dockerfile
How many different problems can you spot in this image?
Problem #1: Non-reproducible builds re Python version
The first thing to notice is that this
Dockerfile is based off of the
At the time of writing this will install Python 3.7, but at some point it will switch to installing Python 3.8.
At that point rebuilding the image will switch to a different version of Python, which might break the software: a minor change in your code can lead to a deploy that breaks production.
python:3.7.3-stretch as the base image, to pin the version and OS. Or,
python:3.7-stretch if you’re feeling less worried about point releases. See my article for choosing a base image for Python for more details on image variants.
Problem #2: Non-reproducible builds re dependencies.
flask is installed with no versioning, so each time the image is rebuilt potentially a new version of
flask (or one of its dependencies, or one of its dependencies’ dependencies) will change.
If they’re compatible, great, but there’s no guarantee that is the case.
requirements.txt with transitively-pinned versions of all dependencies, e.g. by using
Problem #3: Changes to source code invalidate the build cache
If you want fast builds, you want to rely on Docker’s layer caching.
But by copying in the file before running
pip install, all later layers are invalidated—this image will be rebuilt from scratch every time.
Solution: Copy in files only when they’re first needed.
Problem #4: Running as root, which is insecure
By default Docker containers run as root, which is a security risk.
Solution: It’s much better to run as a non-root user, and do so in the image itself so that you don’t listen on ports<1024 or do other operations that require a subset of root’s permissions.
A somewhat better image
Here’s a somewhat better—though still not ideal—Dockerfile that addresses the issues above:
FROM python:3.7.3-stretch COPY requirements.txt /tmp/ RUN pip install -r /tmp/requirements.txt RUN useradd --create-home appuser WORKDIR /home/appuser USER appuser COPY yourscript.py . CMD [ "python", "./yourscript.py" ]
Even if the resulting image was something you’d want to run in production—and it almost certainly isn’t!—the image is still insufficient on its own.
For example, you also need to regularly update
requirements.txt in a controlled manner, in order to get security updates and bug fixes, and you’ll need to regularly rebuild your images without caching to get security updates.
And there are many more improvements you could make to get this closer to a production-ready Python container.
Be careful what you learn from
A broken Docker image can lead to production outages, and building best-practices images is a lot harder than it seems. So don’t just copy the first example you find on the web: do your research, and spend some time reading about best practices.
Learn how to build fast, production-ready Docker images—read the rest of the Docker packaging guide for Python.
Build production-ready Docker images—fast!
You want fast builds, small and secure images, operational correctness. And doing it all yourself will take you a week or more of effort.
Want to ship with confidence—in just hours?
Learn the faster way to build production-ready Python containers.
You need to get your job done, so how do you find time to learn new skills?
There’s not always time to learn new tools and technologies at work—but you still need to keep your skills sharp. And with so many tools and technologies to learn, you’re not even sure where to start.
Learn relevant, practical tools and techniques, quickly and efficiently, by signing up for my newsletter.
You’ll join over 1000 Python developers and data scientists getting weekly emails about software engineering best practices, from Docker packaging, to faster code, to better testing.