Broken by default: why you should avoid most Dockerfile examples
When it’s time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.
Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so. To demonstrate just some of the ways they’re broken, I’m going to:
- Start with an example
Dockerfilethat comes up fairly high on some Google searches.
- Show how it’s broken.
- Give some suggestions on how to make it less broken.
Note: Outside the very specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
To ensure you’re writing secure, correct, fast Dockerfiles, consider my Python on Docker Production Handbook, which includes a packaging process and >70 best practices.
Broken by default
Consider the following
Dockerfile, which I found by searching for Python Dockerization examples. I’ve made some minor changes to disguise its origin, but otherwise it is the same:
# DO NOT USE THIS DOCKERFILE AS AN EXAMPLE, IT IS BROKEN FROM python:3 COPY yourscript.py / RUN pip install flask CMD [ "python", "./yourscript.py" ]
Some of the problems with this Dockerfile
How many different problems can you spot in this image?
Problem #1: Non-reproducible builds re Python version
The first thing to notice is that this
Dockerfile is based off of the
At the time of writing this will install Python 3.7, but at some point it will switch to installing Python 3.8.
At that point rebuilding the image will switch to a different version of Python, which might break the software: a minor change in your code can lead to a deploy that breaks production.
python:3.7.3-stretch as the base image, to pin the version and OS. Or,
python:3.7-stretch if you’re feeling less worried about point releases. See my article for choosing a base image for Python for more details on image variants.
Problem #2: Non-reproducible builds re dependencies.
flask is installed with no versioning, so each time the image is rebuilt potentially a new version of
flask (or one of its dependencies, or one of its dependencies’ dependencies) will change.
If they’re compatible, great, but there’s no guarantee that is the case.
requirements.txt with transitively-pinned versions of all dependencies, e.g. by using
Problem #3: Changes to source code invalidate the build cache
If you want fast builds, you want to rely on Docker’s layer caching.
But by copying in the file before running
pip install, all later layers are invalidated—this image will be rebuilt from scratch every time.
Solution: Copy in files only when they’re first needed.
Problem #4: Running as root, which is insecure
By default Docker containers run as root, which is a security risk.
Solution: It’s much better to run as a non-root user, and do so in the image itself so that you don’t listen on ports<1024 or do other operations that require a subset of root’s permissions.
A somewhat better image
Here’s a somewhat better—though still not ideal—Dockerfile that addresses the issues above:
FROM python:3.7.3-stretch COPY requirements.txt /tmp/ RUN pip install -r /tmp/requirements.txt RUN useradd --create-home appuser WORKDIR /home/appuser USER appuser COPY yourscript.py . CMD [ "python", "./yourscript.py" ]
Even if the resulting image was something you’d want to run in production—and it almost certainly isn’t!—the image is still insufficient on its own.
For example, you also need to regularly update
requirements.txt in a controlled manner, in order to get security updates and bug fixes, and you’ll need to regularly rebuild your images without caching to get security updates.
And there are many more improvements you could make to get this closer to a production-ready Python container.
Be careful what you learn from
A broken Docker image can lead to production outages, and building best-practices images is a lot harder than it seems. So don’t just copy the first example you find on the web: do your research, and spend some time reading about best practices.
Learn how to build fast, production-ready Docker images—read the rest of the Docker packaging guide for Python.
Production Docker packaging is too complicated to learn from Google searches
With as much as a dozen different intersecting technologies, and an unknown number of details to get right, Docker packaging isn't simple, especially for production.
But you still need fast builds that save you time, and security best practices that keep you safe.
Take the fast path to learning best practices, by using the Python on Docker Production Handbook.
Learn practical Python software engineering skills you can use at your job
Too much to learn? Don't know where to start?
Sign up for my newsletter, and join over 2600 Python developers and data scientists learning practical tools and techniques, from Docker packaging to testing to Python best practices, with a free new article in your inbox every week.