Broken by default: why you should avoid most Dockerfile examples
When it’s time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.
Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so. To demonstrate just some of the ways they’re broken, I’m going to:
- Start with an example Dockerfile that comes up fairly high on some Google searches.
- Show how it’s broken.
- Give some suggestions on how to make it less broken.
Broken by default
Consider the following Dockerfile, which I found by searching for Python Dockerization examples. I’ve made some minor changes to disguise its origin, but otherwise it is the same:
# DO NOT USE THIS DOCKERFILE AS AN EXAMPLE, IT IS BROKEN
FROM python:3
COPY yourscript.py /
RUN pip install flask
CMD [ "python", "./yourscript.py" ]
Some of the problems with this Dockerfile
How many different problems can you spot in this image?
Problem #1: Non-reproducible builds re Python version
The first thing to notice is that this Dockerfile is based off of the python:3 image, which pins only the major version of Python.
At the time of writing this will install Python 3.7, but at some point it will switch to installing Python 3.8.
At that point rebuilding the image will switch to a different version of Python, which might break the software: a minor change in your code can lead to a deploy that breaks production.
Solution: Use python:3.7.3-stretch as the base image, to pin the version and OS. Or use python:3.7-stretch if you’re feeling less worried about point releases. See my article on choosing a base image for Python for more details on image variants.
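As a minimal sketch, the two pinning options described above would look like this as the first line of the Dockerfile. Only one FROM line should be active, so the second option is commented out:

```dockerfile
# Option 1: pin the exact Python release and the OS release.
FROM python:3.7.3-stretch

# Option 2: pin only the minor version and accept point releases.
# FROM python:3.7-stretch
```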
Problem #2: Non-reproducible builds re dependencies
flask is installed with no versioning, so each time the image is rebuilt a new version of flask (or one of its dependencies, or one of its dependencies’ dependencies) might be installed.
If the new versions are compatible, great, but there’s no guarantee that is the case.
Solution: Use a requirements.txt with transitively-pinned versions of all dependencies, e.g. by using a tool that generates a fully pinned requirements file.
Problem #3: Changes to source code invalidate the build cache
If you want fast builds, you want to rely on Docker’s layer caching.
But because the source file is copied in before running pip install, any change to it invalidates all later layers: the dependencies will be reinstalled from scratch every time the code changes.
Solution: Copy in files only when they’re first needed.
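A rough sketch of that ordering, assuming the dependencies are listed in a requirements.txt file next to the script. This is only a fragment; the FROM line and other instructions are omitted:

```dockerfile
# Changes rarely: this layer and the pip install below stay cached
# until requirements.txt itself changes.
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

# Changes often: only the layers from here on are rebuilt after a code edit.
COPY yourscript.py .
```

With this ordering, editing yourscript.py no longer forces the dependencies to be reinstalled.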
Problem #4: Running as root, which is insecure
By default Docker containers run as root, which is a security risk.
Solution: It’s much better to run as a non-root user, and to set this up in the image itself, so that you don’t listen on ports < 1024 or do other operations that require a subset of root’s permissions.
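One way to do this, sketched under the assumption of a Debian-based image where useradd is available, is to create a dedicated user and switch to it before the application is copied in. The user name appuser is arbitrary:

```dockerfile
# Create an unprivileged user with a home directory.
RUN useradd --create-home appuser
# Work from that user's home directory (useradd's default location on Debian).
WORKDIR /home/appuser
# Later RUN and CMD instructions, and the running container, use this user.
USER appuser
```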
A somewhat better image
Here’s a somewhat better—though still not ideal—Dockerfile that addresses the issues above:
FROM python:3.7.3-stretch
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt
RUN useradd --create-home appuser
WORKDIR /home/appuser
USER appuser
COPY yourscript.py .
CMD [ "python", "./yourscript.py" ]
Even if the resulting image was something you’d want to run in production—and it almost certainly isn’t!—the image is still insufficient on its own.
For example, you also need to regularly update requirements.txt in a controlled manner, in order to get security updates and bug fixes, and you’ll need to regularly rebuild your images without caching to get security updates.
Note: Outside any specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
Be careful what you learn from
A broken Docker image can lead to production outages, and building best-practices images is a lot harder than it seems. So don’t just copy the first example you find on the web: do your research, and spend some time reading about best practices.