Faster or slower: the basics of Docker build caching

by Itamar Turner-Trauring
Last updated 27 Oct 2021, originally created 16 Sep 2019

Packaging can often be slow, and Docker builds are no exception. Downloading and installing system and Python packages, compiling C extensions, building assets—it all adds up.

In order to speed up your builds, Docker implements caching: if your Dockerfile and related files haven’t changed, a rebuild can reuse some of the existing layers in your local image cache.

But in order to take advantage of this cache, you need to understand how it works, and that’s what we’ll cover in this article.

The basic algorithm

When you build a Dockerfile, Docker will see if it can use the cached results of previous builds:

For most commands, if the text of the command hasn’t changed, the version from the cache will be used.
For COPY and ADD, it also checks that the contents of the files you’re copying haven’t changed.
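
To make these rules concrete, here is a toy model in Python. It is only a sketch of the idea, not Docker's actual implementation: the `cache_key` function and the way it hashes its inputs are assumptions made for illustration. The point is that a cached layer can be reused only when the parent layer, the instruction text, and (for a copy) the file contents all match.

```python
import hashlib
from typing import Dict, Optional


def cache_key(parent_key: str, instruction: str,
              copied_files: Optional[Dict[str, bytes]] = None) -> str:
    """Toy cache key (hypothetical): parent layer + instruction text + copied file contents."""
    digest = hashlib.sha256()
    digest.update(parent_key.encode() + b"\0")
    digest.update(instruction.encode() + b"\0")
    # Only copy-style instructions pass files; their names and contents feed the key.
    for name in sorted(copied_files or {}):
        digest.update(name.encode() + b"\0")
        digest.update(copied_files[name] + b"\0")
    return digest.hexdigest()


base = "python:3.7-slim-buster"

# Same instruction, same file contents: the keys match, so the layer could be reused.
first = cache_key(base, "COPY . .", {"server.py": b"print('v1')"})
second = cache_key(base, "COPY . .", {"server.py": b"print('v1')"})

# Same instruction text, but server.py was edited: the key changes.
edited = cache_key(base, "COPY . .", {"server.py": b"print('v2')"})

print(first == second)  # True
print(first == edited)  # False
```

Because the parent key is part of the hash, a change in any earlier step changes every later key as well, which matters for the rule covered in the next section.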

Let’s see an example using the following Dockerfile:

FROM python:3.7-slim-buster

COPY . .

RUN pip install --quiet -r requirements.txt

ENTRYPOINT ["python", "server.py"]

The first time we build the image, all the commands run:

$ docker build -t example1 .
Sending build context to Docker daemon   5.12kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
 ---> Running in 591f97f47b6e
Removing intermediate container 591f97f47b6e
 ---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
 ---> Running in e3cf483c3381
Removing intermediate container e3cf483c3381
 ---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest

The second time, however, because nothing has changed, docker build will use the image cache:

$ docker build -t example1 .
Sending build context to Docker daemon   5.12kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> Using cache
 ---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
 ---> Using cache
 ---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
 ---> Using cache
 ---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest

Notice it mentions “Using cache”—the result is a much faster build. It doesn’t have to download any packages from the network to get pip install to work.

If we delete the image from the local cache, the subsequent build starts from scratch, since Docker can’t use layers that aren’t there:

$ docker image rm example1
Untagged: example1:latest
Deleted: sha256:598b0340cc90967501c5c51862dc586ca69a01ca465f48232fc457d3ab122a73
Deleted: sha256:02c7cf5a3d9af1939b9f5286312b23898fd3ea12b7cb1d7a77251251740a806c
Deleted: sha256:d9e9602d9c3fd7381a8e1de301dc4345be2eb2b8488b5fc3e190eaacbb2f9596
Deleted: sha256:eff791eb839d00cbf46d139d8595b23867bc580bb9164b90253d0b2d9fcca236
Deleted: sha256:53d34b2ead0a465d229a4260fee2a845fb8551856d4019cd2e608dfe0e039e77
$ docker build -t example1 .
Sending build context to Docker daemon   5.12kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> 63c32b9b1af6
...

Taking advantage of caching

There’s one more important rule to the caching algorithm:

If the cache can’t be used for a particular layer, none of the subsequent layers will be loaded from the cache either.

In the following example the C layer hasn’t changed between new and old Dockerfiles. Nonetheless, it still can’t be loaded from the cache since the previous layer (B_CHANGED) couldn’t be loaded from the cache:

Old Dockerfile: A then B then C. New Dockerfile: A then different B then C. Should you use cache? For A, yes. For B, no, it's changed. Which also means C won't get taken from cache since it's later.
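
The same A, B, C example can be sketched in a few lines of Python. This is a simplified model, assuming each step is identified only by its text; `cache_decisions` is a hypothetical helper, not part of Docker.

```python
from typing import List, Tuple


def cache_decisions(old_steps: List[str], new_steps: List[str]) -> List[Tuple[str, bool]]:
    """Return (step, uses_cache) for each step of the new build."""
    decisions = []
    still_cached = True
    for index, step in enumerate(new_steps):
        # A step can use the cache only if every earlier step did
        # and its own text matches the same position in the previous build.
        matches_old = index < len(old_steps) and old_steps[index] == step
        still_cached = still_cached and matches_old
        decisions.append((step, still_cached))
    return decisions


old = ["A", "B", "C"]
new = ["A", "B_CHANGED", "C"]

for step, uses_cache in cache_decisions(old, new):
    print(f"{step}: {'cache' if uses_cache else 'rebuild'}")
# A: cache
# B_CHANGED: rebuild
# C: rebuild
```

Once `still_cached` turns false it never turns true again, which is exactly why C is rebuilt even though its text is unchanged.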

Let’s consider what that means for the following Dockerfile:

FROM python:3.7-slim-buster

COPY requirements.txt .
COPY server.py .

RUN pip install --quiet -r requirements.txt

ENTRYPOINT ["python", "server.py"]

If any of the files we COPY in change, that invalidates all later layers: we’ll need to rerun pip install, for example.

But if server.py has changed and requirements.txt hasn’t, why should we have to redo the pip install? After all, the pip install only uses requirements.txt.

What you want to do, therefore, is copy only the files you actually need to run the next step, so as to minimize the opportunity for cache invalidation. For example:

FROM python:3.7-slim-buster

COPY requirements.txt .

RUN pip install --quiet -r requirements.txt

COPY server.py .

ENTRYPOINT ["python", "server.py"]

Because server.py is only copied in after the pip install, the layer created by pip install can still be loaded from the cache so long as requirements.txt hasn’t changed.
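
To compare the two orderings side by side, here is a rough Python simulation of what has to rerun after editing only server.py. It assumes a simplified model in which each step either copies one named file or copies nothing; the `rebuilt_steps` helper and the two list names are made up for this sketch.

```python
from typing import List, Optional, Set, Tuple

# Each step is (instruction text, the file it copies in, or None).
Step = Tuple[str, Optional[str]]

COPY_EVERYTHING_FIRST: List[Step] = [
    ("COPY requirements.txt .", "requirements.txt"),
    ("COPY server.py .", "server.py"),
    ("RUN pip install --quiet -r requirements.txt", None),
    ('ENTRYPOINT ["python", "server.py"]', None),
]

COPY_ONLY_WHAT_IS_NEEDED: List[Step] = [
    ("COPY requirements.txt .", "requirements.txt"),
    ("RUN pip install --quiet -r requirements.txt", None),
    ("COPY server.py .", "server.py"),
    ('ENTRYPOINT ["python", "server.py"]', None),
]


def rebuilt_steps(steps: List[Step], changed_files: Set[str]) -> List[str]:
    """List the instructions that must rerun when the given files change."""
    rerun = []
    invalidated = False
    for instruction, copied_file in steps:
        # Copying a changed file misses the cache, and so does every later step.
        if copied_file in changed_files:
            invalidated = True
        if invalidated:
            rerun.append(instruction)
    return rerun


changed = {"server.py"}
print(rebuilt_steps(COPY_EVERYTHING_FIRST, changed))
print(rebuilt_steps(COPY_ONLY_WHAT_IS_NEEDED, changed))
```

With the first ordering the pip install line shows up in the rerun list; with the second it does not, and only the cheap COPY and ENTRYPOINT steps are redone.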

Want to quickly get up to speed on Docker packaging? This article is an excerpt from my book Just Enough Docker Packaging, which will help you understand the fundamentals of Docker packaging in just one afternoon.

Designing your Dockerfile for caching

If you want fast builds by reusing your previously cached builds, you’ll need to write your Dockerfile appropriately:

Only copy in the files you need for the next step, to minimize cache invalidation in the build process.
Make sure not to invalidate the cache accidentally by having a command early in the Dockerfile that always changes, e.g. a LABEL that contains the build timestamp.
