Faster or slower: the basics of Docker build caching
Packaging can often be slow, and Docker builds are no exception. Downloading and installing system and Python packages, compiling C extensions, building assets—it all adds up.
In order to speed up your builds, Docker implements caching: if your Dockerfile and related files haven’t changed, a rebuild can reuse some of the existing layers in your local image cache.
But in order to take advantage of this cache, you need to understand how it works, and that’s what we’ll cover in this article.
The basic algorithm
When you build a Dockerfile, Docker will see if it can use the cached results of previous builds:
- For most commands, if the text of the command hasn’t changed, the version from the cache will be used.
- For COPY, it also checks that the files you’re copying haven’t changed.
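As a rough mental model of these two rules, the cache key for a step can be thought of as a hash of the command text plus, for COPY, the contents of the copied files. The Python sketch below is a simplification for illustration only, not Docker's actual implementation; the function name `cache_key` and the `files` mapping are hypothetical.

```python
import hashlib


def cache_key(command, files=None):
    """Return a simplified cache key for one Dockerfile step.

    Assumption: this models the two rules above, not Docker's real
    implementation. `files` maps file names to contents (bytes) and is
    only consulted for COPY commands.
    """
    digest = hashlib.sha256(command.encode("utf-8"))
    if command.split()[0].upper() == "COPY":
        # For COPY, the contents of the copied files are part of the key.
        for name in sorted(files or {}):
            digest.update(name.encode("utf-8"))
            digest.update(files[name])
    return digest.hexdigest()


# Same RUN text gives the same key, so the cached layer can be reused.
run_a = cache_key("RUN pip install --quiet -r requirements.txt")
run_b = cache_key("RUN pip install --quiet -r requirements.txt")

# Same COPY text but different file contents gives a different key.
copy_old = cache_key("COPY . .", {"server.py": b"print('v1')"})
copy_new = cache_key("COPY . .", {"server.py": b"print('v2')"})

print(run_a == run_b)        # True
print(copy_old == copy_new)  # False
```

In this model, editing server.py changes the key of the COPY step even though the text of the command is identical.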
Let’s see an example using the following Dockerfile:
FROM python:3.7-slim-buster
COPY . .
RUN pip install --quiet -r requirements.txt
ENTRYPOINT ["python", "server.py"]
The first time we run it, all the commands run:
$ docker build -t example1 .
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
 ---> Running in 591f97f47b6e
Removing intermediate container 591f97f47b6e
 ---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
 ---> Running in e3cf483c3381
Removing intermediate container e3cf483c3381
 ---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest
The second time, however, because nothing has changed, docker build will use the image cache:
$ docker build -t example1 .
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> Using cache
 ---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
 ---> Using cache
 ---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
 ---> Using cache
 ---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest
Notice it mentions “Using cache”—the result is a much faster build.
It doesn’t have to download any packages from the network to get pip install to work.
If we delete the image from the local cache, the subsequent build starts from scratch, since Docker can’t use layers that aren’t there:
$ docker image rm example1
Untagged: example1:latest
Deleted: sha256:598b0340cc90967501c5c51862dc586ca69a01ca465f48232fc457d3ab122a73
Deleted: sha256:02c7cf5a3d9af1939b9f5286312b23898fd3ea12b7cb1d7a77251251740a806c
Deleted: sha256:d9e9602d9c3fd7381a8e1de301dc4345be2eb2b8488b5fc3e190eaacbb2f9596
Deleted: sha256:eff791eb839d00cbf46d139d8595b23867bc580bb9164b90253d0b2d9fcca236
Deleted: sha256:53d34b2ead0a465d229a4260fee2a845fb8551856d4019cd2e608dfe0e039e77
$ docker build -t example1 .
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> 63c32b9b1af6
...
Taking advantage of caching
There’s one more important rule to the caching algorithm:
- If the cache can’t be used for a particular layer, all subsequent layers won’t be loaded from the cache.
Suppose the C layer hasn’t changed between the new and old Dockerfile. Nonetheless, it still can’t be loaded from the cache if the previous layer (B_CHANGED) couldn’t be loaded from the cache.
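To make this rule concrete, here is a minimal Python sketch of the lookup, under the simplifying assumption that a layer is identified only by its name. It models the rule rather than Docker's own code: `layers_from_cache` is a hypothetical helper, and the names A and B are placeholders added for illustration alongside B_CHANGED and C.

```python
def layers_from_cache(old_layers, new_layers):
    """Return, for each new layer, whether it can be loaded from the cache.

    Simplified model: a layer is a cache hit only if it matches the old
    layer at the same position AND every earlier layer was also a hit.
    """
    results = []
    cache_usable = True
    for index, layer in enumerate(new_layers):
        matches = index < len(old_layers) and old_layers[index] == layer
        cache_usable = cache_usable and matches
        results.append((layer, cache_usable))
    return results


old = ["A", "B", "C"]          # A and B are hypothetical placeholder names
new = ["A", "B_CHANGED", "C"]

for layer, cached in layers_from_cache(old, new):
    print(layer, "cached" if cached else "rebuilt")
# A cached
# B_CHANGED rebuilt
# C rebuilt
```

Even though C is identical in both lists, it is rebuilt because the miss on B_CHANGED switches the cache off for everything after it.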
Let’s consider what that means for the following Dockerfile:
FROM python:3.7-slim-buster
COPY requirements.txt .
COPY server.py .
RUN pip install --quiet -r requirements.txt
ENTRYPOINT ["python", "server.py"]
If any of the files we COPY in change, that invalidates all later layers: we’ll need to rerun pip install, for example.
But if server.py has changed and requirements.txt hasn’t, why should we have to redo the pip install? After all, the pip install only uses requirements.txt.
What you want to do therefore is to copy only those files that you actually need to run the next step, so as to minimize the opportunity for cache invalidation. For example:
FROM python:3.7-slim-buster
COPY requirements.txt .
RUN pip install --quiet -r requirements.txt
COPY server.py .
ENTRYPOINT ["python", "server.py"]
Because server.py is only copied in after the pip install, the layer created by pip install can still be loaded from the cache so long as requirements.txt hasn’t changed.
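The difference between the two orderings can be checked with a small simulation. This is a sketch under stated assumptions: each step is modelled as a pair of command text and the files it copies, and `steps_to_rerun` is a hypothetical helper, not part of Docker.

```python
def steps_to_rerun(steps, changed_files):
    """Return the commands that must rerun when `changed_files` change.

    Simplified model: a step misses the cache if it copies a changed file,
    and every step after the first miss must rerun as well.
    """
    rerun = []
    invalidated = False
    for command, copied_files in steps:
        if set(copied_files) & set(changed_files):
            invalidated = True
        if invalidated:
            rerun.append(command)
    return rerun


PIP_INSTALL = "RUN pip install --quiet -r requirements.txt"

# Both COPY commands come before pip install.
copy_everything_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("COPY server.py .", ["server.py"]),
    (PIP_INSTALL, []),
    ('ENTRYPOINT ["python", "server.py"]', []),
]

# server.py is copied only after pip install.
copy_only_what_is_needed = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    (PIP_INSTALL, []),
    ("COPY server.py .", ["server.py"]),
    ('ENTRYPOINT ["python", "server.py"]', []),
]

changed = ["server.py"]
print(PIP_INSTALL in steps_to_rerun(copy_everything_first, changed))     # True
print(PIP_INSTALL in steps_to_rerun(copy_only_what_is_needed, changed))  # False
```

With the second ordering, a change to server.py reruns only the final COPY and ENTRYPOINT steps, while a change to requirements.txt still reruns everything after the first COPY.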
Want to quickly get up to speed on Docker packaging? This article is an excerpt from my book, Just Enough Docker Packaging.
Designing your Dockerfile for caching
If you want fast builds by reusing your previously cached builds, you’ll need to write your Dockerfile appropriately:
- Only copy in the files you need for the next step, to minimize cache invalidation in the build process.
- Make sure not to invalidate the cache accidentally by having a command early in the Dockerfile that always changes, e.g. a LABEL that contains the build timestamp.