Faster or slower: the basics of Docker build caching
Packaging can often be slow, and Docker builds are no exception. Downloading and installing system and Python packages, compiling C extensions, building assets—it all adds up.
In order to speed up your builds, Docker implements caching: if your Dockerfile and related files haven’t changed, a rebuild can reuse some of the existing layers in your local image cache.
But in order to take advantage of this cache, you need to understand how it works, and that’s what we’ll cover in this article.
The basic algorithm
When you build a Dockerfile, Docker will see if it can use the cached results of previous builds:
- For most commands, if the text of the command hasn’t changed, the version from the cache will be used.
- For COPY, it also checks that the files you’re copying haven’t changed.
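The two rules above can be sketched as a toy cache key in Python. This is a simplified model under stated assumptions, not Docker’s actual implementation: the function name cache_key, the use of SHA-256, and the sample requirements contents are all made up for illustration.

```python
import hashlib
from typing import Dict, Optional


def cache_key(instruction: str, copied_files: Optional[Dict[str, bytes]] = None) -> str:
    """Return a toy cache key for one Dockerfile instruction.

    Hypothetical model: the key always covers the instruction text, and
    for COPY it also covers the contents of the files being copied.
    """
    digest = hashlib.sha256(instruction.encode("utf-8"))
    if instruction.split()[0].upper() == "COPY":
        for name, content in sorted((copied_files or {}).items()):
            digest.update(name.encode("utf-8"))
            digest.update(content)
    return digest.hexdigest()


run = "RUN pip install --quiet -r requirements.txt"
copy = "COPY requirements.txt ."

# Same RUN text gives the same key, so the cached layer would be reused.
same_run = cache_key(run) == cache_key(run)

# Same COPY text but different file contents gives a different key.
old_copy = cache_key(copy, {"requirements.txt": b"flask==1.1.1\n"})
new_copy = cache_key(copy, {"requirements.txt": b"flask==1.1.2\n"})
copy_changed = old_copy != new_copy

print(same_run, copy_changed)
```

In this model a RUN command is never re-examined as long as its text is identical, which is why only COPY notices edits to your files.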
Let’s see an example using the following Dockerfile:
FROM python:3.7-slim-buster
COPY . .
RUN pip install --quiet -r requirements.txt
ENTRYPOINT ["python", "server.py"]
The first time we build it, all the commands run:
$ docker build -t example1 .
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM python:3.7-slim-buster
---> f96c28b7013f
Step 2/4 : COPY . .
---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
---> Running in 591f97f47b6e
Removing intermediate container 591f97f47b6e
---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
---> Running in e3cf483c3381
Removing intermediate container e3cf483c3381
---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest
The second time, however, because nothing has changed, docker build will use the image cache:
$ docker build -t example1 .
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM python:3.7-slim-buster
---> f96c28b7013f
Step 2/4 : COPY . .
---> Using cache
---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
---> Using cache
---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
---> Using cache
---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest
Notice it mentions “Using cache”—the result is a much faster build.
It doesn’t have to download any packages from the network to get pip install to work.
If we delete the image from the local cache, the subsequent build starts from scratch, since Docker can’t use layers that aren’t there:
$ docker image rm example1
Untagged: example1:latest
Deleted: sha256:598b0340cc90967501c5c51862dc586ca69a01ca465f48232fc457d3ab122a73
Deleted: sha256:02c7cf5a3d9af1939b9f5286312b23898fd3ea12b7cb1d7a77251251740a806c
Deleted: sha256:d9e9602d9c3fd7381a8e1de301dc4345be2eb2b8488b5fc3e190eaacbb2f9596
Deleted: sha256:eff791eb839d00cbf46d139d8595b23867bc580bb9164b90253d0b2d9fcca236
Deleted: sha256:53d34b2ead0a465d229a4260fee2a845fb8551856d4019cd2e608dfe0e039e77
$ docker build -t example1 .
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM python:3.7-slim-buster
---> f96c28b7013f
Step 2/4 : COPY . .
---> 63c32b9b1af6
...
Taking advantage of caching
There’s one more important rule to the caching algorithm:
- If the cache can’t be used for a particular layer, all subsequent layers won’t be loaded from the cache.
Suppose an earlier layer B has been changed to B_CHANGED, while the C layer after it is identical in the old and new Dockerfiles.
Nonetheless, C still can’t be loaded from the cache, since the previous layer (B_CHANGED) couldn’t be loaded from the cache.
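This rule can be sketched in a few lines of Python. It is a toy model rather than Docker’s implementation: the function name plan_build and the layer name A are hypothetical, while B_CHANGED and C follow the example in the text.

```python
def plan_build(old_layers, new_layers):
    """Return (layer, "cached" or "rebuilt") for each layer in new_layers.

    Toy model: a layer is reused only if it matches the old layer at the
    same position AND every earlier layer was also reused.
    """
    plan = []
    cache_usable = True
    for index, layer in enumerate(new_layers):
        matches = index < len(old_layers) and old_layers[index] == layer
        # Once one layer misses, the cache stays unusable for the rest.
        cache_usable = cache_usable and matches
        plan.append((layer, "cached" if cache_usable else "rebuilt"))
    return plan


old = ["A", "B", "C"]          # hypothetical previous build
new = ["A", "B_CHANGED", "C"]  # same Dockerfile with the middle layer edited

for layer, status in plan_build(old, new):
    print(f"{layer}: {status}")
```

Here A is reported as cached, while both B_CHANGED and the untouched C are reported as rebuilt.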
Let’s consider what that means for the following Dockerfile:
FROM python:3.7-slim-buster
COPY requirements.txt .
COPY server.py .
RUN pip install --quiet -r requirements.txt
ENTRYPOINT ["python", "server.py"]
If any of the files we COPY in change, that invalidates all later layers: we’ll need to rerun pip install, for example.
But if server.py has changed but requirements.txt hasn’t, why should we have to redo the pip install?
After all, the pip install only uses requirements.txt.
What you want to do therefore is to copy only those files that you actually need to run the next step, so as to minimize the opportunity for cache invalidation. For example:
FROM python:3.7-slim-buster
COPY requirements.txt .
RUN pip install --quiet -r requirements.txt
COPY server.py .
ENTRYPOINT ["python", "server.py"]
Because server.py is only copied in after the pip install, the layer created by pip install can still be loaded from the cache so long as requirements.txt hasn’t changed.
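To compare the two orderings side by side, here is a small Python sketch. It assumes a simplified model in which a COPY step misses the cache when any file it copies has changed, and every later step then misses too. The function rebuilt_steps and the variable names are hypothetical; the instructions themselves are the ones from the two Dockerfiles above.

```python
def rebuilt_steps(steps, changed_files):
    """List the steps that must rerun, given the set of changed files.

    Each step is (instruction_text, files_it_copies). Toy model: the first
    COPY whose files changed misses the cache, and so does every later step.
    """
    rebuilt = []
    cache_usable = True
    for text, copied in steps:
        if cache_usable and changed_files & set(copied):
            cache_usable = False
        if not cache_usable:
            rebuilt.append(text)
    return rebuilt


PIP = "RUN pip install --quiet -r requirements.txt"

# First Dockerfile: both files are copied before pip install.
copy_everything_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("COPY server.py .", ["server.py"]),
    (PIP, []),
    ('ENTRYPOINT ["python", "server.py"]', []),
]

# Second Dockerfile: server.py is copied only after pip install.
copy_just_in_time = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    (PIP, []),
    ("COPY server.py .", ["server.py"]),
    ('ENTRYPOINT ["python", "server.py"]', []),
]

changed = {"server.py"}
print(PIP in rebuilt_steps(copy_everything_first, changed))  # True
print(PIP in rebuilt_steps(copy_just_in_time, changed))      # False
```

With only server.py edited, the first ordering reruns pip install and the second does not, which matches the reasoning above.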
Want to quickly get up to speed on Docker packaging? This article is an excerpt from my book Just Enough Docker Packaging, which will help you understand the fundamentals of Docker packaging in just one afternoon.
Designing your Dockerfile for caching
If you want fast builds by reusing your previously cached builds, you’ll need to write your Dockerfile appropriately:
- Only copy in the files you need for the next step, to minimize cache invalidation in the build process.
- Make sure not to invalidate the cache accidentally by having a command early in the Dockerfile that always changes, e.g. a LABEL that contains the build timestamp.