Speeding up Docker builds in CI with BuildKit
No one enjoys waiting, and waiting for your software to build and your tests to run isn’t fun either—in fact, it’s quite expensive. And if you’re building your Docker image in a CI system like GitHub Actions with ephemeral runners—where a new environment gets spun up for every build—by default your builds are going to be extra slow.
In particular, when you spin up a new VM with a new Docker instance, the cache is empty, so when you run the Docker build your image has to be built from scratch.
Luckily, Docker includes some features to allow you to warm up the cache, by pulling previous versions of the image. And the newer BuildKit build system improves this even further—but also requires some changes, otherwise caching will stop working. And as of Docker 23.0, BuildKit is enabled by default.
Let’s see how you can speed up your Docker builds in CI with classic Docker, and then the improvements (and pitfall) provided by BuildKit.
Why building Docker images in CI can be slow
When you rebuild an existing image, Docker can look in its local cache for existing layers and reuse those if nothing has changed. This allows for faster builds.
However, in many cases CI runs on a new virtual machine or environment on every run. For example, whenever you run a task in GitHub Actions by default you will be using a new virtual machine. A new virtual machine means a new Docker install, and a new Docker install has an empty cache.
An empty cache means your image will be rebuilt from scratch—and that’s slow.
Speeding up CI builds in classic Docker
In order to make it easier to test things, I’m going to spin up a Docker registry on my computer, equivalent to hub.docker.com or a cloud image registry:

```
$ docker run -d -p 5000:5000 --name registry registry:2
fb90defa3e7543accbafc15eb94d6c090204f0002c884851804a38e7f8d3fed9
```
Here’s the Dockerfile we’re going to be building; it’s set up so that if the code changes but requirements.txt is the same, Docker will be able to use the cached layer with the installed dependencies:

```dockerfile
FROM python:3.9-slim-buster
COPY requirements.txt .
RUN pip install --quiet -r requirements.txt
COPY . .
ENTRYPOINT ["python", "app.py"]
```
Note: Apart from the specific best practice each one demonstrates, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
In order to simulate building in an ephemeral, newly created VM, we’re going to use the following script to clear the cache in between builds:
```bash
#!/bin/bash
set -euo pipefail

# Simulate a newly created virtual machine with empty cache.
# The '|| true' allows the shell script to continue if that
# fails because the image doesn't exist.
docker image rm myapp || true
docker image rm localhost:5000/myapp || true
docker image prune -f
docker buildx prune -f  # clear BuildKit cache
```
Here’s our first pass at a build script:
```bash
#!/bin/bash
set -euo pipefail

# Build the image:
docker build -t myapp .

# Push to registry:
docker tag myapp localhost:5000/myapp
docker push localhost:5000/myapp
```
Let’s run this a couple of times; if caching were working correctly, the second run would be much faster:
```
$ time ./build.sh
...
real    0m25.124s
user    0m0.130s
sys     0m0.093s

$ ./clear-cache.sh
...

$ time ./build.sh
...
real    0m25.178s
user    0m0.113s
sys     0m0.098s
```
As expected, because we’re simulating a new VM with an empty cache, the second build is no faster.
Warming the cache
In order to speed up the builds, we need to “warm” the cache. In classic Docker we do this by:

- Pulling the image.
- Using the `--cache-from` flag to tell `docker build` to use the pulled image as a source of cached layers.
```bash
#!/bin/bash
set -euo pipefail

# Pull the image:
docker pull localhost:5000/myapp

# Build the image:
docker build -t myapp --cache-from localhost:5000/myapp .

# Push to registry:
docker tag myapp localhost:5000/myapp
docker push localhost:5000/myapp
```
Now if we run the script:
```
$ ./clear-cache.sh
...

$ time ./build-caching.sh
...
real    0m1.817s
user    0m0.160s
sys     0m0.121s
```
That’s a lot faster!
BuildKit: faster, but with pitfalls
Let’s consider how classic caching works: we need to retrieve the whole image. If for example our code has changed, we don’t actually need to download the layer where the code is installed, since we’re not going to reuse it.
Ideally we could just point Docker at the image registry as part of the build, and it would only download the layers it was actually going to reuse. Technically this was possible with classic Docker, but apparently it was buggy and unreliable.
With BuildKit, the new build system for Docker, this is a built-in feature: you can skip the `docker pull` and just have the build pull the layers it needs.
There is a pitfall, though: by default BuildKit doesn’t include the information needed to reuse images for caching. To enable it, you have to add an extra flag, `--build-arg BUILDKIT_INLINE_CACHE=1`; otherwise caching won’t work at all, whether or not you’ve pulled the image first.
Here’s our new build script:
```bash
#!/bin/bash
set -euo pipefail

# Enable BuildKit (unnecessary in Docker 23.0 and later):
export DOCKER_BUILDKIT=1

# Build the image; no pull needed, just make sure
# --cache-from has the full image name you would pull/push:
docker build -t myapp \
    --cache-from localhost:5000/myapp \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    .

# Push to registry:
docker tag myapp localhost:5000/myapp
docker push localhost:5000/myapp
```
Let’s try it out:
```
$ ./clear-cache.sh
...

$ time ./build-buildkit.sh
...
real    0m23.617s
user    0m0.095s
sys     0m0.088s

$ ./clear-cache.sh
...

$ time ./build-buildkit.sh
...
real    0m1.641s
user    0m0.080s
sys     0m0.051s
```
The first build doesn’t use any of the classic Docker caching so it takes the full amount of time. The second build is sped up—but we didn’t have to do an explicit pull!
In many cases that can speed up builds, as BuildKit can pull only the layers it needs. If you’re using multi-stage builds BuildKit will do some of the build in parallel, giving more opportunities for a speed-up.
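To illustrate the multi-stage point, here’s a hedged sketch of what a multi-stage version of our Dockerfile might look like—the stage layout and virtualenv path are illustrative assumptions, not the article’s actual app:

```dockerfile
# Hypothetical multi-stage sketch; stage names and the /venv
# path are made up for illustration.

# Build stage: install dependencies into a virtualenv.
FROM python:3.9-slim-buster AS build
COPY requirements.txt .
RUN python -m venv /venv && \
    /venv/bin/pip install --quiet -r requirements.txt

# Runtime stage: copy only the virtualenv, leaving pip's
# build artifacts behind in the discarded build stage.
FROM python:3.9-slim-buster
COPY --from=build /venv /venv
COPY . .
ENTRYPOINT ["/venv/bin/python", "app.py"]
```

Because the two stages start from independent base images, BuildKit can work on the build stage while it pulls the runtime stage’s base image, rather than doing everything sequentially.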
If you’re building Docker images in CI, and each CI run starts with an empty cache, make sure you’re using these techniques to keep your cache warm. You’ll get faster builds, save a little money, and save a little CO₂ too.
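As a concrete example, here’s a minimal sketch of the BuildKit approach in a GitHub Actions workflow—the registry host, image name, and secret names are assumptions for illustration, not from this article:

```yaml
# Hypothetical workflow sketch; registry.example.com and the
# secret names are placeholders you'd replace with your own.
name: build
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      DOCKER_BUILDKIT: "1"  # unnecessary in Docker 23.0+
    steps:
      - uses: actions/checkout@v4
      - name: Log in to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | \
            docker login registry.example.com \
              -u "${{ secrets.REGISTRY_USER }}" --password-stdin
      - name: Build, using the pushed image as a remote cache
        run: |
          docker build -t registry.example.com/myapp \
            --cache-from registry.example.com/myapp \
            --build-arg BUILDKIT_INLINE_CACHE=1 .
      - name: Push, so the next run can reuse the cache
        run: docker push registry.example.com/myapp
```

The key detail is that the push step is what makes the cache available to the next ephemeral runner.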