Docker can slow down your code and distort your benchmarks

One of the benefits of containers over virtual machines is that you get some measure of isolation without the performance overhead or distortion of virtualization. Docker images therefore seem like a good way to get a reproducible environment for measuring CPU performance of your code.

There are, however, complications. Sometimes, running under Docker can actually slow down your code and distort your performance measurements.

On macOS and Windows, for example, standard Linux-based Docker containers aren’t running directly on the OS, since the OS isn’t Linux; they run inside a Linux virtual machine, which brings back the virtualization overhead. And the container’s own image filesystem is typically mounted with some sort of overlay filesystem, which can slow things down, so for anything I/O bound you want to use a bind-mounted volume.

But even on Linux, with seemingly CPU-only workloads, Docker can distort runtime performance. Let’s see why, and look at some workarounds.

Slower in Docker… sometimes

The computer I’m testing on is running Fedora 33, and has Docker 20.10.6; I’ve disabled some operating system and CPU features that can make benchmarks less consistent (ASLR and Turbo Boost). I’m going to compare running some code directly on my machine to running the same code inside a container, and so for maximum realism I’m going to use the fedora:33 image.

First, let’s test a tiny Rust program that just does some floating point calculations:

$ ./benchmark
Elapsed: 921ms, result: 499999999067109000
$ docker run -v $PWD:/code fedora:33 /code/benchmark
Elapsed: 915ms, result: 499999999067109000

Some of the runs were slower; I picked the fastest ones. As a first approximation it seems like the performance was the same in and out of Docker.
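
Picking the fastest of several runs can be automated. The following is a minimal Python sketch, not the script used for the numbers above: `best_of` is a hypothetical helper that times a whole command from the outside, so for `docker run` it would also include container startup, unlike the `Elapsed` figure the benchmark prints itself.

```python
import subprocess
import sys
import time


def best_of(command, runs=5):
    """Run `command` several times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is never counted.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


# Stand-in command so the sketch runs anywhere; substitute ["./benchmark"]
# or the equivalent "docker run ..." argument list.
fastest = best_of([sys.executable, "-c", "pass"], runs=3)
print(f"fastest of 3 runs: {fastest * 1000:.1f}ms")
```

Taking the minimum rather than the mean is a deliberate choice here: noise from other processes only ever makes a run slower, so the fastest run is the closest estimate of the undisturbed time.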

Next, let’s try a Python program that again only does some computation. I’ve chosen the fastest runs:

$ python3.9 pystone.py 
Pystone(1.2) time for 50000 passes = 0.248776
This machine benchmarks at 200984 pystones/second
$ docker run -v $PWD:/code fedora:33 python3.9 /code/pystone.py
Pystone(1.2) time for 50000 passes = 0.297675
This machine benchmarks at 167968 pystones/second

In this case the Python benchmark’s throughput, measured in pystones/second, is about 16% lower when using Docker.
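
The size of the slowdown depends on which number you compare. As a quick worked calculation using only the figures printed above (the variable names are my own):

```python
# Figures copied from the two pystone runs above.
host_rate = 200984      # pystones/second on the host
docker_rate = 167968    # pystones/second in the default Docker container
host_time = 0.248776    # seconds for 50000 passes on the host
docker_time = 0.297675  # seconds for 50000 passes in Docker

throughput_drop = 1 - docker_rate / host_rate
extra_time = docker_time / host_time - 1

print(f"throughput drop: {throughput_drop:.1%}")  # 16.4%
print(f"extra run time:  {extra_time:.1%}")       # 19.7%
```

So the same measurement can be read as roughly 16% less work per second, or as roughly 20% more time for the same work.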

Even worse, we can see the performance hit is inconsistent: our tiny little Rust benchmark was unaffected by Docker, but the Python benchmark was slower. If the slowdown were always consistent, running everything in Docker would at least let us reliably measure relative performance, for example between two versions of some code. Inconsistent slowdowns mean Docker is distorting our results.

The cost of security

Now, containers don’t inherently have performance overhead: the whole point is that other than having different namespaces for things like networking or user IDs, a process in a container is just another process like any other.

So where’s the performance hit coming from? One plausible theory suggested by Aras Abbasi is that it’s Docker’s security features.

Docker originated in the world of platform-as-a-service, where applications from different users are running exposed to the world. So Docker also adds additional layers of security to prevent programs escaping from the container to the host.

  1. One of these security mechanisms is seccomp, which Docker uses to constrain what system calls containers can run.
  2. Older versions of seccomp have a performance problem that can slow down operations; newer versions include a fix.
  3. Docker still hasn’t enabled this performance fix.

There may of course be other seccomp performance issues that are causing the problem, or one of the other security mechanisms that Docker uses, but we can at least test this general theory by running our Docker container in privileged mode. This disables all the security features, and so if those are responsible for the slowdown we should get our speed back:

$ docker run --privileged -v $PWD:/code fedora:33 python3.9 /code/pystone.py
Pystone(1.2) time for 50000 passes = 0.239254
This machine benchmarks at 208983 pystones/second

It worked! The code is no longer slower than the host. And yes, I’ve run both variants many times: performance always goes back to normal when running with --privileged.
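
Since --privileged turns off everything at once, it can’t tell us which security feature is responsible. One narrower experiment, which was not measured here, would be to disable only the seccomp profile with Docker’s `--security-opt seccomp=unconfined` option and compare all three configurations. A minimal Python sketch that builds the command lines; `docker_command`, `VARIANTS` and the `/path/to/code` directory are hypothetical names for illustration:

```python
import os
import shlex


def docker_command(extra_flags, code_dir=None):
    """Build a `docker run` argument list for the pystone benchmark."""
    code_dir = code_dir or os.getcwd()
    return (
        ["docker", "run"]
        + list(extra_flags)
        + ["-v", f"{code_dir}:/code", "fedora:33", "python3.9", "/code/pystone.py"]
    )


# Each variant relaxes a different amount of Docker's default security setup.
VARIANTS = {
    "default": [],
    "privileged": ["--privileged"],
    "seccomp-unconfined": ["--security-opt", "seccomp=unconfined"],
}

for name, flags in VARIANTS.items():
    # Print the commands rather than running them, so the sketch works without Docker.
    print(f"{name}: {shlex.join(docker_command(flags, code_dir='/path/to/code'))}")
```

If the seccomp-unconfined variant were as fast as the privileged one, that would point at seccomp specifically; if not, one of the other mechanisms would be the more likely cause.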

Update: Further searching found this article that points the finger at security measures to prevent Spectre side-channel attacks. It therefore has some more fine-grained suggestions on how to fix this; this also implies this might be less of an issue on newer CPUs that have hardware fixes.

Benchmarking is hard

So should you run your benchmarks with --privileged, assuming you trust the code you’re running to run as root?

Maybe.

If you’re running your code on a containerized platform, the default Docker configuration might actually match reality better. Then again, it might not. For example, the Podman reimplementation of Docker doesn’t have this problem, and Kubernetes uses a different container runtime.

Your best bet: compare measurements across different environments, work out the differences, and aim for maximum realism for your particular situation.