A tableau of crimes and misfortunes: the ever-useful `docker history`
If you want to understand a Docker image, there is no more useful tool than the `docker history` command.
Whether it’s telling you why your image is so large, or helping you understand how a base image was constructed, the `history` command will let you peer into the innards of any image, allowing you to see the good, the bad, and the ugly.
Let’s see what this command does, what it can teach us about the construction of Docker images, and some examples of why it’s so useful.
The construction of a Docker image
Consider the following Docker image:
$ docker image ls mysteryimage
REPOSITORY TAG IMAGE ID SIZE
mysteryimage latest 24e6dd67bf8a 165MB
Given an image, we might have some questions:
- What does it do?
- What will happen when I run it?
- How was it created?
The `docker image history` command, or its older synonym `docker history`, can help answer all these questions.
$ docker image history mysteryimage
IMAGE CREATED CREATED BY SIZE
24e6dd67 2 mins #(nop) ENTRYPOINT ["python" "exa… 0B
59102aef 2 mins #(nop) COPY file:cc6452cd5813b9d2… 0B
9d84edf3 7 weeks #(nop) CMD ["python3"] 0B
<missing> 7 weeks set -ex; savedAptMark="$(apt-ma… 8MB
<missing> 7 weeks #(nop) ENV PYTHON_GET_PIP_SHA256… 0B
<missing> 7 weeks #(nop) ENV PYTHON_GET_PIP_URL=ht… 0B
<missing> 7 weeks #(nop) ENV PYTHON_PIP_VERSION=20… 0B
<missing> 7 weeks cd /usr/local/bin && ln -s idle3… 32B
<missing> 7 weeks set -ex && savedAptMark="$(apt-… 80MB
<missing> 7 weeks #(nop) ENV PYTHON_VERSION=3.8.3 0B
<missing> 7 weeks #(nop) ENV GPG_KEY=E3FF2839C048B… 0B
<missing> 7 weeks apt-get update && apt-get install… 7MB
<missing> 7 weeks #(nop) ENV LANG=C.UTF-8 0B
<missing> 7 weeks #(nop) ENV PATH=/usr/local/bin:/… 0B
<missing> 7 weeks #(nop) CMD ["bash"] 0B
<missing> 7 weeks #(nop) ADD file:4d35f6c8bbbe6801c… 69MB
Docker images are constructed in layers, each layer corresponding, to a first approximation, to a line in a `Dockerfile`.
The `history` command shows these layers, and the commands used to create them.
So what we have here is more or less the equivalent of the `Dockerfile` that constructed the image.
And we can use this to answer a number of questions.
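As a rough illustration of that equivalence, here is a minimal sketch that turns history output back into something resembling a `Dockerfile`. The function name `history_to_dockerfile` is made up for this example, and it assumes the classic builder's output format shown above, where metadata steps are recorded as `/bin/sh -c #(nop) ...`. Steps that carry build-argument prefixes are not handled specially, and `COPY`/`ADD` sources appear only as checksums, so the result is an approximation rather than the original file.

```shell
# Hypothetical helper: read "CREATED BY" lines on stdin (topmost layer first)
# and print them in Dockerfile order (oldest layer first).
history_to_dockerfile() {
    awk '
        { lines[NR] = $0 }
        END {
            for (i = NR; i >= 1; i--) {
                line = lines[i]
                if (sub(/^\/bin\/sh -c #\(nop\) +/, "", line)) {
                    # metadata instruction such as ENV, CMD or COPY
                    print line
                } else {
                    # anything else was a shell command, so it was a RUN step
                    sub(/^\/bin\/sh -c /, "", line)
                    print "RUN " line
                }
            }
        }
    '
}

# Possible usage (requires Docker and the image from this article):
#   docker image history --no-trunc --format '{{.CreatedBy}}' mysteryimage | history_to_dockerfile
```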
What is this Docker image going to run?
To figure out what the image will run, we just need to find the topmost `ENTRYPOINT` or `CMD`.
We can use the `--no-trunc` argument to show the full, untruncated commands:
$ docker image history mysteryimage --no-trunc | grep ENTRYPOINT
sha256:24e6dd67bf8a 4 minutes ago /bin/sh -c #(nop) ENTRYPOINT ["python" "example.py"]
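Since an image can have both an `ENTRYPOINT` and a `CMD`, it can help to pull out the topmost occurrence of each in one pass. The following is a small sketch along those lines; the function name `topmost_entrypoint_and_cmd` is invented here, and the pattern assumes the `#(nop)` markers that the classic builder writes, as in the output above.

```shell
# Hypothetical helper: read untruncated history lines on stdin (topmost first)
# and print the first ENTRYPOINT line and the first CMD line encountered.
topmost_entrypoint_and_cmd() {
    awk '
        /#\(nop\) +ENTRYPOINT / && !seen_entrypoint { print; seen_entrypoint = 1 }
        /#\(nop\) +CMD /        && !seen_cmd        { print; seen_cmd = 1 }
    '
}

# Possible usage (requires Docker and the image from this article):
#   docker image history --no-trunc mysteryimage | topmost_entrypoint_and_cmd
```

Keep in mind that a `CMD` inherited from the base image may no longer apply once the final image sets its own `ENTRYPOINT`, so read the two lines together with the layer they come from.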
What was in the base image?
We can see what went into constructing the base image: you can differentiate the base image from the current image by the creation time for each layer. The base image was apparently created 7 weeks ago.
You can also see the ID of the base image (here `9d84edf3`, the topmost layer created 7 weeks ago), in case you want to `docker run` it.
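If you would rather not eyeball the listing, one heuristic is to take the last layer that still has an ID, since in the output above the pulled base image's own layers show up as `<missing>`. The sketch below does that; the function name `guess_base_image_id` is hypothetical, and the heuristic assumes an image built locally with the classic builder on top of a pulled base image, so treat its answer as a guess to verify against the creation times.

```shell
# Hypothetical helper: read "ID<TAB>anything" lines on stdin (topmost first)
# and print the last ID that is not "<missing>".
guess_base_image_id() {
    awk -F '\t' '$1 != "<missing>" { id = $1 } END { if (id != "") print id }'
}

# Possible usage (requires Docker and the image from this article):
#   docker image history --format '{{.ID}}\t{{.CreatedSince}}' mysteryimage | guess_base_image_id
```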
What commands made the image size larger?
Notice that the output above has a SIZE column, showing you the size of each layer.
That means you can tell which specific steps in the `Dockerfile` contributed the most to the image size.
In this example, 80MB came from one particular step:
$ docker image history mysteryimage --no-trunc | grep 80MB
<missing> 7 weeks ago /bin/sh -c set -ex && savedAptMark="$(apt-mark showmanual)" && apt-get update && apt-get install -y --no-install-recommends dpkg-dev gcc libbluetooth-dev libbz2-dev libc6-dev libexpat1-dev libffi-dev libgdbm-dev liblzma-dev libncursesw5-dev libreadline-dev libsqlite3-dev libssl-dev make tk-dev uuid-dev wget xz-utils zlib1g-dev $(command -v gpg > /dev/null || echo 'gnupg dirmngr') && wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" && wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" && export GNUPGHOME="$(mktemp -d)" && gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" && gpg --batch --verify python.tar.xz.asc python.tar.xz && { command -v gpgconf > /dev/null && gpgconf --kill all || :; } && rm -rf "$GNUPGHOME" python.tar.xz.asc && mkdir -p /usr/src/python && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz && rm python.tar.xz && cd /usr/src/python && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" && ./configure --build="$gnuArch" --enable-loadable-sqlite-extensions --enable-optimizations --enable-option-checking=fatal --enable-shared --with-system-expat --with-system-ffi --without-ensurepip && make -j "$(nproc)" LDFLAGS="-Wl,--strip-all" && make install && ldconfig && apt-mark auto '.*' > /dev/null && apt-mark manual $savedAptMark && find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' | awk '/=>/ { print $(NF-1) }' | sort -u | xargs -r dpkg-query --search | cut -d: -f1 | sort -u | xargs -r apt-mark manual && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false && rm -rf /var/lib/apt/lists/* && find /usr/local -depth \( \( -type d -a \( -name test -o -name tests -o -name idle_test \) \) -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) -exec rm -rf '{}' + && rm -rf /usr/src/python && python3 --version 80MB
Apparently this step compiles Python from source.
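Grepping for `80MB` works when you already know the number you are looking for. As a sketch of a more general approach, you could ask Docker for sizes in bytes and sort on them. The function name `top_layers` is made up for this example; it expects lines of the form size in bytes, a tab, then the command, which is what the `--format` and `--human=false` options in the usage comment should produce.

```shell
# Hypothetical helper: read "SIZE_IN_BYTES<TAB>COMMAND" lines on stdin and
# print the N largest layers (default 5), biggest first.
top_layers() {
    sort -rn | head -n "${1:-5}"
}

# Possible usage (requires Docker and the image from this article):
#   docker image history --no-trunc --human=false \
#       --format '{{.Size}}\t{{.CreatedBy}}' mysteryimage | top_layers 3
```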
Extracting build arguments
The commands reported by `docker image history` are even more useful than the original `Dockerfile` insofar as they also include the values of build arguments in any subsequent `RUN` commands.
This can be useful for security auditing.
For example, you might discover the image made the mistake of using the `ARG` command for build secrets, thus unintentionally leaking credentials:
$ docker pull itamarst/verysecure
...
$ docker image history itamarst/verysecure
IMAGE CREATED BY
0b51ddadfcd |1 ANOTHER_SECRET=oscillation-overthruster /…
<missing> /bin/sh -c #(nop) WORKDIR /tmp
<missing> /bin/sh -c #(nop) ARG ANOTHER_SECRET
...
There are of course other, more secure ways to use build secrets in Docker.
Or you might discover the name of an internal server in a commercially-built Docker image, and the fact they’re still using FTP:
$ docker history --no-trunc image_name_elided | grep ftp
<missing> 4 weeks ago |2 FTP_PATH=ftp://kits-ftp/kits/unreleased_ftp/PRODUCTS//PRODUCT-dockerubuntux64.tar.gz ....
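In both examples the build arguments show up as a `|N NAME=value ...` prefix in front of the command that ran. A minimal sketch for listing just those prefixes might look like the following; the function name `leaked_build_args` is invented here, and it assumes the prefix format shown above, with the arguments followed by ` /bin/sh -c`. Values that themselves contain spaces would be hard to split reliably, so the arguments for each step are printed together on one line.

```shell
# Hypothetical helper: read "CREATED BY" lines on stdin and print the
# build-argument prefix of every step that has one, without duplicates.
leaked_build_args() {
    awk '/^\|[0-9]+ / {
        sub(/^\|[0-9]+ /, "")        # drop the "|<count> " marker
        sub(/ \/bin\/sh -c .*/, "")  # drop the command that follows
        print
    }' | sort -u
}

# Possible usage (requires Docker and network access to pull the image):
#   docker image history --no-trunc --format '{{.CreatedBy}}' itamarst/verysecure | leaked_build_args
```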
The primary use case: figuring out why your image is too large
While `docker history` is useful in understanding how images are built, and occasionally for getting a glimpse into an insecure setup, the thing it’s most useful for is figuring out why an image is too large.
The first thing you should do when you have an overly large image is use `docker image history` to see which layers are contributing the most to image size.
Often that’ll be enough to tell you exactly what’s going on.