A tableau of crimes and misfortunes: the ever-useful `docker history`

by Itamar Turner-Trauring
Last updated 01 Oct 2021, originally created 31 Jul 2020

If you want to understand a Docker image, there is no more useful tool than the docker history command. Whether it’s telling you why your image is so large, or helping you understand how a base image was constructed, the history command will let you peer into the innards of any image, allowing you to see the good, the bad, and the ugly.

Let’s see what this command does, what it can teach us about the construction of Docker images, and some examples of why it’s so useful.

The construction of a Docker image

Consider the following Docker image:

$ docker image ls mysteryimage
REPOSITORY    TAG      IMAGE ID       SIZE
mysteryimage  latest   24e6dd67bf8a   165MB

Given an image, we might have some questions:

What does it do?
What will happen when I run it?
How was it created?

The docker image history command, or its older synonym docker history, can help answer all these questions.

$ docker image history mysteryimage 
IMAGE      CREATED  CREATED BY                          SIZE
24e6dd67   2 mins   #(nop)  ENTRYPOINT ["python" "exa…  0B  
59102aef   2 mins   #(nop) COPY file:cc6452cd5813b9d2…  0B  
9d84edf3   7 weeks  #(nop)  CMD ["python3"]             0B  
<missing>  7 weeks  set -ex;   savedAptMark="$(apt-ma…  8MB
<missing>  7 weeks  #(nop)  ENV PYTHON_GET_PIP_SHA256…  0B  
<missing>  7 weeks  #(nop)  ENV PYTHON_GET_PIP_URL=ht…  0B  
<missing>  7 weeks  #(nop)  ENV PYTHON_PIP_VERSION=20…  0B  
<missing>  7 weeks  cd /usr/local/bin  && ln -s idle3…  32B 
<missing>  7 weeks  set -ex   && savedAptMark="$(apt-…  80MB
<missing>  7 weeks  #(nop)  ENV PYTHON_VERSION=3.8.3    0B  
<missing>  7 weeks  #(nop)  ENV GPG_KEY=E3FF2839C048B…  0B  
<missing>  7 weeks  apt-get update && apt-get install…  7MB
<missing>  7 weeks  #(nop)  ENV LANG=C.UTF-8            0B  
<missing>  7 weeks  #(nop)  ENV PATH=/usr/local/bin:/…  0B  
<missing>  7 weeks  #(nop)  CMD ["bash"]                0B 
<missing>  7 weeks  #(nop) ADD file:4d35f6c8bbbe6801c…  69MB

Docker images are constructed in layers, each layer corresponding, to a first approximation, to a line in a Dockerfile. The history command shows these layers, newest first, along with the commands used to create them.

So what we have here is more or less the equivalent of the Dockerfile that constructed the image. And we can use this to answer a number of questions.

What is this Docker image going to run?

To figure out what the image will run, we just need to find the topmost ENTRYPOINT or CMD. We can use the --no-trunc argument to show the full, untruncated commands:

$ docker image history mysteryimage --no-trunc | grep ENTRYPOINT
sha256:24e6dd67bf8a   4 minutes ago       /bin/sh -c #(nop)  ENTRYPOINT ["python" "example.py"]
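
Since either ENTRYPOINT or CMD can determine what runs, it can help to pull out the topmost entry of each in one pass. The following is a minimal sketch, not part of the original example: the function name topmost_commands is hypothetical, and the pattern matching is a heuristic that could also match a RUN command that happens to contain those words.

```shell
# Print the first (topmost) ENTRYPOINT row and the first CMD row
# from `docker image history` output read on stdin.
# Heuristic: looks for the instruction name followed by a space.
topmost_commands() {
    awk '
        /(^| )ENTRYPOINT / && !seen_entrypoint { print; seen_entrypoint = 1 }
        /(^| )CMD /        && !seen_cmd        { print; seen_cmd = 1 }
    '
}

# Possible usage (requires Docker and a local image):
#   docker image history --no-trunc mysteryimage | topmost_commands
```

History output is ordered newest first, so the first match for each instruction is the one that takes effect.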

What was in the base image?

The history also shows what went into constructing the base image. You can tell the base image's layers apart from the current image's by their creation times: here the top two layers were created 2 minutes ago, while everything below them, the base image, was apparently created 7 weeks ago.

You can also see the ID of the base image, in case you want to docker run it.
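
Most rows show <missing> in the IMAGE column, so the few rows with a real ID are the ones worth looking at. As a small sketch (the function name layers_with_ids is hypothetical), those rows can be filtered out of the default table output like this:

```shell
# Keep only the history rows whose first column is a real image ID,
# dropping the header row and any rows marked <missing>.
layers_with_ids() {
    awk '$1 != "<missing>" && $1 != "IMAGE"'
}

# Possible usage (requires Docker and a local image):
#   docker image history mysteryimage | layers_with_ids
```

In the example output above, this would leave the two recent layers plus 9d84edf3, the first row carrying the 7-week-old timestamp.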

What commands made the image size larger?

Notice that the output above has a SIZE column, showing you the size of each layer.

That means you can tell which specific steps in the Dockerfile contributed the most to the image size. In this example, 80MB came from one particular step:

$ docker image history mysteryimage --no-trunc | grep 80MB
<missing>    7 weeks ago       /bin/sh -c set -ex   && savedAptMark="$(apt-mark showmanual)"  && apt-get update && apt-get install -y --no-install-recommends   dpkg-dev   gcc   libbluetooth-dev   libbz2-dev   libc6-dev   libexpat1-dev   libffi-dev   libgdbm-dev   liblzma-dev   libncursesw5-dev   libreadline-dev   libsqlite3-dev   libssl-dev   make   tk-dev   uuid-dev   wget   xz-utils   zlib1g-dev   $(command -v gpg > /dev/null || echo 'gnupg dirmngr')   && wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz"  && wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc"  && export GNUPGHOME="$(mktemp -d)"  && gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY"  && gpg --batch --verify python.tar.xz.asc python.tar.xz  && { command -v gpgconf > /dev/null && gpgconf --kill all || :; }  && rm -rf "$GNUPGHOME" python.tar.xz.asc  && mkdir -p /usr/src/python  && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz  && rm python.tar.xz   && cd /usr/src/python  && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)"  && ./configure   --build="$gnuArch"   --enable-loadable-sqlite-extensions   --enable-optimizations   --enable-option-checking=fatal   --enable-shared   --with-system-expat   --with-system-ffi   --without-ensurepip  && make -j "$(nproc)"   LDFLAGS="-Wl,--strip-all"  && make install  && ldconfig   && apt-mark auto '.*' > /dev/null  && apt-mark manual $savedAptMark  && find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';'   | awk '/=>/ { print $(NF-1) }'   | sort -u   | xargs -r dpkg-query --search   | cut -d: -f1   | sort -u   | xargs -r apt-mark manual  && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false  && rm -rf /var/lib/apt/lists/*   && find /usr/local -depth   \(    \( -type d -a \( -name test -o -name tests -o -name idle_test \) \)    -o    
\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \)   \) -exec rm -rf '{}' +  && rm -rf /usr/src/python   && python3 --version   80MB

Apparently this step compiles Python from source.
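
Grepping for 80MB works once you have already spotted the number. A more general approach is to sort the layers by size. The sketch below assumes the history is printed as "size in bytes, tab, command" lines, which the --human=false and --format options of docker image history should produce. The function name largest_layers is hypothetical.

```shell
# Read "SIZE_IN_BYTES<TAB>COMMAND" lines on stdin and print the
# largest layers first. $1 is how many rows to show (default 5).
largest_layers() {
    sort -t "$(printf '\t')" -k1,1nr | head -n "${1:-5}"
}

# Possible usage (requires Docker and a local image):
#   docker image history --human=false --no-trunc \
#       --format '{{.Size}}\t{{.CreatedBy}}' mysteryimage | largest_layers 3
```

Printing sizes in bytes keeps the sort purely numeric, so a 7MB layer and a 69MB layer end up in the right order.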

Extracting build arguments

The commands reported by docker image history are even more useful than the original Dockerfile insofar as they also include the values of build arguments in any subsequent RUN commands.

This can be useful for security auditing. For example, you might discover the image made the mistake of using the ARG command for build secrets, thus unintentionally leaking credentials:

$ docker pull itamarst/verysecure
...
$ docker image history itamarst/verysecure
IMAGE        CREATED BY
0b51ddadfcd  |1 ANOTHER_SECRET=oscillation-overthruster /…
<missing>    /bin/sh -c #(nop) WORKDIR /tmp
<missing>    /bin/sh -c #(nop)  ARG ANOTHER_SECRET
...

There are of course other, more secure ways to use build secrets in Docker.

Or you might discover the name of an internal server in a commercially-built Docker image, and the fact they’re still using FTP:

$ docker history --no-trunc image_name_elided | grep ftp
<missing>  4 weeks ago    |2 FTP_PATH=ftp://kits-ftp/kits/unreleased_ftp/PRODUCTS//PRODUCT-dockerubuntux64.tar.gz  ....
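
In both examples the build arguments show up as a |N NAME=value prefix in front of the command. A rough way to audit an image is to list every history row with that prefix. This is only a sketch: the function name lines_with_build_args is hypothetical, and the regular expression is a heuristic based on the output shown above.

```shell
# Print history rows that carry build-argument values, recognized by
# the "|N NAME=value" prefix Docker puts in front of such commands.
lines_with_build_args() {
    grep -E '(^|[[:space:]])\|[0-9]+ [A-Za-z_][A-Za-z0-9_]*='
}

# Possible usage (requires Docker and a local image):
#   docker image history --no-trunc itamarst/verysecure | lines_with_build_args
```

Any row this prints deserves a manual look, since the values are stored in the image in plain text.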

The primary use case: figuring out why your image is too large

While docker history is useful for understanding how images are built, and occasionally for getting a glimpse into an insecure setup, the thing it’s most useful for is figuring out why an image is too large.

The first thing you should do when you have an overly large image is use docker image history to see which layers are contributing the most to image size. Often that’ll be enough to tell you exactly what’s going on.
