Faster Docker builds with pipenv, poetry, or pip-tools

Docker builds can be slow, and waiting for a build to finish is probably not how you want to spend your time. So you want faster builds—and caching is a great way to get there.

If the files a build step relies on haven’t changed, the Docker build can reuse the layer it cached for that step in a previous build. So if you separate installation of dependencies from installation of your code in your Dockerfile, you’ll usually get faster builds: if just your code changes, you won’t have to wait for all dependencies to be installed when you rebuild the Docker image.

In this article I’ll cover:

  1. An example of how to get slow builds.
  2. Faster builds by installing requirements first, from a requirements.txt file.
  3. Managing your requirements.txt with pip-tools.
  4. Using pipenv in your Docker build.
  5. Using poetry in your Docker build.

Note: Outside the specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.

Want a best-practices Dockerfile and build system? Check out my Production-Ready Python Containers product.

How to get slow builds

A common way of listing your dependencies is to put them in setup.py’s install_requires:

from setuptools import setup

setup(name='exampleapp',
      packages=["exampleapp"],
      install_requires=["flask", "dateutil"])

Your Dockerfile might then look like this:

FROM python:3.7
COPY . /tmp/myapp
RUN pip install /tmp/myapp
CMD flask run exampleapp:app

The problem with this setup is that every time you change the code, that invalidates the COPY . /tmp/myapp layer in the Docker cache, as well as all subsequent lines in the Dockerfile. And so every time you rebuild the image, you will need to reinstall the dependencies and your code.

Faster builds with requirements.txt

If you separate out your dependencies into a separate file, traditionally named requirements.txt, you can copy in only that file, and install it earlier. That way dependency installation can be cached, and packages will need to be reinstalled only if requirements.txt changes.

requirements.txt would look like this:

dateutil
flask

And the Dockerfile like this:

FROM python:3.7
COPY requirements.txt /tmp
RUN pip install -r /tmp/requirements.txt
COPY . /tmp/myapp
RUN pip install /tmp/myapp
CMD flask run exampleapp:app

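Notice that initially we only copy requirements.txt, so that changes to the code won’t invalidate the cached dependency-installation layer. This only speeds up the build if the cached layers from a previous build are available on the machine doing the build; on a fresh machine, such as a new CI worker, that means pulling a previously built image first.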
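
To make the caching behavior concrete, here is a toy model in Python. It is a sketch, not how Docker is implemented: it assumes a step is invalidated when one of the files it copies has changed, and that every later step is then rebuilt as well. The function name steps_to_rerun and the file names are made up for illustration.

```python
from typing import List, Set, Tuple

# Each step is (instruction text, files the step reads from the build context).
# An empty set means the step reads nothing from the build context.
Step = Tuple[str, Set[str]]


def steps_to_rerun(steps: List[Step], changed: Set[str]) -> List[str]:
    """Return the instructions that cannot be served from the cache.

    Toy model: a step is invalidated if one of its input files changed,
    and every step after an invalidated step is rebuilt too.
    """
    rerun = []
    invalidated = False
    for instruction, inputs in steps:
        if inputs & changed:
            invalidated = True
        if invalidated:
            rerun.append(instruction)
    return rerun


# Hypothetical project layout, for illustration only.
all_files = {"setup.py", "requirements.txt", "exampleapp/__init__.py"}

slow_layout = [
    ("COPY . /tmp/myapp", all_files),
    ("RUN pip install /tmp/myapp", set()),
]
fast_layout = [
    ("COPY requirements.txt /tmp", {"requirements.txt"}),
    ("RUN pip install -r /tmp/requirements.txt", set()),
    ("COPY . /tmp/myapp", all_files),
    ("RUN pip install /tmp/myapp", set()),
]

# Only application code changed, not the dependency list.
code_change = {"exampleapp/__init__.py"}
slow_rerun = steps_to_rerun(slow_layout, code_change)
fast_rerun = steps_to_rerun(fast_layout, code_change)
print(slow_rerun)
print(fast_rerun)
```

In this model a code-only change reruns every step of the first layout, including the one that installs all the dependencies. In the second layout the requirements steps stay cached, and the final pip install only has to install your own code.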

A new problem: reproducible builds

The scheme above has a problem: requirements.txt doesn’t specify versions, so whenever pip install actually runs it installs the latest version of each dependency. So if you build the image on a computer that doesn’t have the cache populated, you might get a different set of installed dependencies than you would on a computer that does.

In short, your builds aren’t reproducible.

And that can lead to a variety of problems, e.g. random breakage when packages get upgraded without your knowledge, or hard-to-reproduce differences in behavior (“but it worked on my computer!”).

To solve this you want to keep two versions of your dependencies:

  1. The logical dependencies of your application, i.e. the packages you directly import. In our example, flask and dateutil.
  2. The pinned transitive dependencies. That is, all of flask’s dependencies (and their dependencies, and so on), pinned to a particular version.

The logical dependencies can be used to regenerate the pinned dependencies on demand. The pinned dependencies are what you use to get reproducible builds.
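
The difference between the two kinds of file is easy to check mechanically. The following is a minimal sketch, assuming a simplified requirements format with one name or name==version per line and optional trailing comments; it does not handle extras, environment markers, hashes, or -r includes. The helper name unpinned_lines is hypothetical.

```python
import re
from typing import List

# Matches "name==version", optionally followed by a trailing comment.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^\s#]+\s*(#.*)?$")


def unpinned_lines(requirements_text: str) -> List[str]:
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Logical dependencies: bare names, no versions.
logical = "dateutil\nflask\n"
# Pinned dependencies: every line names an exact version.
pinned = "click==7.0                # via flask\nflask==1.0.3\n"

print(unpinned_lines(logical))
print(unpinned_lines(pinned))
```

A check like this could run in CI against the file your Dockerfile installs from, so that an unpinned line never silently makes the build non-reproducible.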

Reproducible builds using pip-tools

pip-tools is one easy way to do this. You can store the logical dependencies in requirements.in:

dateutil
flask

Then run (with matching Python version and ideally operating system):

$ pip-compile requirements.in > requirements.txt

And the resulting requirements.txt looks like this:

argparse==1.4.0           # via dateutils
click==7.0                # via flask
dateutils==0.6.6
flask==1.0.3
itsdangerous==1.1.0       # via flask
jinja2==2.10.1            # via flask
markupsafe==1.1.1         # via jinja2
python-dateutil==2.8.0    # via dateutils
pytz==2019.1              # via dateutils
six==1.12.0               # via python-dateutil
werkzeug==0.15.4          # via flask

You check both requirements.in and requirements.txt into version control. The Dockerfile requires no changes from the version we showed above.

Fast reproducible Docker builds with pipenv

pipenv is another tool that allows you to maintain logical dependencies (in a Pipfile) and pinned dependencies (in a Pipfile.lock). It also does a whole lot more, e.g. virtualenv management.

Much of what it does isn’t relevant to building Docker images, though, so the easy way to use it in your Docker build is to export a requirements.txt file. You can do this outside your Docker build, and just commit the resulting file to version control and use the Dockerfile above:

$ pipenv lock --requirements > requirements.txt

The plus is that your Dockerfile doesn’t need to know anything about pipenv. This does require you to remember to regenerate requirements.txt every time you update Pipfile.lock.
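
One way to guard against forgetting is a small check that compares the two files. This is a sketch under assumptions: it assumes Pipfile.lock is JSON with the pinned packages under a top-level "default" key, each with a "version" string such as "==1.0.3", and that requirements.txt has one name==version per line. Name matching is simplified to lowercase comparison, and the function names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def pins_from_lock(lock_text: str) -> Dict[str, str]:
    """Map package name -> version from the 'default' section of a Pipfile.lock."""
    lock = json.loads(lock_text)
    return {
        name.lower(): entry["version"].lstrip("=")
        for name, entry in lock.get("default", {}).items()
        if "version" in entry
    }


def pins_from_requirements(req_text: str) -> Dict[str, str]:
    """Map package name -> version for every 'name==version' line."""
    pins = {}
    for raw in req_text.splitlines():
        # Drop trailing comments and environment markers.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def out_of_sync(
    lock_text: str, req_text: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (lock version, requirements version)} for every mismatch."""
    lock_pins = pins_from_lock(lock_text)
    req_pins = pins_from_requirements(req_text)
    return {
        name: (lock_pins.get(name), req_pins.get(name))
        for name in sorted(set(lock_pins) | set(req_pins))
        if lock_pins.get(name) != req_pins.get(name)
    }


# Made-up sample data: the lock file was updated, requirements.txt was not.
lock_text = json.dumps(
    {"default": {"flask": {"version": "==1.0.3"}, "click": {"version": "==7.0"}}}
)
stale_requirements = "click==7.0\nflask==1.0.2\n"
drift = out_of_sync(lock_text, stale_requirements)
print(drift)
```

In a real project you would read the two files from disk and fail the build or a pre-commit hook whenever the result is non-empty.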

Alternatively, you can do the export in the build itself:

FROM python:3.7
RUN pip install pipenv
COPY Pipfile* /tmp/
RUN cd /tmp && pipenv lock --requirements > requirements.txt
RUN pip install -r /tmp/requirements.txt
COPY . /tmp/myapp
RUN pip install /tmp/myapp
CMD flask run exampleapp:app

Note that a better setup, omitted for clarity, would have you install pipenv in such a way that its dependencies don’t impact your code, e.g. by using a virtualenv for your code.

Fast reproducible Docker builds with poetry

poetry is another tool that lets you manage logical and pinned dependencies. Unfortunately I have failed to figure out how to install dependencies separately from the code in the current released version. So you may be stuck with installing both code and dependencies at the same time if you’re using poetry.

Version 1.0 (as of June 2019 in alpha status) will have a poetry export command that lets you export a requirements.txt file. So one potential option is to install it with pip install --pre poetry and just use the pre-release for creating a requirements.txt file.

The takeaway

To get fast, reproducible builds for your application:

  1. Separate dependencies from your setup.py.
  2. Separate logical and pinned dependencies (using pip-tools, pipenv, or poetry; pip-tools is Hynek Schlawack’s recommendation, and I concur).
  3. Install dependencies separately and earlier in your Dockerfile to ensure faster builds.

Learn how to build fast, production-ready Docker images—read the rest of the Docker packaging guide for Python.