Faster Docker builds with pipenv, poetry, or pip-tools
Docker builds can be slow, and waiting for a build to finish is probably not how you want to spend your time. So you want faster builds—and caching is a great way to get there.
If the files you’re relying on haven’t changed, the Docker build can reuse previously cached layers for this particular build. And so if you separate out installation of dependencies from installation of your code in your Dockerfile you’ll usually get faster builds: if just your code changes, you won’t have to wait for all dependencies to be installed when you rebuild the Docker image.
In this article I’ll cover:
- An example of how to get slow builds.
- Faster builds by installing requirements first, from a
- Managing your
pipenvin your Docker build.
poetryin your Docker build.
Note: Outside the very specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
To ensure you’re writing secure, correct, fast Dockerfiles, consider my Quickstart guide, which includes a packaging process and 60+ best practices.
How to get slow builds
A common way of listing your dependencies is to put them in
from setuptools import setup setup(name='exampleapp', packages=["exampleapp"], install_requires=["flask", "dateutil"])
Dockerfile might then look like this:
FROM python:3.7 COPY . /tmp/myapp RUN pip install /tmp/myapp CMD flask run exampleapp:app
The problem with this setup is that every time you change the code, that invalidates the
COPY . /tmp/myapp layer in the Docker cache, as well as all subsequent lines in the
And so every time you rebuild the image, you will need to reinstall the dependencies and your code.
Faster builds with requirements.txt
If you separate out your dependencies into a separate file, traditionally named
requirements.txt, you can copy in only that file, and install it earlier.
That way dependency installation can be cached, and packages will need to be reinstalled only if
requirements.txt would look like this:
Dockerfile like this:
FROM python:3.7 COPY requirements.txt /tmp RUN pip install -r requirements.txt COPY . /tmp/myapp RUN pip install /tmp/myapp CMD flask run exampleapp:app
Notice that initially we only copy
requirements.txt, so that changes to the code won’t invalidate the caching at this point.
This will give you faster builds if you have made sure to pull previous builds.
A new problem: reproducible builds
The scheme above has a problem: it always installs the latest version of the dependencies. So if you build the image on a computer that doesn’t have the cache populated, you might get a different set of installed dependencies than what would get built if you did have the Docker cache populated.
In short, your builds aren’t reproducible.
And that can lead to a variety of problems, e.g. random breakage when packages get upgraded without your knowledge, or hard-to-reproduce differences in behavior (“but it worked on my computer!”).
To solve this you want to keep two versions of your dependencies:
- The logical dependencies of your application, i.e. the packages you directly import. In our example,
- The pinned transitive dependencies. That is, all of
flask’s dependencies (and their dependencies, and so on), pinned to a particular version.
The logical dependencies can be used to regenerate the pinned dependencies on the demand. The pinned dependencies are what you use to get reproducible builds.
Reproducible builds using pip-tools
pip-tools is one easy way to do this.
You can store the logical dependencies in
Then run (with matching Python version and ideally operating system):
$ pip-compile requirements.in > requirements.txt
And the resulting
requirements.txt looks like this:
argparse==1.4.0 # via dateutils click==7.0 # via flask dateutils==0.6.6 flask==1.0.3 itsdangerous==1.1.0 # via flask jinja2==2.10.1 # via flask markupsafe==1.1.1 # via jinja2 python-dateutil==2.8.0 # via dateutils pytz==2019.1 # via dateutils six==1.12.0 # via python-dateutil werkzeug==0.15.4 # via flask
You check in
requirements.txt into version control.
Dockerfile requires no changes from the version we showed above.
Fast reproducible Docker builds with poetry <a name=”poetry></a>
poetry is another tool that lets you manage logical and pinned dependencies.
You can export the dependencies to a
requirements.txt file, and then your
Dockerfile doesn’t need to use
poetry at all:
$ poetry export -f requirements.txt -o requirements.txt
Alternatively, you can write a
Dockerfile that first installs only dependencies, and later installs the actual application code:
FROM python:3.8-slim RUN pip install poetry WORKDIR /tmp/myapp COPY pyproject.toml poetry.lock . RUN cd /tmp && poetry install --no-root # Copy in everything else: COPY . . RUN poetry install CMD flask run exampleapp:app
This has some caveats, though.
Fast reproducible Docker builds with pipenv
pipenv is another tool that allows you to maintain logical dependencies (in a
Pipfile) and pinned dependencies (in a
It also does a whole lot more, e.g. virtualenv management.
Much of what it does isn’t relevant to building Docker images, though, so the easy way to use it in your Docker build is to export a
You can do this outside your Docker build, and just commit the resulting file to version control and use the
$ pipenv lock --requirements > requirements.txt
The plus is that your
Dockerfile doesn’t need to know anything about
pipenv. This does require you to remember to regenerate
requirements.txt every time you update
Alternatively, you can do the export in the build itself:
FROM python:3.7 RUN pip install pipenv COPY Pipfile* /tmp RUN cd /tmp && pipenv lock --requirements > requirements.txt RUN pip install -r /tmp/requirements.txt COPY . /tmp/myapp RUN pip install /tmp/myapp CMD flask run exampleapp:app
Note that a better setup, omitted for clarity, would have you install
pipenv in such a way that its dependencies don’t impact your code, e.g. by using a virtualenv for your code. If you want a best-practices Dockerfile and build system, check out my Production-Ready Python Containers product.
To get fast, reproducible builds for your application:
- Separate dependencies from your
- Separate logical and pinned dependencies (using
pip-toolsis Hynek Schlawack’s recommendation as of 2018, but the new Poetry release might make it a more compelling alternative).
- Install dependencies separately and earlier in your
Dockerfileto ensure faster builds.
Learn how to build fast, production-ready Docker images—read the rest of the Docker packaging guide for Python.
Docker packaging is complicated, and you can’t afford to screw up production
From fast builds that save you time, to security best practices that keep you safe, how can you quickly gain the expertise you need to package your Python application for production?
Take the fast path to learning best practices, by using the Python on Docker Production Quickstart.
Learn practical Python software engineering skills you can use at your job
Too much to learn? Don't know where to start?
Sign up for my newsletter, and join over 2400 Python developers and data scientists learning practical tools and techniques, from Docker packaging to testing to Python best practices, with a free new article in your inbox every week.