Shrink your Conda Docker images with conda-pack
If you’re building a Docker image that’s based on Conda, the resulting images can be huge. For example, later I will show how a simple image with just Python 3.8 and NumPy can be over 950MB!
Large images waste bandwidth, disk, time, and CPU: how do you make the image smaller?
In this article I’ll show one way to do it, by combining the conda-pack tool with multi-stage builds.
In the example case of just Python and NumPy, the image shrinks to 330MB, almost two-thirds smaller.
If you’re not familiar with multi-stage builds, I recommend reading my introduction to multi-stage builds first.
The problem: a giant image
Let’s create a standard Conda Docker image with just a couple of dependencies. First, the environment.yml:

    name: example
    channels:
      - conda-forge
    dependencies:
      - python=3.8
      - numpy
And here’s the Dockerfile; we install Python 3.8 and NumPy, and when we run the image it imports NumPy to make sure everything is working.
(If conda run is not something you’re familiar with, you might want to read my article on activating Conda environments in Docker.)
    FROM continuumio/miniconda3

    COPY environment.yml .
    RUN conda env create -f environment.yml

    ENTRYPOINT ["conda", "run", "-n", "example", \
        "python", "-c", \
        "import numpy; print('success!')"]
Note: Outside the very specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
To ensure you’re following all the best practices you need for secure, correct, fast Dockerfiles, check out the Python on Docker Production Handbook.
The resulting image is 970MB, which is surprisingly large. Where is all the disk space going? There are two main reasons:
- Conda caches downloaded packages by default.
- The base environment where the Conda toolchain is installed takes up a bunch of space; it has its own copy of Python, for example, in this case Python 3.7.
That second reason means the base image we used, continuumio/miniconda3, is 430MB.
For comparison, the python:3.8-slim-buster image is 115MB, and it already includes the version of Python we’d want to use.
Let’s get rid of Conda!
The first reason for extra size is fairly standard with package managers, and can typically be fixed by either configuration or a well-targeted cleanup of the package cache.
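As a rough sketch of what cache cleanup could look like with Conda, the function below creates the environment and then clears Conda’s caches with conda clean. The function name create_env_and_clean is made up for illustration, and this is only one possible approach, not something the rest of this article relies on.

```shell
#!/bin/sh
# Sketch: create the environment and clear Conda's caches in one step.
# In a Dockerfile, both commands belong in the same RUN instruction;
# files deleted in a later instruction still take up space in the
# earlier layer.
# create_env_and_clean is a hypothetical helper name.
create_env_and_clean() {
    # $1 = path to the environment file, e.g. environment.yml
    conda env create -f "$1" &&
        conda clean --all --yes
}

# Usage, inside the build image where conda is installed:
#   create_env_and_clean environment.yml
```

This only addresses the package cache; the base environment and the Conda toolchain are still in the image.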
The second problem, however, is Conda-specific: the base Conda environment is necessary for installation of packages, but once we’re running the code it really doesn’t add much.
That’s where conda-pack comes in: it’s a tool that lets you package a Conda environment into a standalone environment, with no need for the Conda toolchain.
Once we’ve packaged up our environment that way, we can copy it into a new image that only contains that self-contained environment.
Again, if you’re not familiar with multi-stage builds, I recommend reading my introduction to multi-stage builds first.
Here’s what our new Dockerfile looks like:
    # The build-stage image:
    FROM continuumio/miniconda3 AS build

    # Install the package as normal:
    COPY environment.yml .
    RUN conda env create -f environment.yml

    # Install conda-pack:
    RUN conda install -c conda-forge conda-pack

    # Use conda-pack to create a standalone environment
    # in /venv:
    RUN conda-pack -n example -o /tmp/env.tar && \
      mkdir /venv && cd /venv && tar xf /tmp/env.tar && \
      rm /tmp/env.tar

    # We've put venv in same path it'll be in final image,
    # so now fix up paths:
    RUN /venv/bin/conda-unpack

    # The runtime-stage image; we can use Debian as the
    # base image since the Conda env also includes Python
    # for us.
    FROM debian:buster AS runtime

    # Copy /venv from the previous stage:
    COPY --from=build /venv /venv

    # When image is run, run the code with the environment
    # activated:
    SHELL ["/bin/bash", "-c"]
    ENTRYPOINT source /venv/bin/activate && \
               python -c "import numpy; print('success!')"
If we build the image, the resulting image is much smaller, and it still works just fine:
    $ docker image build -t condapack .
    ...
    $ docker container run condapack
    success!
    $ docker image ls condapack
    REPOSITORY   TAG      IMAGE ID       SIZE
    condapack    latest   6e7906bd0634   330MB
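To put the before and after figures side by side, here is a small sketch that computes how much smaller the new image is, using the 970MB and 330MB sizes reported in this article. The helper name percent_smaller is made up for illustration.

```shell
#!/bin/sh
# Percentage reduction between two image sizes (both in MB).
# percent_smaller is a hypothetical helper name.
percent_smaller() {
    # $1 = original size, $2 = new size
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (before - after) * 100 / before }'
}

# Sizes from this article: 970MB before, 330MB after.
percent_smaller 970 330   # prints 66
```

That is where the "almost two-thirds smaller" figure comes from.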
Why does this work?
Conda is an interesting packaging system in that it includes everything you need to run your program, other than the standard C library.
So when we install the python=3.8 package in the environment.yml, that installs Python and all C libraries it needs.
When we use conda-pack to package our Conda environment into an isolated environment that doesn’t need Conda, the result is a directory with programs that can be run on almost any Linux distribution.
So we can just copy that directory onto a plain old small Debian image, and get a self-contained running application.
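One way to check this for yourself is to ask the dynamic linker which shared libraries a binary loads from outside the environment directory. The following is a rough sketch, assuming a Linux system where ldd is available; list_outside_libs is a hypothetical helper, not part of conda-pack.

```shell
#!/bin/sh
# list_outside_libs PREFIX BINARY
# Print the shared libraries that BINARY resolves to a path outside
# PREFIX, according to ldd. Libraries that ldd cannot resolve
# ("not found") are printed too.
# list_outside_libs is a hypothetical helper name.
list_outside_libs() {
    ldd "$2" | awk -v prefix="$1" \
        '$2 == "=>" && index($3, prefix "/") != 1 { print $1 }'
}

# Usage, inside the final image:
#   list_outside_libs /venv /venv/bin/python
```

If the environment really is self-contained, you would expect anything this prints to be a library supplied by the base distribution, such as the standard C library.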
If you’re using Conda in your Docker image, conda-pack is an easy way to shrink your image.
However, make sure to read my article on fast multi-stage builds; naive usage of multi-stage builds results in very slow rebuilds in CI.