Speed up your Conda installs with Mamba
Conda installs can be very very very slow.
Every time you run an install:
- It has to collect the package metadata.
- It has to solve the environment. … maybe you can take a coffee break here, or go work on a jigsaw puzzle to relax …
- It has to download packages.
- Eventually, finally, it will install the packages it downloaded.
By the time this is all done you’ve probably forgotten what it was you were trying to do in the first place. To be fair, Conda has gotten faster in the past few releases, but it’s still far from being fast.
Luckily, a new project called Mamba has set out to reimplement Conda functionality while running much faster. So let’s see:
- How much faster Mamba is.
- How to switch to Mamba.
- Using it in Docker to make image builds even faster.
Measuring Conda vs. Mamba install speed
Let’s compare installing the following environment.yml with both Conda and Mamba:

    name: myenv
    channels:
      - conda-forge
    dependencies:
      - python=3.9
      - matplotlib
      - pandas
      - scipy
In order to make sure there’s no caching messing with the results, we’ll do this as part of a Docker build. First, a normal Conda install:
    # Dockerfile.just-conda
    FROM continuumio/miniconda3
    COPY environment.yml .
    RUN /bin/bash -c "time conda env create -f environment.yml"
Running this, I got the following output from the time utility for running conda env create:

    real 2m15.143s
    user 1m38.642s
    sys 0m6.421s
If you’re not familiar with the output of time, that’s saying 2 minutes 15 seconds elapsed in clock time (real), and about 1 minute 45 seconds of CPU computation (user plus sys).
To learn more see my article on CPU vs. clock time.
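As a rough illustration of how to read those numbers, here is a minimal Python sketch that converts the time output above into seconds and adds user and sys to get total CPU time. The helper names parse_duration and parse_time_output are hypothetical, made up for this example, and the parser only handles the "XmY.YYYs" format shown in this article.

```python
import re


def parse_duration(text):
    """Convert a duration such as '2m15.143s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def parse_time_output(output):
    """Map each label (real, user, sys) to its duration in seconds."""
    fields = output.split()
    labels, values = fields[::2], fields[1::2]
    return {label: parse_duration(value) for label, value in zip(labels, values)}


# The Conda timing reported above.
conda = parse_time_output("real 2m15.143s user 1m38.642s sys 0m6.421s")

# CPU time is the time spent in user code plus time spent in the kernel.
cpu_seconds = conda["user"] + conda["sys"]

print(f"wall clock: {conda['real']:.1f}s, CPU: {cpu_seconds:.1f}s")
```

The gap between the two numbers is time the process spent waiting rather than computing, for example on network downloads or disk.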
Next, we’ll install Mamba (conda install -c conda-forge mamba) and time creating the environment using Mamba:

    # Dockerfile.conda-then-mamba
    FROM continuumio/miniconda3
    COPY environment.yml .
    RUN conda install -c conda-forge mamba
    RUN /bin/bash -c "time mamba env create -f environment.yml"
Here’s the result:
    real 0m47.090s
    user 0m28.493s
    sys 0m4.692s
Mamba installs these packages in only a third of the time that Conda does. Much of that is due to less CPU usage, but even network downloads seem to be a little faster; Mamba uses parallel downloads to speed them up.
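To make that comparison concrete, here is a small Python sketch that works through the arithmetic using the two time outputs above, converted to seconds. Treating "real minus CPU" as waiting time is my own rough proxy, not something the time utility reports directly, and it assumes the work is mostly single-threaded; cpu and waiting are hypothetical helper names.

```python
# Timings from the two `time` outputs above, converted to seconds.
conda = {"real": 135.143, "user": 98.642, "sys": 6.421}
mamba = {"real": 47.090, "user": 28.493, "sys": 4.692}


def cpu(timing):
    """Total CPU time: user-space plus kernel time."""
    return timing["user"] + timing["sys"]


def waiting(timing):
    """Rough proxy for time spent waiting (network, disk) instead of computing."""
    return timing["real"] - cpu(timing)


wall_speedup = conda["real"] / mamba["real"]
cpu_speedup = cpu(conda) / cpu(mamba)

print(f"wall-clock speedup: {wall_speedup:.2f}x")
print(f"CPU-time speedup:   {cpu_speedup:.2f}x")
print(f"waiting time: Conda {waiting(conda):.1f}s, Mamba {waiting(mamba):.1f}s")
```

On these numbers, both the CPU time and the leftover waiting time shrink, which fits the observation that the improvement is not only about computation.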
Note: Outside any specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
Mamba is a re-implementation of the Conda package manager, designed to:
- Be backwards compatible, with the same command-line options.
- Eventually, add more features.
If you look back at the two Dockerfiles above, you’ll notice that once Mamba was installed, all you had to do was replace conda with mamba on the command line.
That’s true in general, and you can use Mamba for all your other Conda environment interactions.
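Because the command-line options are the same, a script can pick whichever tool is available. The following is a minimal Python sketch of that idea, not an official API of either project: env_create_command is a hypothetical helper, and it simply checks whether a mamba executable is on the PATH using the standard library’s shutil.which.

```python
import shutil


def env_create_command(env_file="environment.yml", which=shutil.which):
    """Build an `env create` command, preferring mamba when it is on PATH.

    `which` is injectable so the choice can be exercised without either
    tool being installed.
    """
    tool = "mamba" if which("mamba") else "conda"
    # Only the executable name differs; the arguments are identical.
    return [tool, "env", "create", "-f", env_file]


print(" ".join(env_create_command()))
```

The resulting list can be passed to subprocess.run if you want to execute it; on a machine without Mamba it falls back to the plain conda command.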
Mamba has been in development since March 2019, has had 1.5 million downloads since then, and at least in my testing of environment creation seems to work just fine.
- You can install Mamba into a specific Conda environment as we did above, with conda install -c conda-forge mamba.
- If this is your development machine, you’ll want to do conda install mamba -n base -c conda-forge so it’s available in all environments.
- If you’re setting up a new development machine, and you’re primarily using Conda-Forge, there is another option. Conda-Forge provides an alternative to the normal miniconda installer.
This alternative installer:
- Uses conda-forge as the default channel, instead of Anaconda’s commercially supported default channel.
- Optionally, comes pre-packaged with Mamba.
Speeding up Docker builds a little bit more
Looking back at the Dockerfile that uses mamba above, it still has one caveat:
    # Dockerfile.conda-then-mamba
    FROM continuumio/miniconda3
    COPY environment.yml .
    RUN conda install -c conda-forge mamba  # <-- STILL SLOW
    RUN /bin/bash -c "time mamba env create -f environment.yml"
We’re still using conda to get Mamba installed, which means that one install is still going to slow us down.
What to do?
A base image with Mamba pre-installed
As mentioned above, Conda-Forge has an installer that comes with Mamba pre-installed… and they also provide Docker images.
Which means we can use the following Dockerfile:
    # Dockerfile.just-mamba
    FROM condaforge/mambaforge
    COPY environment.yml .
    RUN mamba env create -f environment.yml
No need to install Mamba separately.
Let’s see the end-to-end speed of the two ways of using Mamba. I’m going to measure the full speed of the Docker build, not including the time it takes to download the base image; in practice they’re both about the same size so that probably wouldn’t affect the results either.
First, here’s our first attempt, which installs Mamba with Conda and then uses Mamba to create the environment:
    $ time docker build -q --no-cache -f Dockerfile.conda-then-mamba .
    sha256:9dc3e8d04ccf58862aa172a944d8010569c31b196e047d644ee2341816026282

    real 1m21.384s
    user 0m0.019s
    sys 0m0.014s
Second, here’s our second attempt, using the condaforge/mambaforge base image:
    $ time docker build -q --no-cache -f Dockerfile.just-mamba .
    sha256:5f9f44818d78b0db6f281007ad4a456c06b37245bb33b2f5c3a5af95b38a3f6c

    real 0m53.383s
    user 0m0.020s
    sys 0m0.013s
So it looks like using the Docker base image with Mamba pre-installed saves us about 28 seconds (1m21s versus 53s). This isn’t bad, but it isn’t quite as exciting as the speed-up from switching to Mamba in the first place:
- Unlike the speedup from switching, which will probably scale at least somewhat with number of packages, this is a fixed overhead: you’re just saving the one-time RUN conda install -c conda-forge mamba.
- On your development computer, you can just have Mamba installed in the base environment (see above), so there isn’t really much savings.
- In the context of Docker builds, the fixed cost of RUN conda install -c conda-forge mamba will go away after the first build if you’re using Docker layer caching, which you should be.
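To put the two savings side by side, here is a short Python sketch of the arithmetic, using only the timings reported in this article. Keep in mind the two figures were measured differently (the first pair is a whole docker build, the second pair is just the environment-creation step), so this is an order-of-magnitude comparison rather than a precise one.

```python
# Wall-clock `docker build` times from the two runs above, in seconds.
conda_then_mamba = 60 + 21.384  # Dockerfile.conda-then-mamba
just_mamba = 53.383             # Dockerfile.just-mamba

# Fixed saving from skipping the one-time `conda install ... mamba` step.
saved_by_base_image = conda_then_mamba - just_mamba

# Wall-clock times of the env-create step alone, from the earlier comparison.
conda_env_create = 2 * 60 + 15.143
mamba_env_create = 47.090

# Saving from switching tools, which should grow with the number of packages.
saved_by_switching = conda_env_create - mamba_env_create

print(f"pre-installed Mamba saves {saved_by_base_image:.1f}s per uncached build")
print(f"switching from Conda to Mamba saved {saved_by_switching:.1f}s")
```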
Whether developing on your machine or packaging with Docker, you should use Mamba to install your Conda packages. It’s a whole lot faster—with the same functionality.