Using Conda? You might not need Docker
Docker packaging is useful, but doing it well is not easy. Even limiting the scope of discussion to production use of Python applications, the number of details to cover is extensive enough that I’ve written over 50 articles on the topic, and created a number of products to speed up the packaging process.
In a better universe, none of this would be necessary.
So while Docker is often useful enough to merit this effort, in some situations you might be better off avoiding Docker altogether. Specifically, Conda offers some of the benefits of Docker. And while Conda certainly has its own issues, using Conda on its own will involve less work than combining Conda with Docker.
This article will cover:
- Development environments:
  - The benefits of Docker packaging.
  - The downsides.
  - Why Conda might let you avoid the downsides and still have the same benefits.
- Production, where Docker’s benefits might be replaced by a combination of framework functionality and Conda.
Docker for development environments
When you’re working on your software locally, during development, Docker has a number of benefits.
First, you might need to run additional services like PostgreSQL; Docker, and especially Docker Compose, make this a lot easier. Conda doesn’t really help much with this, so if this is a requirement Docker may well be the right solution.
Second, you might have developers using multiple operating systems, which raises some difficulties:
- The runtime environment will differ for each developer based on their operating system. By running on Docker, you can run the application consistently on the same operating system, whatever variant of Linux your image is based on.
- Installing different tools consistently across multiple operating systems can be difficult. You likely need a number of utilities available: flake8, perhaps a compiler. On Linux you might use dnf or other tools, on macOS you’ll want brew, and on Windows, good luck, none of the package repositories seem all that good. Docker can give you access to all these tools by putting them in a Linux image, so you only need to install them one way.
There are caveats, however: while Docker images do run consistently across different operating systems, they’re not exactly the same, and these differences become more significant in development environments.
- When mounting a host directory as a volume, a common situation in development environments, on macOS and Windows any writes will be owned by the host’s user. On Linux, the created files will have a different UID, so you need to take steps to run the container as the host user’s UID, which can be tricky if the image wasn’t designed for this.
- And if you want to use host.docker.internal to access processes running on the host, that won’t work on Linux out of the box.
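Both Linux caveats can be worked around when constructing the docker run command. The following is a minimal sketch, not a definitive setup: the function name docker_run_args, the /src mount point, and the image name yourapp-dev are hypothetical, and the host-gateway value assumes Docker 20.10 or later.

```python
import os
import sys


def docker_run_args(image, mount_dir, platform=sys.platform):
    """Build a `docker run` command line for a development container."""
    args = ["docker", "run", "--rm", "-it", "-v", f"{mount_dir}:/src"]
    if platform.startswith("linux"):
        # Run as the host user so files written to the mounted volume
        # are not owned by a different UID.
        args += ["--user", f"{os.getuid()}:{os.getgid()}"]
        # Make host.docker.internal resolve to the host
        # (assumption: Docker 20.10+ supports the host-gateway value).
        args += ["--add-host", "host.docker.internal:host-gateway"]
    args.append(image)
    return args


if __name__ == "__main__":
    # "yourapp-dev" is a placeholder image name.
    print(" ".join(docker_run_args("yourapp-dev", os.getcwd())))
```

Running as the host UID only works smoothly if the image does not assume a specific user, which is the "tricky" part mentioned above.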
Python and Conda: cross-platform logic, cross-platform dependencies
If you’re writing a Python project and you’re relying on Conda, and especially Conda-Forge, the benefits of Docker for a development environment become a little less compelling.
When it comes to the runtime environment, Python already creates a fairly cross-platform abstraction layer. Whether it’s opening files or networking, the Python APIs and libraries that build on them usually abstract away most of the platform-specific details.
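As a small illustration of that abstraction layer, here is a sketch using only the standard library. The function name write_report and the reports subdirectory are made up for the example; the point is that the same code builds paths and writes files on Windows, macOS, and Linux.

```python
import tempfile
from pathlib import Path


def write_report(directory, name, text):
    """Write a text file under directory/reports and return its path."""
    # pathlib chooses the right path separator for the current OS.
    path = Path(directory) / "reports" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # tempfile picks a suitable temporary location on each platform.
    with tempfile.TemporaryDirectory() as tmp:
        report = write_report(tmp, "summary.txt", "all good\n")
        print(report.read_text(encoding="utf-8"))
```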
That doesn’t help with command-line utilities (at least non-Python ones), compilers, and native libraries you might depend on… but this is where Conda comes in. Conda packages everything but the standard C library, from C libraries to the Python interpreter to command-line tools to compilers. And Conda-Forge has a huge variety of tools, libraries, and general packages available, pre-compiled for Windows, macOS, and Linux.
So instead of brew + whatever you’re doing on Windows, you can have a single consistent packaging config: environment.yml, optionally pinned with conda-lock’s multi-OS lock files for extra consistency. It will list all the tools, libraries, compilers, Python dependencies, and so on that you need to run and develop your code, and it will install consistently across operating systems.
No need for Docker!
    name: yourapp
    channels:
      - conda-forge
    dependencies:
      - python=3.9
      - flake8
      - black
      - git
      - compilers
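Once the environment is activated, a quick sanity check can confirm that the command-line tools it lists are actually on the PATH, whatever the operating system. This is only a sketch: the REQUIRED_TOOLS list and the missing_tools function are hypothetical names, and the list simply mirrors the tools in the environment file above.

```python
import shutil

# Hypothetical list mirroring the command-line tools in environment.yml.
REQUIRED_TOOLS = ["python", "flake8", "black", "git"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All tools found.")
```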
To summarize, for the use case of Python development environments, here’s how you might choose alternatives to Docker:
| Docker use case | Alternative to Docker |
|---|---|
| Services like Postgres | Difficult, probably not worth doing |
| Consistent runtime | Python’s cross-platform abstractions |
| Cross-platform tool installation | Conda |
Conda as a Docker alternative for production
Can Conda function as an alternative to Docker in production?
Given a production image will typically be running on Linux only, cross-platform consistency becomes less of an issue. But a Docker image combines two things we do want:
- A consistent set of files that will run on Linux, including:
  - All dependencies.
  - The actual application code.
- A particular command to run on startup.
As we discussed for development environments, getting consistent dependencies via Conda is often a reasonable alternative to Docker.
The only thing that will depend on the host operating system is glibc; pretty much everything else will be packaged by Conda. So a pinned conda-lock.yml file is a reasonable alternative to a Docker image as far as having consistent dependencies.
That still leaves getting your application code, and running it.
In many cases the framework you are using to run your code can install Conda packages and distribute your code to whatever the runtime environment is, and will also control what gets run. This means you don’t really need a Docker image. For example:
- Snakemake can run code using a Conda environment, or even generate a container image from a Conda environment. It also supports running Docker and Singularity images.
- Metaflow supports installing Conda packages.
- MLflow supports both Conda and Docker-based projects.
All of these frameworks will let you set what command or code gets run, depending on their abstraction level.
Some limitations to avoiding Docker
First, if you’re not using a Docker image it’s even more important than usual to record the version of the code you’re using in the output, so you can reproduce all the necessary code later.
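One way to do this, assuming the code lives in a Git repository, is to attach the current commit hash to whatever the job outputs. The sketch below uses hypothetical function names (code_version, with_metadata) and a placeholder results dictionary; it falls back to "unknown" when Git is not installed or the code is not in a repository.

```python
import json
import subprocess
from datetime import datetime, timezone


def code_version():
    """Return the current Git commit hash, or "unknown" if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is missing, or we are not inside a repository.
        return "unknown"
    return result.stdout.strip()


def with_metadata(results):
    """Wrap results with the code version and a UTC timestamp."""
    return {
        "code_version": code_version(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }


if __name__ == "__main__":
    # The results dictionary is a placeholder for real job output.
    print(json.dumps(with_metadata({"example": "placeholder"}), indent=2))
```

Storing a hash of the lock file next to the commit hash would similarly record which dependencies were used.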
Second, there’s the bootstrapping problem: depending on the framework you’re using, you might need to install Conda and the framework driver before you can get anything going. A Docker image would come prepackaged with both, in addition to your code and its dependencies. So even if your framework supports Conda directly, you might want to use Docker anyway.
Third, depending on how smart the framework is, you might find yourself installing Conda packages over and over again on every run. This is inefficient, even when using a faster installer like Mamba.
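If the framework does not cache environments itself, one possible workaround is to reinstall only when the lock file changes. The following sketch shows the idea under stated assumptions: the function names and the stamp file are hypothetical, and the actual install step (for example, running your Conda installer of choice) is left out.

```python
import hashlib
import tempfile
from pathlib import Path


def lock_digest(lock_file):
    """Return the SHA-256 hex digest of the lock file's contents."""
    return hashlib.sha256(Path(lock_file).read_bytes()).hexdigest()


def needs_install(lock_file, stamp_file):
    """True if no environment was built from this exact lock file."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != lock_digest(lock_file)


def mark_installed(lock_file, stamp_file):
    """Record the lock file digest after a successful install."""
    Path(stamp_file).write_text(lock_digest(lock_file))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock = Path(tmp) / "conda-lock.yml"
        stamp = Path(tmp) / "installed.sha256"  # hypothetical stamp file
        lock.write_text("placeholder lock contents\n")
        if needs_install(lock, stamp):
            # The real install command would run here.
            mark_installed(lock, stamp)
        print("needs install again:", needs_install(lock, stamp))
```

The stamp file needs to live somewhere that persists between runs, such as alongside the installed environment, for this to save any work.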
You don’t need containers for everything
However useful they are, containers are just one solution out of many, and in some situations Conda can give you many of the benefits with less complexity. Every tool has its sweet spot, and almost every tool has alternatives, so it’s worth spending at least a little time considering if your default tool is the best one for the job.