Using Conda? You might not need Docker

by Itamar Turner-Trauring
Last updated 20 Sep 2022, originally created 20 Sep 2022

Docker packaging is useful, but doing it well is not easy. Even limiting the scope of discussion to production use of Python applications, the number of details to cover is extensive enough that I’ve written over 50 articles on the topic, and created a number of products to speed up the packaging process.

In a better universe, none of this would be necessary.

So while Docker is often useful enough to merit this effort, in some situations you might be better off avoiding Docker altogether. Specifically, Conda offers some of the benefits of Docker. And while Conda certainly has its own issues, using Conda on its own will involve less work than running Conda inside Docker.

This article will cover:

- Development environments:
  - The benefits of Docker packaging.
  - The downsides.
  - Why Conda might let you avoid the downsides and still have the same benefits.
- Production, where Docker’s benefits might be replaced by a combination of framework functionality and Conda.

Docker for development environments

When you’re working on your software locally, during development, Docker has a number of benefits.

First, you might need to run additional services like PostgreSQL; Docker, and especially Docker Compose, make this a lot easier. Conda doesn’t really help much with this, so if this is a requirement Docker may well be the right solution.

Second, you might have developers using multiple operating systems, which raises some difficulties:

- The runtime environment will differ for each developer based on their operating system. By running on Docker, you can run the application consistently on the same operating system: whatever variant of Linux your image is based on.
- Installing different tools consistently across multiple operating systems can be difficult. You likely need a number of utilities available: git, black, flake8, perhaps a compiler. On Linux you might use apt or dnf or other tools, on macOS you’ll want brew, and on Windows, good luck: none of the package repositories seem all that good. Docker can give you access to all these tools by putting them in a Linux image, so you only need to install them one way.

There are caveats, however: while Docker images do run consistently across different operating systems, they’re not exactly the same, and these differences become more significant in development environments.

- When you mount a host directory as a volume, a common situation in development environments, files the container writes on macOS and Windows will be owned by the host’s user. On Linux, the created files will be owned by whatever UID the container runs as (often root), so you need to take steps to run the container as the host user’s UID, which can be tricky if the image wasn’t designed for this.
- And if you want to use host.docker.internal to access processes running on the host, that won’t work on Linux out of the box; you need to add the mapping yourself, for example by passing --add-host=host.docker.internal:host-gateway to docker run (Docker 20.10 and later).
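On Linux, one common workaround for the ownership problem is to pass the host user’s UID and GID to docker run --user. The sketch below only builds such a command line. The image name yourapp-dev and the /src mount point are hypothetical, and it assumes the image tolerates running as an arbitrary, non-root UID.

```python
import os
import shlex
import sys


def docker_run_command(image: str, host_dir: str, mount_point: str = "/src") -> list[str]:
    """Build a `docker run` command that mounts host_dir as a volume.

    On Linux, files the container writes to the volume are owned by whatever
    UID the container runs as, so we pass the host user's UID:GID via --user.
    On macOS and Windows the flag is omitted, since writes already end up
    owned by the host user there.
    """
    cmd = ["docker", "run", "--rm", "-v", f"{os.path.abspath(host_dir)}:{mount_point}"]
    if sys.platform.startswith("linux"):
        # os.getuid()/os.getgid() only exist on Unix, which this branch guarantees.
        cmd += ["--user", f"{os.getuid()}:{os.getgid()}"]
    cmd.append(image)
    return cmd


# "yourapp-dev" is a hypothetical image name.
print(shlex.join(docker_run_command("yourapp-dev", ".")))
```

If the image writes to directories like the home directory at startup, running as an arbitrary UID may still fail. This is the "tricky if the image wasn’t designed for this" case.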

Python and Conda: cross-platform logic, cross-platform dependencies

If you’re writing a Python project and you’re relying on Conda, and especially Conda-Forge, the benefits of Docker for a development environment become a little less compelling.

When it comes to the runtime environment, Python already creates a fairly cross-platform abstraction layer. Whether it’s opening files or networking, the Python APIs and libraries that build on them usually abstract away most of the platform-specific details.
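To illustrate, this short sketch uses only standard-library modules (pathlib and tempfile), and the same code should behave the same way on Linux, macOS, and Windows. The directory and file names are arbitrary.

```python
import tempfile
from pathlib import Path


def write_and_read_back(text: str) -> str:
    # tempfile picks a suitable temporary directory for the current OS.
    with tempfile.TemporaryDirectory() as tmp:
        # pathlib joins components with the platform's separator ("/" or "\").
        config = Path(tmp) / "config" / "settings.txt"
        config.parent.mkdir(parents=True)
        # An explicit encoding avoids depending on the platform's default.
        # Text mode also translates "\n" to and from the platform's line ending.
        config.write_text(text, encoding="utf-8")
        return config.read_text(encoding="utf-8")


print(write_and_read_back("debug = true\n"))
```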

That doesn’t help with command-line utilities (at least non-Python ones), compilers, and native libraries you might depend on… but this is where Conda comes in. Conda packages everything but the standard C library, from C libraries to the Python interpreter to command-line tools to compilers. And Conda-Forge has a huge variety of tools, libraries, and general packages available, pre-compiled for Windows, macOS, and Linux.

So instead of apt + brew + whatever you’re doing on Windows, you can have a single consistent packaging config: environment.yml, optionally pinned with conda-lock’s multi-OS lock files for extra consistency. It will list all the tools, libraries, compilers, Python dependencies, and so on that you need to run and develop your code, and it will install consistently across operating systems. No need for Docker!

```yaml
name: yourapp
channels:
  - conda-forge
dependencies:
  - python=3.9
  - flake8
  - black
  - git
  - compilers
```
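Once the environment is created and activated, for example with conda env create -f environment.yml followed by conda activate yourapp, a small script like this hypothetical sanity check can confirm that the listed command-line tools really come from the environment on every OS. It relies on conda activate setting the CONDA_PREFIX environment variable. The compilers metapackage is left out because it doesn’t correspond to a single executable name.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Tools from the environment.yml that should be provided by Conda.
REQUIRED_TOOLS = ["python", "flake8", "black", "git"]


def check_tools(tools: list[str]) -> dict[str, Optional[str]]:
    """Map each tool name to the executable found on PATH, or None if missing."""
    return {tool: shutil.which(tool) for tool in tools}


def outside_conda_env(found: dict[str, Optional[str]]) -> list[str]:
    """List tools that are missing or resolve outside the active Conda environment."""
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        # No environment is active, so nothing can be coming from Conda.
        return list(found)
    prefix_path = Path(prefix).resolve()
    bad = []
    for tool, path in found.items():
        if path is None or prefix_path not in Path(path).resolve().parents:
            bad.append(tool)
    return bad
```

Calling outside_conda_env(check_tools(REQUIRED_TOOLS)) returns an empty list when every tool resolves from inside the active environment.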

To summarize, for the use case of Python development environments, here’s how you might choose alternatives to Docker:

| Docker use case | Alternative to Docker |
|---|---|
| Services like Postgres | Difficult, probably not worth doing |
| Consistent runtime | Python’s cross-platform abstractions |
| Cross-platform tool installation | Conda |

Conda as a Docker alternative for production

Can Conda function as an alternative to Docker in production?

Since a production image will typically run only on Linux, cross-platform consistency becomes less of an issue. But a Docker image combines two things we do want:

- A consistent set of files that will run on Linux, including:
  - All dependencies.
  - The actual application code.
- A particular command to run on startup.

As we discussed for development environments, getting consistent dependencies via Conda is often a reasonable alternative to Docker. The only thing that will depend on the host operating system is glibc; pretty much everything else will be packaged by Conda. So a pinned environment.yml or conda-lock.yml file is a reasonable alternative to a Docker image as far as consistent dependencies go.
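Since glibc is the main piece the host still provides, a deployment might log the host’s C library alongside a check that it’s running from a Conda environment. This is a minimal sketch: it assumes the environment can be recognized by the conda-meta directory that Conda creates inside every environment prefix, and the dictionary keys are arbitrary.

```python
import platform
import sys
from pathlib import Path


def describe_runtime() -> dict:
    """Summarize the parts of the runtime that Conda does and doesn't control."""
    # platform.libc_ver() inspects the interpreter binary. On a glibc-based
    # Linux system it returns something like ("glibc", "2.31"); on other
    # platforms it often returns ("", "").
    libc, libc_version = platform.libc_ver()
    return {
        "os": platform.system(),
        "libc": libc,
        "libc_version": libc_version,
        # Conda environments keep their package metadata in conda-meta/.
        "in_conda_env": (Path(sys.prefix) / "conda-meta").is_dir(),
        "python": platform.python_version(),
    }


print(describe_runtime())
```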

That still leaves getting your application code, and running it.

In many cases the framework you are using to run your code can install Conda packages and distribute your code to whatever the runtime environment is, and will also control what gets run. This means you don’t really need a Docker image. For example:

- Snakemake can run code using a Conda environment, or even generate a container image from a Conda environment. It also supports running Docker and Singularity images.
- Metaflow supports installing Conda packages.
- MLflow supports both Conda and Docker-based projects.

All of these frameworks will let you set what command or code gets run, depending on their abstraction level.

Some limitations to avoiding Docker

First, if you’re not using a Docker image it’s even more important than usual to record the version of the code you’re using in the output, so you can reproduce all the necessary code later.
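One way to do this, sketched here under the assumption that your code lives in a git checkout, is to save the commit hash next to each run’s results. The file name provenance.json and both helper functions are hypothetical. Note that a commit hash alone won’t capture uncommitted changes.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def current_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the git commit hash of repo_dir, or None if it can't be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git isn't installed, repo_dir doesn't exist, or it isn't a git checkout.
        return None
    return result.stdout.strip()


def write_provenance(output_dir: str, repo_dir: str = ".") -> Path:
    """Write a small provenance.json file alongside a run's results."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    record = {
        "git_commit": current_commit(repo_dir),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    path = out / "provenance.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```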

Second, there’s the bootstrapping problem: depending on the framework you’re using, you might need to install Conda and the framework driver before you can get anything going. A Docker image would come prepackaged with both, in addition to your code and its dependencies. So even if your framework supports Conda directly, you might want to use Docker anyway.

Third, depending on how smart the framework is, you might find yourself installing Conda packages over and over again on every run. This is inefficient, even when using a faster installer like Mamba.
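If the framework doesn’t cache environments for you, one workaround is to key a cached environment on a hash of the environment file. An identical file then reuses the existing environment instead of reinstalling it. This is a hypothetical sketch: the helper names and cache layout are made up. It assumes conda env create with the --file and --prefix options is available (mamba env create accepts the same options).

```python
import hashlib
import subprocess
from pathlib import Path


def env_dir_for(env_file: Path, cache_root: Path) -> Path:
    """Derive a cache directory name from the content of the environment file."""
    digest = hashlib.sha256(env_file.read_bytes()).hexdigest()[:16]
    return cache_root / f"env-{digest}"


def ensure_env(env_file: Path, cache_root: Path, installer: str = "conda") -> Path:
    """Create the environment only if one for identical file content isn't cached."""
    prefix = env_dir_for(env_file, cache_root)
    # conda-meta/ only appears once Conda has installed packages into the prefix.
    if not (prefix / "conda-meta").is_dir():
        cache_root.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            [installer, "env", "create", "--file", str(env_file), "--prefix", str(prefix)],
            check=True,
        )
    return prefix
```

A crashed install could leave a partial prefix behind. A sturdier version would install into a temporary prefix and rename it only after success.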

You don’t need containers for everything

However useful they are, containers are just one solution out of many, and in some situations Conda can give you many of the benefits with less complexity. Every tool has its sweet spot, and almost every tool has alternatives, so it’s worth spending at least a little time considering if your default tool is the best one for the job.
