Elegantly activating a virtualenv in a Dockerfile

When you’re packaging your Python application in a Docker image, you’ll often use a virtualenv. For example, you might be doing a multi-stage build in order to get smaller images.

Since you’re using a virtualenv, you need to activate it—but if you’re just getting started with Dockerfiles, the naive way doesn’t work. And even if you do know how to do it, the usual method is repetitive and therefore error-prone.

There is a simpler way of activating a virtualenv, which I’ll demonstrate in this article. But first, we’ll go over some of the other, less elegant (or broken!) ways you might do it.

Note: Outside the specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.

The method that doesn’t work

If you just blindly convert a shell script into a Dockerfile you will get something that looks right, but is actually broken:

FROM ubuntu:18.04
RUN apt-get update && apt-get install \
  -y --no-install-recommends python3 python3-virtualenv

RUN python3 -m virtualenv --python=/usr/bin/python3 /opt/venv

# This is wrong!
RUN . /opt/venv/bin/activate

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt

# Run the application:
COPY myapp.py .
CMD ["python", "myapp.py"]

It’s broken for two different reasons:

  1. Every RUN line in the Dockerfile is a different process. Running activate in a separate RUN has no effect on future RUN calls; for all practical purposes it’s a no-op.
  2. When you run the resulting Docker image it will run the CMD—which also isn’t going to be run inside the virtualenv, since it too is unaffected by the RUN processes.
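To see why the first point holds, here is a minimal sketch in Python rather than Docker. It assumes a POSIX sh is available. The helper run_in_fresh_shell and the variable DEMO_VENV are made up for illustration. Each call starts a brand-new shell, much as each RUN line does, so a setting made in one shell is gone by the time the next one starts.

```python
import os
import subprocess


def run_in_fresh_shell(command: str) -> str:
    """Run command in a new sh process and return its stdout."""
    # Drop DEMO_VENV from the inherited environment so the demo is repeatable.
    env = {k: v for k, v in os.environ.items() if k != "DEMO_VENV"}
    result = subprocess.run(
        ["sh", "-c", command],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# First "RUN": sets a variable, then the shell exits and the variable is gone.
run_in_fresh_shell("export DEMO_VENV=/opt/venv")

# Second "RUN": a different process, so it never sees the variable.
print(run_in_fresh_shell('echo "${DEMO_VENV:-unset}"'))  # prints: unset

# Same process: chaining with && keeps the setting visible.
print(run_in_fresh_shell('export DEMO_VENV=/opt/venv && echo "$DEMO_VENV"'))  # prints: /opt/venv
```

The last call is the reason chaining activate and the real command in a single RUN does work.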

The repetitive method that mostly works

One solution is to explicitly use the path to the binaries in the virtualenv. In this case we only have two repetitions, but in more complex situations you’ll need to do it over and over again.

Besides the lack of readability, repetition is a source of error. As you add more calls to Python programs, it’s easy to forget to add the magic /opt/venv/bin/ prefix.

It will (mostly) work though:

FROM ubuntu:18.04
RUN apt-get update && apt-get install \
  -y --no-install-recommends python3 python3-virtualenv

RUN python3 -m virtualenv --python=/usr/bin/python3 /opt/venv

# Install dependencies:
COPY requirements.txt .
RUN /opt/venv/bin/pip install -r requirements.txt

# Run the application:
COPY myapp.py .
CMD ["/opt/venv/bin/python", "myapp.py"]

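The only caveat is that if any Python process launches a sub-process by bare command name (for example python, or a script installed by pip), the command is looked up via PATH, which doesn't include /opt/venv/bin, so that sub-process will not run in the virtualenv.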
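As a rough illustration of that caveat, the sketch below starts a child interpreter and asks it for its sys.prefix. The helper name child_prefix is hypothetical. Launching via sys.executable reuses the interpreter that is already running, so the child stays in the same environment, whereas launching by a bare name depends on PATH.

```python
import subprocess
import sys


def child_prefix(argv0: str) -> str:
    """Start a child interpreter via argv0 and return the child's sys.prefix."""
    result = subprocess.run(
        [argv0, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Launching via sys.executable reuses the exact interpreter running this code,
# so the child ends up in the same environment (virtualenv or not).
print(child_prefix(sys.executable) == sys.prefix)

# Launching by bare name, e.g. child_prefix("python3"), goes through PATH
# instead, so the answer depends on which directory PATH lists first.
```

Code you control can use sys.executable, but third-party tools that shell out by name are outside your control.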

The repetitive method that totally works

You can fix that by actually activating the virtualenv separately for each RUN as well as the CMD:

FROM ubuntu:18.04
RUN apt-get update && apt-get install \
  -y --no-install-recommends python3 python3-virtualenv

RUN python3 -m virtualenv --python=/usr/bin/python3 /opt/venv

# Install dependencies:
COPY requirements.txt .
RUN . /opt/venv/bin/activate && pip install -r requirements.txt

# Run the application:
COPY myapp.py .
CMD . /opt/venv/bin/activate && exec python myapp.py

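(The exec is there to get correct signal handling: a shell-form CMD runs under /bin/sh -c, and exec replaces that shell with the Python process, so signals such as the SIGTERM sent by docker stop reach Python directly.)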
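If you want to convince yourself of what exec does, the following sketch (assuming a POSIX sh; the function name pids_with_exec is made up) has a shell print its own process ID and then exec a Python interpreter that prints its process ID. Because exec replaces the shell rather than starting a child, the two IDs are the same.

```python
import subprocess
import sys


def pids_with_exec() -> list:
    """Return [shell PID, Python PID] for a shell that execs Python."""
    # "$0" is the first argument after the script: the current interpreter.
    script = 'echo $$; exec "$0" -c "import os; print(os.getpid())"'
    result = subprocess.run(
        ["sh", "-c", script, sys.executable],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


shell_pid, python_pid = pids_with_exec()
# Same PID: Python took over the shell's process instead of running under it.
print(shell_pid == python_pid)
```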

The elegant method, in which we learn what activating actually does

It’s easy to think of activate as some mysterious magic, a pentacle drawn in blood to keep Python safely trapped. But it’s just software, and fairly simple software at that. The virtualenv documentation will even tell you that activate is “purely a convenience.”

If you go and read the code for activate, it does a number of things:

  1. It figures out what shell you’re running.
  2. It adds a deactivate function to your shell, and messes around with pydoc.
  3. It changes the shell prompt to include the virtualenv name.
  4. It unsets the PYTHONHOME environment variable, if someone happened to set it.
  5. It sets two environment variables: VIRTUAL_ENV and PATH.

The first four are basically irrelevant to Docker usage, so that just leaves the last item. Most of the time VIRTUAL_ENV has no effect, but some tools—e.g. the poetry packaging tool—use it to detect whether you’re running inside a virtualenv.
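As a hedged sketch of what such detection could look like (this is not the actual code of poetry or any other tool, and both function names are invented), a tool can read the variable from the environment, while the interpreter itself can be checked by comparing sys.prefix with sys.base_prefix:

```python
import sys
from typing import Mapping, Optional


def venv_from_environment(environ: Mapping[str, str]) -> Optional[str]:
    """Return the virtualenv path advertised via VIRTUAL_ENV, if any."""
    # An unset or empty variable both count as "no virtualenv advertised".
    return environ.get("VIRTUAL_ENV") or None


def interpreter_in_venv() -> bool:
    """True if the running interpreter itself lives in a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(venv_from_environment({"VIRTUAL_ENV": "/opt/venv"}))  # prints: /opt/venv
print(venv_from_environment({}))  # prints: None
print(interpreter_in_venv())
```

The first check only reflects what the environment claims, which is why setting VIRTUAL_ENV explicitly is enough to satisfy tools that rely on it.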

The most important part is setting PATH: PATH is a list of directories which are searched for commands to run. activate simply adds the virtualenv’s bin/ directory to the start of the list.
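Here is a small sketch of that lookup rule, assuming a POSIX system. It creates two throwaway directories standing in for a system bin/ and a virtualenv bin/, each holding a fake command called demo-tool (all names are made up), and uses the standard library's shutil.which to resolve the command against an explicit search path.

```python
import os
import shutil
import stat
import tempfile


def make_fake_command(directory: str, name: str) -> str:
    """Create a tiny executable file called name inside directory."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def path_lookup_demo() -> tuple:
    """Resolve demo-tool before and after prepending the virtualenv bin/."""
    with tempfile.TemporaryDirectory() as system_bin, \
            tempfile.TemporaryDirectory() as venv_bin:
        system_tool = make_fake_command(system_bin, "demo-tool")
        venv_tool = make_fake_command(venv_bin, "demo-tool")

        # Before "activation": only the system directory is searched.
        path_before = system_bin
        found_before = shutil.which("demo-tool", path=path_before)

        # "Activation": put the virtualenv's bin/ at the front of the list.
        path_after = venv_bin + os.pathsep + path_before
        found_after = shutil.which("demo-tool", path=path_after)

        return found_before == system_tool, found_after == venv_tool


print(path_lookup_demo())  # prints: (True, True)
```

The first match wins, so prepending the virtualenv directory is enough to make python and pip resolve to the virtualenv's copies.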

We can replace activate by setting the appropriate environment variables ourselves: Docker’s ENV instruction applies to all subsequent RUNs as well as to the CMD.

The result is the following Dockerfile:

FROM ubuntu:18.04
RUN apt-get update && apt-get install \
  -y --no-install-recommends python3 python3-virtualenv

ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m virtualenv --python=/usr/bin/python3 $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt

# Run the application:
COPY myapp.py .
CMD ["python", "myapp.py"]

The virtualenv now automatically works for both RUN and CMD, without any repetition or need to remember anything.

Software isn’t magic

And there you have it: a version that is as simple as our original, broken version, but actually does the right thing. No repetition, and less scope for error.

When something seems needlessly complex, dig in and figure out how it works. The software you’re using might be simpler (or more simplistic) than you think, and with a little work you might come up with a more elegant solution.

