Elegantly activating a virtualenv in a Dockerfile
When you’re packaging your Python application in a Docker image, you’ll often use a virtualenv.
For example, you might be doing a multi-stage build in order to get smaller images.
Since you’re using a virtualenv, you need to activate it, but if you’re just getting started with Dockerfiles, the naive way doesn’t work.
And even if you do know how to do it, the usual method is repetitive and therefore error-prone.
There is a simpler way of activating a virtualenv, which I’ll demonstrate in this article. But first, we’ll go over some of the other, less elegant (or broken!) ways you might do it.
Note: Outside any specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
The method that doesn’t work
If you just blindly convert a shell script into a Dockerfile, you will get something that looks right, but is actually broken:
FROM python:3.12-slim
RUN python3 -m venv /opt/venv
# This is wrong!
RUN . /opt/venv/bin/activate
# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
# Run the application:
COPY myapp.py .
CMD ["python", "myapp.py"]
It’s broken for two different reasons:
- Every RUN line in the Dockerfile runs in a separate process. Running activate in a separate RUN has no effect on future RUN calls; for all practical purposes it’s a no-op.
- When you run the resulting Docker image, it will run the CMD, which also isn’t going to be run inside the virtualenv, since it too is unaffected by the RUN processes.
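You can watch the first problem happen without Docker. The sketch below is an analogy rather than Docker itself: it assumes each RUN line behaves like a fresh child process, and stands in for two RUN lines with two Python subprocesses. The first sets VIRTUAL_ENV, the way sourcing activate would; the second is a brand-new process, so the change is gone.

```python
import os
import subprocess
import sys

# Every "step" starts from the same base environment, with VIRTUAL_ENV removed
# so the result doesn't depend on whether this script itself runs in a venv.
BASE_ENV = {k: v for k, v in os.environ.items() if k != "VIRTUAL_ENV"}


def run_step(code: str) -> str:
    """Run a Python snippet in a brand-new child process (a stand-in for one RUN line)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=BASE_ENV,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Step 1: change the environment, much as ". /opt/venv/bin/activate" would.
run_step("import os; os.environ['VIRTUAL_ENV'] = '/opt/venv'")

# Step 2: a separate process starts from BASE_ENV again, so the change is lost.
seen = run_step("import os; print(os.environ.get('VIRTUAL_ENV', '<unset>'))")
print(seen)  # <unset>
```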
The repetitive method that mostly works
One solution is to explicitly use the path to the binaries in the virtualenv. In this case we only have two repetitions, but in more complex situations you’ll need to do it over and over again.
Besides the lack of readability, repetition is a source of error.
As you add more calls to Python programs, it’s easy to forget to add the magic /opt/venv/bin/ prefix.
It will (mostly) work though:
FROM python:3.12-slim
RUN python3 -m venv /opt/venv
# Install dependencies:
COPY requirements.txt .
RUN /opt/venv/bin/pip install -r requirements.txt
# Run the application:
COPY myapp.py .
CMD ["/opt/venv/bin/python", "myapp.py"]
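The only caveat is that if any Python process launches a subprocess by command name (python, pip, or another tool installed in the virtualenv), that subprocess will not run in the virtualenv: /opt/venv/bin/ was never added to PATH, so the lookup finds the system copy, or nothing at all.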
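To see why, the sketch below runs a virtualenv’s interpreter by its full path, without activating it, just like the CMD above. It assumes a POSIX layout (scripts in bin/) and creates a throwaway virtualenv with the standard library’s venv module in place of /opt/venv. The interpreter really is the virtualenv’s, but the PATH it would hand to a subprocess doesn’t include the virtualenv’s bin/ directory.

```python
import json
import os
import subprocess
import tempfile
import venv
from pathlib import Path

# Code for the child to run: report where the interpreter lives and which
# PATH it would pass on to any subprocess it launches.
CHILD_CODE = (
    "import json, os, sys; "
    "print(json.dumps({'prefix': sys.prefix, 'path': os.environ.get('PATH', '')}))"
)

with tempfile.TemporaryDirectory() as tmp:
    venv_dir = Path(tmp) / "venv"
    venv.create(venv_dir, with_pip=False)  # pip isn't needed for this check
    venv_python = venv_dir / "bin" / "python"  # POSIX layout assumed

    # Like CMD ["/opt/venv/bin/python", ...]: full path, no activation.
    result = subprocess.run(
        [str(venv_python), "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    info = json.loads(result.stdout)

    runs_in_venv = os.path.realpath(info["prefix"]) == os.path.realpath(venv_dir)
    bin_on_path = str(venv_dir / "bin") in info["path"].split(os.pathsep)

print(runs_in_venv, bin_on_path)  # True False
```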
The repetitive method that totally works
You can fix that by actually activating the virtualenv separately for each RUN as well as the CMD:
FROM python:3.12-slim
RUN python3 -m venv /opt/venv
# Install dependencies:
COPY requirements.txt .
RUN . /opt/venv/bin/activate && pip install -r requirements.txt
# Run the application:
COPY myapp.py .
CMD . /opt/venv/bin/activate && exec python myapp.py
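(The exec is there to get correct signal handling: it replaces the shell with the Python process, so signals sent to the container go to Python directly instead of to the shell.)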
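You can check what exec does outside Docker, assuming a POSIX sh is available. In the sketch below the shell prints its own PID ($$) and then execs Python, which prints its PID. Because exec replaces the shell process rather than starting a child, the two numbers match; in a container, that means Python takes over the shell’s place as the main process.

```python
import shlex
import subprocess
import sys

# The shell prints its own PID ($$), then exec replaces the shell with Python,
# which prints its PID. With no fork in between, the two should be identical.
script = (
    "echo $$; "
    f"exec {shlex.quote(sys.executable)} -c 'import os; print(os.getpid())'"
)
out = subprocess.run(["sh", "-c", script], capture_output=True, text=True, check=True)
shell_pid, python_pid = out.stdout.split()
print(shell_pid == python_pid)  # True
```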
The elegant method, in which we learn what activating actually does
It’s easy to think of activate as some mysterious magic, a pentacle drawn in blood to keep Python safely trapped.
But it’s just software, and fairly simple software at that.
The virtualenv documentation will even tell you that activate is “purely a convenience.”
If you go and read the code for activate, it does a number of things:
- It figures out what shell you’re running.
- It adds a deactivate function to your shell, and messes around with pydoc.
- It changes the shell prompt to include the virtualenv name.
- It unsets the PYTHONHOME environment variable, if someone happened to set it.
- It sets two environment variables: VIRTUAL_ENV and PATH.
The first four are basically irrelevant to Docker usage, so that just leaves the last item.
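Leaving out the prompt and deactivate, the environment changes activate makes fit in a few lines. The sketch below is a hypothetical helper, activate_env, not part of virtualenv or the venv module; it assumes a POSIX bin/ layout and mirrors the last two items in the list: unset PYTHONHOME, set VIRTUAL_ENV, and put the virtualenv’s bin/ at the front of PATH.

```python
import os
from typing import Dict, Mapping


def activate_env(env: Mapping[str, str], venv_dir: str) -> Dict[str, str]:
    """Return a copy of env with activate's variable changes (hypothetical helper)."""
    new_env = dict(env)
    # activate unsets PYTHONHOME if it happens to be set.
    new_env.pop("PYTHONHOME", None)
    # It records which virtualenv is active...
    new_env["VIRTUAL_ENV"] = venv_dir
    # ...and puts the virtualenv's bin/ directory at the front of PATH.
    bin_dir = os.path.join(venv_dir, "bin")
    old_path = new_env.get("PATH")
    new_env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return new_env


before = {"PATH": "/usr/local/bin:/usr/bin:/bin", "PYTHONHOME": "/somewhere"}
after = activate_env(before, "/opt/venv")
print(after["PATH"])          # /opt/venv/bin:/usr/local/bin:/usr/bin:/bin
print(after["VIRTUAL_ENV"])   # /opt/venv
print("PYTHONHOME" in after)  # False
```

Since the helper returns a new dictionary, the original environment is left untouched, which makes it easy to compare the before and after states.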
Most of the time VIRTUAL_ENV has no effect, but some tools, e.g. the poetry packaging tool, use it to detect whether you’re running inside a virtualenv.
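Checking VIRTUAL_ENV only tells a tool what the environment says, which isn’t the same as what the interpreter knows. The sketch below puts the two side by side; venv_signals is a hypothetical name, and this isn’t a claim about how poetry implements its check. Comparing sys.prefix with sys.base_prefix is the standard library’s way of telling whether the interpreter runs from a virtual environment.

```python
import os
import sys
from typing import Optional, Tuple


def venv_signals() -> Tuple[Optional[str], bool]:
    """Return (VIRTUAL_ENV value, whether this interpreter runs from a virtualenv)."""
    # What activate (or an ENV line) sets; some tools look only at this.
    env_var = os.environ.get("VIRTUAL_ENV")
    # What the interpreter knows: inside a venv, sys.prefix points at the venv
    # while sys.base_prefix still points at the base Python installation.
    interpreter_in_venv = sys.prefix != sys.base_prefix
    return env_var, interpreter_in_venv


print(venv_signals())
```

Running this with /opt/venv/bin/python directly, as in the repetitive method, should report (None, True): the interpreter is the virtualenv’s, but a tool keyed on VIRTUAL_ENV won’t notice.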
The most important part is setting PATH: PATH is a list of directories which are searched, in order, for commands to run.
activate simply adds the virtualenv’s bin/ directory to the start of the list, so the virtualenv’s python and pip are found before any others.
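Because the directories are searched in order, whichever python comes first on PATH wins. The sketch below makes that concrete with two throwaway directories standing in for /opt/venv/bin and a system bin/, each holding a stub executable named python (both are assumptions for illustration, POSIX assumed), and asks shutil.which which one a given PATH would pick.

```python
import os
import shutil
import stat
import tempfile


def make_stub(directory: str, name: str) -> str:
    """Create a minimal executable file called name in directory."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as system_bin, tempfile.TemporaryDirectory() as venv_bin:
    system_python = make_stub(system_bin, "python")
    venv_python = make_stub(venv_bin, "python")

    # Without activation: only the system directory is searched.
    before = shutil.which("python", path=system_bin)
    # After activation: the venv's bin/ comes first, so its python wins.
    after = shutil.which("python", path=os.pathsep.join([venv_bin, system_bin]))

    print(before == system_python, after == venv_python)  # True True
```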
We can replace activate by setting the appropriate environment variables: Docker’s ENV command applies both to subsequent RUNs and to the CMD.
The result is the following Dockerfile:
FROM python:3.12-slim
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
# Run the application:
COPY myapp.py .
CMD ["python", "myapp.py"]
The virtualenv now automatically works for both RUN and CMD, without any repetition or need to remember anything.
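The contrast with the broken version is that ENV becomes part of the environment every later process starts with. The sketch below models that, as an analogy rather than Docker itself, by building such an environment in Python and starting two independent child processes with it, one standing in for a later RUN and one for the CMD. /opt/venv is only used as a string here and doesn’t need to exist; POSIX path separators are assumed.

```python
import os
import subprocess
import sys

VENV_DIR = "/opt/venv"  # the article's path; it doesn't have to exist for this demo

# Mimic the two ENV lines: the environment every later process starts with.
image_env = dict(os.environ)
image_env["VIRTUAL_ENV"] = VENV_DIR
image_env["PATH"] = os.pathsep.join(
    [os.path.join(VENV_DIR, "bin"), image_env.get("PATH", "")]
)

# Each child reports VIRTUAL_ENV and the first directory it would search.
CHILD_CODE = (
    "import os; "
    "print(os.environ['VIRTUAL_ENV'], os.environ['PATH'].split(os.pathsep)[0])"
)

# Two independent processes, standing in for a later RUN and for the CMD.
reports = []
for _step in ("RUN", "CMD"):
    out = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=image_env,
        capture_output=True,
        text=True,
        check=True,
    )
    reports.append(out.stdout.split())

print(reports)  # [['/opt/venv', '/opt/venv/bin'], ['/opt/venv', '/opt/venv/bin']]
```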
Software isn’t magic
And there you have it: a version that is as simple as our original, broken version, but actually does the right thing. No repetition, and less scope for error.
When something seems needlessly complex, dig in and figure out how it works. The software you’re using might be simpler (or more simplistic) than you think, and with a little work you might come up with a more elegant solution.