Documentation: Docker template for Python applications

You can buy the actual template here

🎉 Here’s why you should use this template 🎉

Most Docker images are broken in a variety of ways. Most Docker image builds are slow. This template aims to fix that.

In particular:

  1. For efficiency and security, it creates a small runtime image with only the necessary files. This is done using multi-stage builds.
  2. Builds quickly using layer caching, which requires extra work when using multi-stage builds (see this article).
  3. The Dockerfile is designed to require as little editing as possible, so you can easily reuse the template across multiple projects.
  4. If you’re deploying a WSGI application like Flask or Django, it provides a pre-configured gunicorn setup that fixes common production problems (slow queries leading to failed health checks, and filesystem slowness breaking the internal heartbeat).
  5. Enables Python’s traceback-on-segfault faulthandler.
  6. Uses a reasonable base image, rather than the often recommended but actually often-broken Alpine image (see here).
  7. Pre-written configurations for build systems like GitHub Actions, GitLab CI, and CircleCI.
  8. Doesn’t run as root, for more secure images (see here).
  9. Correct signal handling so containers shut down quickly (see here).
  10. Support for requirements.txt, setup.py, pipenv, poetry, and flit as package installation mechanisms.
  11. Support for SSH host keys, in case you need to securely fetch code from the Internet using git; GitHub’s host key is included by default.

And much more!

Important: The template is proprietary software, licensed under the terms of the software license. The short version: you can’t redistribute it outside your organization, e.g. to customers or in open source projects.

😢 What to do if a feature is missing or you encounter a bug 😢

If you have any questions or problems please email me, but first please read this document carefully to make sure your question isn’t already covered.

In addition, while I aim to support most common features, not everything will be supported out of the box in the template. Some options:

  1. Email me and ask about adding it; depending on the scope and my availability, I might add it myself, or you can hire me to do so.
  2. Remember that this is a template, not a tool. That means you can, and sometimes should, modify the code however you want to fit your particular needs.

🐋 Try it out: Build the example application 🐋

Before you start Dockerizing your application, you can try out Dockerizing the example application included in the template. In the template directory, run:

$ ./docker-shared/builder.py build exampleapp
$ docker run --rm -p 8000:8000 exampleapp

(You might need to sudo docker run on Linux.)

Note: If you get the following problem:

File "docker-shared/builder.py", line 2
SyntaxError: Non-ASCII character '\xe2' in file docker-shared/builder.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

It’s because you used Python 2; make sure you use Python 3.6 or later.

This should build the image and start it up on port 8000. You can send it some example queries, including queries that trigger a segfault or an exception. In another terminal, run:

$ curl http://localhost:8000/
$ curl http://localhost:8000/crash
$ curl http://localhost:8000/exception

Take a look at the output of docker run to see the nice traceback you get on the segfault.

Now it’s time to package your application!

The build process implemented in this template

You might think that a Dockerfile is sufficient to build a good Docker image, but that is not the case.

So this template includes not just a Dockerfile but also the necessary build infrastructure. At a high level, this template builds Docker images as follows:

  1. Build the image.
  2. Run a minimal smoke test to see if the image works. It’s called a smoke test because you’re turning it on to see if smoke comes out—if your image bursts into flames when you run it, you probably don’t want to push it!
  3. Push the images to your image registry.

There are actually two images built: one image used to compile and build your application, and a runtime image which is what you will run in production. This helps keep your runtime images small and more secure, since they don’t include extra—and unnecessary—things like compilers. See this article for details.

Usage, part I: Build an image for your application

This may seem like a lot of steps, but most of them should be pretty quick. Just go through the process and your image will be working soon.

The amount of configuration depends on how complex your setup is. For example if you’re happy with gunicorn, have no private packages, and you’re packaging a simple WSGI application, you might only need to modify a single file, docker-config/settings.py.

For your second and subsequent applications you can skip this tutorial if you already know what to edit.

1. Copy over the files you need into your repository

This repository is a self-contained working example, so you won’t need all the files in it for your own application.

Besides the standard Dockerfile and .dockerignore, there are two main directories:

  1. docker-config/: files you customize for your particular application.
  2. docker-shared/: shared build machinery you hopefully won’t need to modify.

Let’s copy them in!

First, create a new branch in your version control.

Second, copy over into your repository:

  1. Dockerfile
  2. .dockerignore
  3. docker-shared/
  4. docker-config/

Third, commit the initial files to the branch.

By having an unmodified version committed to your repository, you’ll have an easier time seeing what specific customizations you have made, and reverting back to the baseline if necessary.

2. Basic settings: base image and packages to install

Files you might need to modify

Explanation

Initially you will want to configure a few things in docker-config/settings.py.

Base image and system packages:

Environment variables (including secrets) used as part of your build:

How your Python code will be installed and run:
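As a rough illustration of what this involves, here’s a sketch; except for build_env_variables (covered in the build secrets section below), the setting names shown are hypothetical placeholders, so check the comments in your copy of docker-config/settings.py for the real ones.

import os

# Illustrative sketch only: base_image and system_packages are hypothetical
# placeholder names, not necessarily the template's real setting names.
base_image = "python:3.8-slim-buster"
system_packages = ["libpq5"]

# Environment variables (including secrets) made available during the build;
# see the build secrets section below.
build_env_variables = {
    "MY_BUILD_SETTING": os.environ.get("MY_BUILD_SETTING", ""),
}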

Try it out

To ensure you haven’t broken anything, run:

$ ./docker-shared/builder.py validate_settings

This will run a quick check on your settings to make sure they’re not completely broken.

Important: The first time you run builder.py it will tweak your docker-config/settings.py, adding a secrets_salt entry. Make sure to commit those changes to version control, or image rebuilds will be slow.

3. Customize the system package installation

Files you might need to modify

Explanation

By default the system-package installation will work with Debian, Ubuntu, RHEL 8 and CentOS 8.

You will only need to modify docker-config/install-system-packages.sh if:

Try it out

If you’ve made any changes to install-system-packages.sh, you can try it out by running:

$ ./docker-shared/builder.py build yourapp

It might fail later in the process, but check whether it makes it past the step where install-system-packages.sh runs. If it does, you can move on to the next step.

If you’re getting errors, you can start a shell at a previous step of the build and try running commands manually to see what’s going on. As the build runs you will see hashes of the previous layers being reported:

Step 8/44 : COPY docker-config/ssh_known_hosts /root/.ssh/known_hosts
 ---> Using cache
 ---> 9909249f4427
Step 9/44 : RUN chmod 600 /root/.ssh/known_hosts
 ---> Using cache
 ---> b69e160e68ae
Step 10/44 : ARG PACKAGES_FOR_BUILD
 ---> Using cache
 ---> 17cb4944efd1
Step 11/44 : COPY docker-config/install-system-packages.sh /tmp
 ---> Using cache
 ---> baee951dc7a5
Step 12/44 : RUN WITH_BUILD_ENV /tmp/install-system-packages.sh ${PACKAGES_FOR_BUILD}
... something fails at this point ...

The step we’re debugging at this point is 12/44, so we want to look at the hash of the layer right before it, 11/44, in this case baee951dc7a5 (your hash will probably differ):

$ docker run --rm -it --entrypoint=/bin/bash baee951dc7a5 
root@a39810f88086:/# ls /tmp/install-system-packages.sh 
/tmp/install-system-packages.sh
root@a39810f88086:/#

You can run /tmp/install-system-packages.sh yourself, or any commands it runs, and so on, as you debug the problem.

4. Customize the Python code installation

Before you begin

Installing Python code happens in two steps, to enable caching and therefore faster builds: first we install dependencies, then we either build or install your application’s code. Specifically, if you’ve set run_in_place to True, your code will not be installed, but rather run from the directory where it starts out.
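For example, if you want your code to run in place rather than be installed, the relevant line in docker-config/settings.py is just:

run_in_place = True  # run the code from its checkout directory instead of installing it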

The template will try to auto-detect how it should install your Python code. To see its guesses, you can run:

$ ./docker-shared/install-py-dependencies.py diagnose

And then:

$ ./docker-shared/install-py-application.py diagnose

If the output matches your expectations, you probably won’t need to edit any files except perhaps docker-config/settings.py. If the output isn’t doing what you want, you may need to reorganize your files (e.g. make sure you have requirements.txt at the top level of your repository) or otherwise configure the build.

Files you might need to modify

If the default installation/build logic can’t figure out what to do, it falls back to these scripts:

  1. docker-config/custom-install-py-dependencies.sh
  2. docker-config/custom-install-py-application.sh

Extended explanation

If you need to configure pip (e.g. you’re using devpi, or some other custom package source), see the pip configuration docs below.

If you need build secrets, see the secrets docs below.

Installing dependencies: The Dockerfile installs Python dependencies for your application using docker-shared/install-py-dependencies.py.

The script currently supports the following variants:

If you use setup.py to list your dependencies, dependencies will not be installed at this point. This is fine (they’ll be installed later on), but it will make your builds slower, since all dependencies will be re-installed every time your code changes. You can fix this by moving these dependencies into requirements.txt.

See the section later in this document on managing Python dependencies for recommendations on how to manage them.

If you’re using git+SSH to download code, see the documentation below for using SSH.

Installing or building the application code: If your configuration specifies that the code will be installed (i.e. run_in_place is False in docker-config/settings.py), installation using setup.py is currently supported out of the box.

If your configuration specifies the code will run from the checkout directory, it will be built using setup.py build_ext --inplace if there is a setup.py file.

If you need some other method of installing or building your code (e.g. make), edit docker-config/custom-install-py-application.sh.
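If your project doesn’t have a setup.py yet and you want to use the default setup.py-based install, a minimal one along these lines is usually enough (this is a generic setuptools sketch, not a file shipped with the template; adjust the name and packages for your project):

from setuptools import setup, find_packages

setup(
    name="yourapp",
    version="0.0.1",
    packages=find_packages(),
)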

Try it out

To see if it’s working:

$ ./docker-shared/builder.py build yourapp

If this succeeded, you’ve built an image!

If you’re having problems, consider using the instructions above for running a particular layer for debugging purposes.

Next, it’s time to make the image actually run something useful.

5. Customize docker-config/entrypoint.sh

Files you might need to modify

Explanation

When your Docker image is run (via docker run yourimage or whatever your deployment environment is), it will run docker-config/entrypoint.sh. So you might need to edit this file to ensure it runs the correct command.

The default configuration uses gunicorn with reasonable defaults to run a WSGI application configured in docker-config/settings.py, so if you’re using Flask or Django you might not need to change this file at all.
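For illustration only (the actual setting name is defined in your copy of docker-config/settings.py, so treat the name below as a hypothetical placeholder), the WSGI application is specified as a standard “module:callable” reference, something like:

wsgi_application = "yourpackage.wsgi:application"  # hypothetical setting name; e.g. "yourapp:app" for Flask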

If you’re using the default entry point:

You will want to change the entry point script if:

Change it to run whatever commands you want to run when someone does docker run yourimage.

Try it out:

You should now be able to run your image locally on your own computer—so it’s time to try it out and see if it works. You can do so by running:

$ ./docker-shared/builder.py build yourapp
$ docker run --rm -p 8000:8000 yourapp:latest

Modify 8000 to match the port your server listens on. And remember that on Linux you may need to do sudo docker run.

If it starts, send a query to http://localhost:8000 or the appropriate port and see if it works.

If it blows up, you can run a shell using the image (instead of the entrypoint). You can then debug the problem in place, e.g. try running the entrypoint directly, install new packages with pip (so long as they don’t require a compiler), and in general play around until you’ve figured out the issue.

The way this works is that if you pass command-line arguments to docker run, they get run instead of your entrypoint, so here we start a shell using the image:

$ sudo docker run -it yourapp /bin/bash
[INFO  tini (1)] Spawned child process '/bin/bash' with pid '6'
appuser@25444adb5dc0:~$ which python
/venv/bin/python
appuser@25444adb5dc0:~$

Notice that the virtualenv is enabled by default; you don’t need to change anything.

Usage, part II: CI/CD integration

Now that you have a working image, you probably want to set it up to build automatically whenever someone pushes a change to your version control repository.

1. Push the image

At this point you have the image building locally, and so the next step is to make sure you have the credentials you need to push to an image registry, a server that stores images for you.

  1. Your organization may already have a registry set up, in which case you can use that. Otherwise, if you’re using GitLab you can use GitLab’s built-in registry, or you can sign up for a free trial with Docker Hub, Quay, or GitHub, or set one up in your cloud provider (AWS, GCP, and Azure have them).
  2. Whichever registry you use, it should have instructions for how to login with docker login. Once you have those credentials, run docker login appropriately on your local machine.
  3. You should also figure out what the real name of your image is going to be, rather than the “yourapp” we’ve been using so far.

Let’s assume you’re using Quay, in which case the name will be something like quay.io/yourorg/yourapp. If this name is already in use, test with a different image name, so you don’t break your production images!

Try it out:

Set the BUILD_IMAGE_NAME environment variable to whatever your image name is, and then run build-and-push.py to run the full end-to-end build/test/push process:

$ export BUILD_IMAGE_NAME=quay.io/yourorg/yourapp  # <-- change appropriately
$ ./docker-shared/build-and-push.py

You should now see two images listed in the UI for your registry of choice; in this case quay.io/yourorg/yourapp:latest and quay.io/yourorg/yourapp:compile-stage. You can change latest and compile-stage to some other tag by editing docker-config/settings.py appropriately.
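For example, here’s a sketch of what that might look like in docker-config/settings.py, using the run_tags and build_tags settings shown in the advanced tagging section below, assuming you want version-specific tags:

run_tags = ["v1.2.0"]                   # replaces the default "latest"
build_tags = ["compile-stage-v1.2.0"]   # replaces the default "compile-stage"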

You should also be able to pull the image:

$ docker pull quay.io/yourorg/yourapp

(On Linux you’ll need sudo docker.)

If this worked, the next step is making the above run in your CI/build system.

2. Configure your build system

There are probably two kinds of builds you want to happen automatically:

  1. On every merge to master (or whatever branches/tags you want to build Docker images for), a new image is built and pushed using cached layers for speed.
  2. Weekly or daily, the image is rebuilt from scratch, to ensure the latest security updates.

See this article for a detailed explanation of why you want the latter.

If you’re using GitHub Actions

Copy .github/workflows/docker.yml into your own repository’s .github/workflows/ directory.

By default this uses the GitHub Packages Docker Registry. You can change that to any other container registry by editing the username, password, and registry fields of the docker/login-action step.

You will also need to customize the environment variable called BUILD_IMAGE_NAME to be the complete image name you’re building, including the registry, e.g. docker.pkg.github.com/yourorg/yourpackage/image_name.

Note: GitHub also has a new replacement container registry, the GitHub Container Registry, but it’s in beta, so I stuck with the old one for now. You can learn how to use it here: https://docs.github.com/en/free-pro-team@latest/packages/getting-started-with-github-container-registry/migrating-to-github-container-registry-for-docker-images

If you’re using CircleCI

Copy the configuration from .circleci/config.yml into your own configuration.

You will need to set the following environment variables for your build:

  1. DOCKER_REGISTRY: The registry where you will store the images. For example, if you do docker login quay.io on the command line, you would set DOCKER_REGISTRY to quay.io. Set to docker.io to use the default registry (https://hub.docker.com).
  2. REGISTRY_USERNAME: The username for the registry.
  3. REGISTRY_PASSWORD: The password for the registry.
  4. BUILD_IMAGE_NAME: The image you’ll be pushing, e.g. quay.io/yourorg/yourapp.

Note that the image will be rebuilt from scratch (without caching) every Monday morning; you may wish to change the frequency or time of this rebuild. Don’t delete the scheduled rebuild config, though: it’s necessary for security reasons.

If you’re using GitLab CI

Copy the configuration from .gitlab-ci.yml into your own configuration.

By default GitLab’s built-in Docker image registry is used, but if you want to push to another registry you can change the configuration appropriately.

IMPORTANT: The included configuration will reuse cached layers indefinitely, which means you will not get security updates (see this article). To fix that, you will need to manually set up a weekly build:

  1. Manually add a scheduled pipeline that runs this pipeline once a week (or once a day, or whatever interval you want).
  2. Set an environment variable (in the “Variables” section of the new scheduled pipeline) to ensure the build is done without caching: EXTRA_BUILD_ARGS should be set to --no-cache.

For instructions on setting up scheduled pipelines see the documentation.

If you’re using something else

If you have some other build or CI system, you will need to run docker-shared/build-and-push.py with an environment variable BUILD_IMAGE_NAME set to the name of your image.

For example, if your image is quay.io/yourorg/yourapp, you want a build script that looks like this:

#!/bin/bash
set -euo pipefail  # bash strict mode
# Quote the credentials so special characters don't break the login:
docker login -u "$REGISTRY_USER" -p "$REGISTRY_PASSWORD" quay.io
export BUILD_IMAGE_NAME=quay.io/yourorg/yourapp
docker-shared/build-and-push.py

To rebuild images from scratch without caching (which you should do weekly or daily to get security updates) you can run docker-shared/build-and-push.py --no-cache.

Also, email me at itamar@pythonspeed.com to let me know which build system you use, so I am more likely to add it in a future release.

Try it out

The default configurations for CircleCI and GitLab CI only build the Docker image on the master branch. When you’re testing, you probably want to test in a branch.

So you should:

  1. Edit your CI config so Docker images are built off your current branch: search for master and change that to whatever your branch name is.
  2. Push your changes, triggering the CI run.
  3. See what failed, fix it.
  4. Repeat until the Docker image builds correctly.
  5. Edit your CI config back to only build Docker images off of master (or whatever makes sense for your version control flow).

3. Consider per-Git-branch tags and labels

Files you might need to modify

Explanation

For all the provided configurations, builds only happen for the master branch. If you want builds to happen for all branches, or for tags, you’ll need to both:

  1. Edit the CI config to enable that.
  2. Edit docker-config/settings.py to enable per-branch tags, otherwise all the different branches’ Docker images will overwrite each other.

In particular, a common way to tag your Docker images is to have the tag based on the Git branch off of which the image is being built. This ensures that images for Git feature branches don’t stomp on images for your production image. See the advanced tagging documentation below for more details.

If this is something you want to do, you’ll need to configure additional settings in docker-config/settings.py, in particular run_tags, run_tags_for_cache_warming, build_tags, and build_tags_for_cache_warming; see the per-branch tagging example in the advanced configuration section below.

Try it out

To ensure you haven’t broken anything, run:

$ ./docker-shared/builder.py validate_settings

Then you can rebuild the image, and use docker images and docker inspect to respectively inspect the resulting images and look at the labels:

$ ./docker-shared/builder.py build yourapp
$ docker images yourapp
$ docker inspect yourapp

Usage, part III: Robust builds, smaller images

Additional configuration will allow you to make sure you don’t push broken images, and that your images don’t include unnecessary files.

1. Add a smoke test to catch broken images

Files you might need to modify

Explanation

Before the new images get pushed, a minimal test is run to ensure they’re not broken. The example test just sends an HTTP query to the server; you should customize it appropriately.

If you have some other form of testing between when the image is pushed and when it gets deployed, or you just want to skip this step, you can leave smoketest.py alone.
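As a rough sketch of the general idea (assuming the script receives the image name and starts the container itself; the bundled smoketest.py may be organized differently), a minimal HTTP smoke test might look like:

import subprocess
import sys
import time
from urllib.request import urlopen

image = sys.argv[1]
# Start the image in the background, mapping the application's port:
container_id = subprocess.check_output(
    ["docker", "run", "--rm", "-d", "-p", "8000:8000", image],
    universal_newlines=True,
).strip()
try:
    time.sleep(5)  # crude wait for the server to start listening
    with urlopen("http://localhost:8000/") as response:
        assert response.status == 200, "unexpected HTTP status"
finally:
    subprocess.run(["docker", "stop", container_id])

Any exception (connection refused, bad status) makes the script exit with a non-zero code, which is what the build process checks for.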

Try it out

Run your smoketest by doing:

$ ./docker-config/smoketest.py yourapp
$ echo $?

The printed out exit code should be 0 if the test passes, non-zero if it fails.

2. Customize .dockerignore so you don’t package unnecessary files

Files you might need to modify

Explanation

.dockerignore lists files that shouldn’t be copied into the Docker image. If you have any large data files in the repository, or secrets, or any other files that shouldn’t be copied into the Docker image, add them here.

The file format is documented here.

Try it out

dive is a tool for figuring out what’s in your image, and where it’s coming from.

docker-show-context is another useful utility that lists which large files made it into the Docker context. You can use it to figure out if there are any large files being copied in.

Next steps

You’re done!

Some next step ideas:

Advanced configuration

Installing dev dependencies

If you’re doing local development, you might want to build the image locally and have it install development dependencies. You can do so by using the --dev option; for example, this will build a Docker image called yourapp with development dependencies installed:

$ ./docker-shared/builder.py build --dev yourapp

The template supports this in three ways:

  1. If you’re using Poetry, it can install Poetry dev dependencies.
  2. If you’re using Pipenv, it can install Pipenv dev dependencies.
  3. If you are usually using requirements.txt to install dependencies, you can provide a dev-requirements.txt.

If you’re using setup.py to install dependencies, I strongly recommend using pip-tools to generate a requirements.txt, and adding a dev-requirements.txt the same way. This will both enable faster builds and make your builds reproducible.

Configuring build secrets

You often need some secret—a password, a key—in order to build your image. And for security reasons you don’t want that secret to be persisted in the image.

Docker has no good secrets-passing mechanism in its default build backend: if you use build args, for example, the secret is still embedded in the image. The new Docker BuildKit backend does include such a mechanism, and is now a stable Docker feature, so if you require secrets, BuildKit will be enabled automatically if you haven’t already enabled it.

This template extends BuildKit’s secrets mechanism to allow you to set environment variables from secrets. From your perspective, you configure an environment variable in docker-config/settings.py’s build_env_variables dictionary. Since that code runs on the host machine, rather than as part of the build process, it has access to environment variables there.

This means you can pass environment variables configured in your CI/build system into the Docker build process, e.g. you would have the following in docker-config/settings.py:

import os

build_env_variables = {
    "MY_SECRET": os.environ["MY_SECRET"]
}

At that point any Dockerfile RUN command that is run with the WITH_BUILD_ENV command prefix will have an environment variable MY_SECRET. All built-in RUN calls already have WITH_BUILD_ENV added, so typically you won’t have to edit the Dockerfile.

Note that these environment variables will not be available to your entrypoint script: if you need secrets in your container when it runs, pass them in using the normal Docker mechanisms (runtime environment variables).

Also note that if you change these environment variables, it will invalidate most of the cached layers in your build, so you probably don’t want to use these for values that change with every build.

Avoiding secret leaks, #1: If you need to write the secret to disk in order to use it, you are risking it being included in the image. So either:

  1. Write the file to the /dev/shm directory, which is in memory and therefore won’t be included in the final image or its layers.
  2. Delete it before the RUN step ends.

Avoiding secret leaks, #2: Don’t include the environment variable in the RUN command-line, or it will be included in the image and therefore potential attackers might be able to get it.

# This is BAD and NOT SECURE:
RUN WITH_BUILD_ENV yourscript.py $MY_SECRET

# This is GOOD:
RUN WITH_BUILD_ENV yourscript.py   # yourscript.py looks up os.environ["MY_SECRET"]
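On the script side, here’s a sketch of how a hypothetical yourscript.py can follow both rules: read the secret from the environment rather than from its command line, and, if the secret must be written to disk, keep it in /dev/shm and remove it before the RUN step ends.

import os

secret = os.environ["MY_SECRET"]  # read from the build environment, not sys.argv

secret_path = "/dev/shm/my_secret"  # in-memory, so it never ends up in a layer
with open(secret_path, "w") as f:
    f.write(secret)
try:
    ...  # do whatever needs the secret file here
finally:
    os.remove(secret_path)  # clean up before the RUN step finishes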

Configuring pip

If you’re using pip to install your packages, you can configure it via environment variables. Specifically, command-line options get mapped to corresponding environment variables: --default-timeout can be configured using an env variable PIP_DEFAULT_TIMEOUT.

For example, if you are running your own PyPI server or local directory with packages, you will typically do:

pip install --index-url http://example.com/your/server -r requirements.txt

So that is the same as having an environment variable PIP_INDEX_URL with value http://example.com/your/server.

To add the PIP_INDEX_URL environment variable, or other variables to configure pip, edit the build_env_variables dictionary in docker-config/settings.py.
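For example, something along these lines in docker-config/settings.py will point every pip invocation during the build at your own index (the URL is just an illustration):

build_env_variables = {
    # Tell pip to use your own package index during the build:
    "PIP_INDEX_URL": "http://example.com/your/server",
}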

Per-git-branch image tagging and other tagging variations

The default tagging model for Docker images produced by this template is that the runtime image (the one you’re likely to care about) has a single tag, latest by default. Since docker-config/settings.py is a Python script, you can come up with fairly complex configuration depending on your needs.

For example, let’s say you want an additional tag, the git hash. You can change docker-config/settings.py like this:

from subprocess import check_output

_git_hash = check_output(
    ["git", "rev-parse", "--short", "HEAD"],
    universal_newlines=True
).strip()
run_tags = ["latest", "git-" + _git_hash]

You can also use this mechanism to set per-branch tags. By default the template CI configs assume you only build images off of Git’s master-branch, but what if you want to also build images for feature branches?

You can change docker-config/settings.py like this; notice that you’ll need run_tags_for_cache_warming and build_tags_for_cache_warming settings that point to your default tags:

from subprocess import check_output

_git_branch = check_output(
    ["git", "rev-parse", "--abbrev-ref", "HEAD"],
    universal_newlines=True
).strip()
if _git_branch == "master":
    run_tags = ["latest"]
else:
    run_tags = ["branch-" + _git_branch]

# Tags that will also get pulled to pre-warm your build cache. That way, if
# you're building a new branch for the first time, you can at least get some
# caching benefit from the `latest` image.
run_tags_for_cache_warming = ["latest"]

# Similarly, have different build tags:
build_tags = ["build-stage-" + _git_branch]
build_tags_for_cache_warming = ["build-stage-master"]

You will also need to change your CI config so Docker images are built not just off of master but also other Git branches.

Including build versions in your code

Imagine your server has an API endpoint or log message that should include some version information about your code, information that is only available at build time: a git hash, or branch, or tag, say. You can get this information by running git commands, or often from environment variables set by your build or CI system.

First, check if a tool like versioneer might do what you want.

If there are no tools that do quite what you want, one solution is to record this information in a file outside of the Docker build; from the code’s perspective it’s then just another file. For example, your CI build might look like this:

# Add file locally with git hash:
echo "git_hash = '$(git rev-parse --short HEAD)'" > \
    yourpackage/_versions.py
# Then continue with Docker build as normal:
./docker-shared/build-and-push.py

And you can make the code work when just checked out by having a yourpackage/_versions.py with some default values.
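For example, the checked-in yourpackage/_versions.py can be as simple as the sketch below, so imports still work when the CI step hasn’t run:

# Default value, overwritten by the CI step above at build time:
git_hash = "unknown"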

Additional build steps

If you want to do additional build steps, e.g. build static assets for your website or the like, you’ll need to edit the Dockerfile directly. You’ll probably want to do this after the system packages are installed, but depending on your use case this may happen before or after the Python install.

To minimize cache invalidation, try to follow the pattern you already see in the Dockerfile of first copying in just enough of the files to do the next build step. E.g. first copy in your package dependencies list, install those packages, and then in a later step do the actual build.

Additional arguments to docker build

If you want to add additional arguments to docker build (see the CLI docs), you can add them via the additional_docker_build_arguments list in settings.py. For example, if you want to run docker build with --add-host=somehost:10.0.0.1 --add-host=anotherhost:10.0.0.2 you can do:

additional_docker_build_arguments = [
    '--add-host=somehost:10.0.0.1',
    '--add-host=anotherhost:10.0.0.2'
]

BuildKit support

If you want to use BuildKit, you can enable it as usual by doing export DOCKER_BUILDKIT=1. If you’ve configured secrets/build environment variables, this will be done automatically.

Note that I’ve only tested this with Docker 19.03.08 and later; it may or may not work in older versions.

If you want to use BuildKit features like secrets or SSH agent forwarding that require passing additional arguments to docker build, you can do so via the additional build arguments feature.

SSH access

If you want to use SSH in your build, e.g. to retrieve code from a private git repository, you have a number of options:

Option 1. Don’t do it

Instead of checking out code inside the Docker build, check it out outside the build, and then you can just copy it in as normal with no need to run SSH inside the build.

Option 2. Using BuildKit

BuildKit is a new build backend which supports SSH agent forwarding.

You’ll need to:

  1. Enable BuildKit by setting an environment variable: export DOCKER_BUILDKIT=1.
  2. Outside the Docker build, make sure you have ssh-agent running and that you’ve ssh-added the keys.
  3. Add a build argument by editing settings.py, modifying the additional_docker_build_arguments list to be ["--ssh", "default"] (see the snippet after this list).
  4. Edit the Dockerfile appropriately—see the Docker documentation for BuildKit SSH agent forwarding.
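For step 3, the change to docker-config/settings.py is just:

additional_docker_build_arguments = ["--ssh", "default"]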

SSH host public keys

You’ll also face the issue of SSH host public keys. If you SSH to unknown hosts, SSH asks you if you want to connect to that host—and you don’t want to have to answer interactive questions during your build.

The host SSH key for GitHub is already included, so if you’re using GitHub you don’t need to do anything.

For other Git servers, you have two options:

  1. Disable host authentication, which in theory makes it easier for attackers to do a man-in-the-middle attack and slip you bad code. You can do this with the StrictHostKeyChecking=no option (see this example).
  2. The more secure option: store the SSH server’s public keys in advance.

You can do the latter by running the following, substituting the relevant host:

$ ssh-keyscan -t rsa ssh-host.example.com >> docker-config/ssh_known_hosts

Compare the key to the site’s documentation, or at the very least your known_hosts, to make sure it matches.

Additional recommendations

Make sure you rebuild and redeploy the images once a week

Because of the use of caching, system packages won’t get security updates by default. This is why the default CI configuration above makes sure to rebuild the image from scratch, without any caching, once a week. Note that on GitLab CI this requires some manual setup.

Make sure you have set this up, otherwise you will eventually end up with insecure images.

You will then want to deploy these updated images to your production environment.

Python dependencies need to be updated regularly for security reasons

If you are using pinned dependencies in your requirements.txt (or Pipfile.lock/poetry.lock), you will need an ongoing process to re-pin to new versions in order to get security updates and critical bugfixes.

safety is an open source package you can use to check against one database of Python vulnerabilities.

GitHub has had basic support for vulnerability alerts for a while, and has acquired Dependabot, which can automatically open PRs against your application with security fixes (and supports Pipenv, Poetry, and pip-tools). If you’re on GitHub, go to the Security tab of your project and enable Automated Security Fixes (really, it’s automated security fix suggestions).

There are also other services, like requires.io and PyUp.

Runtime health checks

You can and should define health checks for a Docker image: a way for whatever system is running the container to check whether the application is functioning correctly. The Docker image format itself supports defining health checks; however, some systems like Kubernetes ignore these and have their own way of specifying them.

So check the documentation for the systems where you run your images, and add health checks.

Limit Linux capabilities on your container for enhanced security

By default containers get a number of Linux capabilities, which give them (potentially) a subset of the full extra permissions the root user has. If you’re running as a non-root user (as implemented in this template) you don’t have those permissions, but it’s still more secure to grant as few capabilities as possible.

If you’re running with Docker you can drop capabilities at runtime, e.g. using docker run --cap-drop ALL to drop all capabilities. Another possible mechanism is the no-new-privileges flag, which you can set with docker run --security-opt="no-new-privileges:true"; unlike dropping capabilities it’s all-or-nothing.

Kubernetes has its own configuration mechanism, and other runtime environments may not give you access to these settings at all.

Dealing with Python dependencies

This template includes a script to pin dependencies for your application. Why is this useful?

Because every application really requires two different sets of dependency descriptions:

  1. The logical, direct dependencies. For example, “this needs at least Flask 1.0 to run”.
  2. The complete set of dependencies, including transitive dependencies, pinned to particular versions. Transitive means dependencies-of-dependencies, and pinning means particular versions. For example, this might be “Flask==1.0.3, itsdangerous==1.1.0, werkzeug==0.15.4, click==7.0, jinja2==2.10.1, markupsafe==1.1.1”.

The first set of dependencies can be used to easily update the second set of dependencies when you want to upgrade (e.g. to get security updates).

The second set of dependencies is what you should use to build the application, in order to get reproducible builds: that is, to ensure each build will have the exact same dependencies installed as the previous build.

Implementing pinned dependencies in requirements.txt

Some alternatives include pipenv and poetry.

Within the framework of existing packaging tools, however, pip-tools is the easiest way to take your logical requirements and turn them into pinned requirements. You write a requirements.in file (in requirements.txt format) listing your direct dependencies in a flexible way:

flask>=1.0

And then you use pip-tools to convert that to a pinned requirements.txt you can use in your project. setup.py/setup.cfg end up not including any dependencies at all (note that this setup is specific to applications; libraries are a different story).

Using docker-shared/pin-requirements.py

Included in this template is a script based on pip-tools that takes the high-level requirements from requirements.in and transitively pins them to output file requirements.txt:

  1. It uses Docker to ensure it is pinning dependencies for the appropriate version of Python, and for Linux specifically, even if you are using a different operating system and version of Python on your development machine.
  2. It hashes the pinned files. That means if someone replaces foolib 1.1.2 on PyPI with a new tarball with the same name (e.g. via a security breach), pip will complain that the hash has changed and won’t let you proceed.

Just create a requirements.in and then run:

$ docker-shared/pin-requirements.py

You can also specify a different image than your base image (e.g. your compile-stage image) with the --image command-line option.

Custom templates

The goal for this template is to require as little configuration as possible, but you still had to make some changes to get going. However, across your organization there is often a fair degree of uniformity: you probably install packages the same way, and use the same base image.

So once you understand how to use this template, consider making a custom template based on it that means even less work for your coworkers. The goal is for docker-config/settings.py, and perhaps the entrypoint and smoke test, to be the only files that need to be edited to get a new application configured.

Need help building a custom template? Get in touch and ask about my consulting services.

Docker Compose

Given the way the template works, you can’t use Docker Compose’s build support:

version: '3'
services:
  web:
    build: .   # THIS WON'T WORK

Instead, just use the image name/tag:

version: '3'
services:
  web:
    image: exampleapp

That means you will need to (re)build the image before you do docker-compose up.

Reference

Supported technologies

An overview of the included files

To get a sense of how everything works, here are the files in this template:

├── docker-config         ← Files to customize
│   ├── custom-install-py-application.sh    ⮠ Fallbacks for default
│   ├── custom-install-py-dependencies.sh   ⮢ install scripts
│   ├── entrypoint.sh         ← Gets run when container runs
│   ├── install-system-packages.sh   ← Install Debian/CentOS packages
│   ├── settings.py           ← General settings
│   ├── smoketest.py          ← Test the image isn't broken
│   └── ssh_known_hosts       ← .ssh/known_hosts for build
├── docker-shared        ← Files you hopefully don't need to customize
│   ├── build-and-push.py    ← Drives overall build process
│   ├── builder.py           ← Wrapper for docker CLI
│   ├── install-py-application.py   ← Install/build your code   
│   ├── install-py-dependencies.py  ← Install your code's dependencies
│   ├── pin-requirements.py  ← Utility to pin requirements w/pip-tools
│   └── WITH_BUILD_ENV       ← Sets up secrets during build
├── Dockerfile           ← Builds the image
├── .dockerignore        ← Keeps unneeded files out of image
├── .circleci            ← Sample CircleCI configuration
│   └── config.yml
├── .github/workflows/   ← Sample GitHub Actions config
│   └── docker.yml
└── .gitlab-ci.yml       ← Sample GitLab CI configuration

Additional files (requirements.in, requirements.txt, setup.py, exampleapp) make up an example application used to demonstrate and test builds out of the box; you will replace them with your own code.

The most important files are:

  1. docker-shared/build-and-push.py is what your CI or build system will call: it does the three stage build/test/push process described above.
  2. build-and-push.py calls docker-shared/builder.py to first build the image and then later push the image.
  3. builder.py is configured by docker-config/settings.py.
  4. The Dockerfile and .dockerignore define how the image will be built.
  5. docker-config/entrypoint.sh is what gets called when you run the resulting image with docker run.

Known issues and limitations

--no-cache mode rebuilds the compile-stage image twice

I suspect the solution is telling the run-stage build step to use --cache-from=<yourimage>:compile-stage, but given the security risk of breaking --no-cache mode I need to investigate further before fixing this.

Since --no-cache is expected to be slow (and so only run occasionally), this shouldn’t be a big deal in the interim.

Found a bug? Have a feature request?

If you have any questions or problems please email me.

Changelog

1.2.0

1.1.1

To upgrade from 1.1.0:

1.1.0

To upgrade from 1.0.9:

1.0.9

To upgrade from 1.0.8:

1.0.8

To upgrade from 1.0.7:

1.0.7

To upgrade from 1.0.6:

  1. If you’re using GitLab CI with your own builders, you’ll want to add DOCKER_TLS_CERTDIR: "" to the variables for the Docker build step. See the included .gitlab-ci.yml for an example.
  2. Copy docker-shared/builder.py into your repository.

1.0.6

1.0.5

This release focused on making it easier to get started with the template:

1.0.4

1.0.3

1.0.2

Credits

Thanks to: