Do you have time to waste fiddling around with Docker packaging?
There’s always more work to do, more features to ship, more bugs to fix. Docker packaging may be useful and often necessary, but the time you spend on it isn’t going to make your customers any happier.
If you’re shipping your Python application into production using Docker, your images are going to be critical infrastructure. So you want to follow best practices, to reduce the risk of production problems or security breaches.
But implementing all those best practices takes time, time your team could put to better use. And even once you’ve spent the time to create a decent initial Docker setup, that’s not the end of the time you’re going to waste.
“I’m thinking of disabling multi-stage builds, because the Docker build is so slow…”
I heard the above from someone I talked to recently, but I had the exact same experience at a previous job: I’d done the right thing, using current Docker best practices for small images, and the builds were slow.
It was a complex application, to be fair, but my teammates and I were spending 15-20 minutes waiting for Docker builds. And that meant either context switching, not an efficient use of time, or twiddling our thumbs waiting.
So eventually I got so tired of wasting time I dug in and made it faster. But because the builds were so slow, it took a day or two of debugging for me to figure it out.
And now I know how to fix that problem, and I’ve written it up on my site—but do you really want to spend time waiting for slow Docker builds, then waste time debugging them? Does your team have that kind of time to waste?
Save time—a lot of time
You want fast feedback loops so your team can learn faster, and fast setup so you can spin up new applications easily.
And you also want to ship your images with confidence: confidence that your team is productive, confidence that you’re following operational best practices, confidence that you’ve done your homework and you’ve built it right.
To help you get there I started by taking my personal experience with Docker (and corresponding mental scars), added many weeks of additional research and development, as well as feedback and review by other experts, and created a best-practices template for Docker packaging of your Python application.
Watch me Dockerize a Python application in just 3 minutes
How much time can you save Dockerizing your applications with this template?
The first time you go through the onboarding tutorial (there’s a video showing it later on this page), I would expect a simple WSGI application to take as little as 30 minutes, and I’ve seen a new user get a complex application going in just 2-3 hours.
Of course, if you have multiple images to package, the later images will go even faster. For an admittedly simple case, a WSGI app with no special system packages, I was able to package a Python application in just 3 minutes:
And at the end of the video you’ll also notice that later builds take full advantage of Docker’s caching, running much faster. Faster builds give your team a faster feedback loop.
Ready to get started?
What best practices does the template implement?
- Specifically designed for Python applications, with automatic support for most Python packaging setups (setup.py, pipenv, poetry, and flit).
- Installs dependencies separately from application code when possible, for faster builds.
- Pre-configured with fast multi-stage builds, to help you build smaller images.
- Secure by default, running your code as a non-root user, and shipping the minimal files needed.
- Implements and documents the processes you need to keep dependencies up-to-date.
- Pre-configured for CircleCI and GitLab CI, so you can get started even faster.
- Encourages and documents best practices like smoke tests and health checks.
- Heavily documented and annotated with references so you can learn more about why and how it’s implemented.
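To make those bullets concrete, here is a minimal hand-written sketch of what such a Dockerfile typically looks like; it is not the template itself, and requirements.txt and myapp.wsgi:application are hypothetical stand-ins for your project’s own files:

```dockerfile
# Illustrative sketch only, not the template's actual Dockerfile.
# First stage: install dependencies before the application code, so
# code-only changes reuse the cached dependency layer.
FROM python:3.7-slim AS build
WORKDIR /app
RUN python -m venv /venv
COPY requirements.txt .
RUN /venv/bin/pip install -r requirements.txt
COPY . .
RUN /venv/bin/pip install .

# Second stage: copy over only the virtualenv, leaving build tooling
# and the source checkout behind, and run as a non-root user.
FROM python:3.7-slim
COPY --from=build /venv /venv
RUN useradd --create-home appuser
USER appuser
CMD ["/venv/bin/gunicorn", "myapp.wsgi:application"]
```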
How do you use the template?
The template comes with files you check in to your repository: a Dockerfile, build scripts, and configuration files.
In order to make things as simple and easy as possible, and get you started quickly, the template comes with a highly detailed step-by-step tutorial:
Once you’ve gone through the tutorial, you can access the template’s full power (which also comes with detailed documentation). For the most part you’ll be editing one particular configuration file, implemented in Python to give you maximal flexibility:
Why a template?
The problem with packaging is that it’s a messy, nitpicky process that’s all about the details.
Tools try to build abstractions, and abstractions don’t deal well with this many details. Those details are sufficiently different between different organizations and projects that either:
- The tool can’t support a whole bunch of totally reasonable use cases.
- Or, you end up with a tool that is so general purpose it’s hard to use right.
Dockerfiles, for example, are exactly that kind of general-purpose tool.
A template, in contrast, can be built to do the right thing for 90% of situations—and then in the remaining 10% situations you can still make it work. Because all the code in a template is designed to be customized, you can still handle the remaining edge cases yourself.
A specific example: Python packaging tools
Since we’re talking about Python, you have packaging options! There’s setup.py, there’s pipenv, and there are the PEP 517 options of poetry and flit.
What’s more, there are actually two ways to deal with your application code:
- Install it with pip.
- Run it in place, perhaps building it first with python setup.py build_ext --inplace.
The template I’ve built handles the resulting combinatoric options (and they’re tested by an automated test suite). And if it can’t figure out what to do, there are fallback scripts designed to be configuration points.
However, if this was a tool and your packaging setup was different enough, you’d be stuck. Since this is a template, you can customize all of it for your particular needs if the defaults don’t work.
Ready to build production-ready containers?
Is learning Docker packaging best practices really that hard?
Unfortunately, while the information you need is out there on the Internet, actually discovering and implementing best practices is slow, tricky work.
You’ll need to filter out the ignorant, broken, and confused advice from people who don’t really know what they’re doing.
Do a quick search on Google or DuckDuckGo, and you’ll find plenty of Dockerfile examples that:
- Result in super-slow rebuilds, since the author didn’t understand the Docker image caching model.
- Recommend using Alpine as a base image. Alpine uses the alternative musl libc, which among other problems has a small default thread stack size that can cause Python to crash—see issue 32307 in the Python bug tracker.
- Have you run your container as root, even though that’s insecure.
And that’s just a subset of the bad advice you’ll encounter.
You’ll also need to filter out all the obsolete and out-of-date advice. Docker has gone through many changes, and 3-year-old best practices may no longer be relevant. For example, the multi-stage build support added in Docker 17.05 obsoletes much of the previous advice on creating small images.
You’ll need to find the various places—obscure blog posts, half-baked documentation, slides from conference talks—where actual best practices are covered. There is no one place that covers everything you need to know. You can and should read the Docker best practices documentation, but just to give a few examples of what it’s lacking:
- It won’t tell you how you can easily copy over all your installed Python files in a multi-stage build (using a virtualenv or pip install --user).
- It won’t tell you the different problems you’ll encounter running a smoke test on Circle CI or GitLab CI (it won’t even tell you to write a smoke test).
- And if you’re using gunicorn, you’ll have to look elsewhere to figure out how to avoid performance issues in Docker with gunicorn’s default configuration.
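For reference, the gunicorn issue most often cited is the worker heartbeat file: it lives under /tmp by default, and if /tmp is disk-backed inside the container, heartbeat writes can block workers. The commonly recommended fix (sketched here; myapp.wsgi:application is a hypothetical entry point) is pointing --worker-tmp-dir at the tmpfs-backed /dev/shm:

```dockerfile
# /dev/shm is tmpfs, so heartbeat writes never touch a slow disk.
CMD ["gunicorn", "--bind", "0.0.0.0:8000", \
     "--worker-tmp-dir", "/dev/shm", \
     "myapp.wsgi:application"]
```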
You’ll need to figure out how Dockerfiles, the Docker image format, the Docker build process, Python packaging, and your build/CI system interact. Even if you have a perfectly fine Dockerfile, if you don’t understand how your CI system affects Docker caching you’ll still end up with super-slow builds wasting your team’s time.
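One common way this interaction bites: a fresh CI runner starts with an empty Docker layer cache, so every build runs from scratch. A typical workaround, sketched here as a hypothetical GitLab CI job, is to pull the previously pushed image and pass it to --cache-from so its layers can be reused:

```yaml
# Hypothetical GitLab CI job: warm the Docker cache on a fresh runner.
build-image:
  script:
    - docker pull "$CI_REGISTRY_IMAGE:latest" || true
    - docker build --cache-from "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"
```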
If you go elsewhere on this site you’ll find many articles I wrote to help you get started—but there’s still all the work of integrating and debugging the result. So why not just take advantage of my work and get going immediately?
Ready to build production-ready containers?
Is this template for you?
Designed for production, not development environments
Some developers use Docker for development environments, or to run data science batch jobs locally, or other local use cases. This template is not designed for those cases; rather, it’s designed for running on production servers.
In order to use this template successfully, developers are expected to understand:
- Basic Docker usage, including docker run/build/ps/... commands.
- Python packaging, e.g. how your application specifies its dependencies.
- Linux command-line usage.
- The details of your CI system.
Won’t magically solve all your operational problems
This template is far better than what most developers can do on their own without a significant and costly time investment.
However, Docker packaging is just one part of your operational and deployment infrastructure, and it can’t fix problems in your code. So even if you use this template you might still have security breaches, crashes, or other problems.
Some industries I prefer not to do business with
If your team will be using the template for software for military use, prisons, oil & gas extraction, surveillance, national security, or the like:
- Please don’t buy this template.
- I encourage you to find a different job, and then buy the template for your new job if it’s relevant there.
Not for hobbyists
If you’re just working on a personal project, I wouldn’t suggest paying for this template. This template is for organizations running software in production, where downtime and slowness cost real money.
Want to build production-ready containers?
You can use the template for an unlimited number of images, forever. What you can’t do is redistribute the template itself.
- You won’t be able to distribute images based on the template outside your organization, if they include the template’s code.
- You won’t be able to distribute the template as part of the source code of your project, if you send source code to customers.
- You won’t be able to use it to package an open source project.
If you have questions, or would like a custom licensing agreement to address these limitations, get in touch at firstname.lastname@example.org.
You can read the specific license terms here.
Detailed feature list and supported tools
Easy to use
- Designed to work out of the box for many common situations with just changes to the config file.
- Powerful and customizable enough to support more complex use cases.
- Re-usable so you can apply best practices to all your Python applications.
- Includes diagnostic and debugging tools, and explanations on how to test and debug your image when you’re first applying the template.
Build images fast
- Step-by-step builds allow caching, so changes to your code don’t require re-installing all dependencies.
- Enables caching even though it uses multi-stage builds.
Good operational setup
- Correct signal handling to ensure fast shutdown of your application.
- Ensures your application shows Python tracebacks on segfaults, for easier debugging.
- Pre-configured gunicorn for WSGI applications.
- Documents surrounding best practices (package pinning, health checks).
- Multi-stage builds are used to make smaller images.
- Runs as non-root user.
- Supports passing in secrets in a way that doesn’t leak them into the image.
- Documents necessary processes for keeping dependencies up-to-date.
- Mechanism for including SSH host public keys (GitHub is pre-populated).
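On the segfault-traceback point above: the standard mechanism in Python is the stdlib faulthandler module, usually enabled via the PYTHONFAULTHANDLER environment variable (I’m assuming that mechanism here; the template’s exact approach may differ). A quick way to verify it’s on in a child process:

```python
import os
import subprocess
import sys

# Setting PYTHONFAULTHANDLER=1 makes the interpreter enable
# faulthandler at startup, so a segfault dumps a Python traceback.
env = dict(os.environ, PYTHONFAULTHANDLER="1")
result = subprocess.run(
    [sys.executable, "-c",
     "import faulthandler; print(faulthandler.is_enabled())"],
    env=env,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → True
```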
- Python 3.6 and 3.7.
- Docker CE Stable (18.09).
- Package installation using setup.py, pipenv, poetry, or flit.
- Debian and Ubuntu distributions supported out of the box.
- Gitlab CI and CircleCI configurations included out of the box, as well as generic instructions for other CI/build systems.
Additional base images (e.g. CentOS) and CI systems can easily be added, and might be included in future updates.
Conda is not supported at this time.
Ready to build production-ready containers?
Hi, I’m Itamar Turner-Trauring.
I’ve been writing Python since 1999, and I first started using Docker in 2014, when I was part of a team that wrote one of the first distributed storage backends for Docker.
I’ve since built Telepresence, a remote development tool for Kubernetes that was adopted as a Cloud Native Computing Foundation sandbox project. I’ve also deployed a number of production Python applications as Docker images.
The patterns embedded in the Production-Ready Python Containers template are based on this experience, as well as extensive additional research; this template is far better than any of my previous hand-built images.