When C extensions crash: easier debugging for your Python application
It’s common to use C extensions in Python applications, either to access pre-existing libraries or for performance reasons. But unlike Python, C and C++ lack memory safety, which can lead to crashes, and then you’ll need to figure out what caused the crash.
This is extra fun when you get a silent crash half-way through a test run on your CI system:
- You typically don’t have access to a core file.
- Lacking good output, you might not even know which test caused the crash.
In production, you’ll also often lack a core file, especially if you’re using Docker where the filesystem is often ephemeral.
In this article I’ll cover some ways you can prepare for crashes in advance, so when they do occur you can quickly figure out which part of the codebase caused them:
- The standard library’s faulthandler.
- Verbose test runs.
- Package listing.
If you have buggy Python code, you’ll get a traceback when you run it:
$ python
>>> def f():
...     1 / 0
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
ZeroDivisionError: integer division or modulo by zero
That’s helpful in figuring out which code was responsible for the problem.
But some Python programs crash due to bugs in C code, and then you don’t get a traceback.
Let’s create a file called crash.py:
import ctypes

def crash():
    ctypes.string_at(0)

print("About to crash...")
crash()
print("Or not?")
If we run it:
$ python3 crash.py
Segmentation fault (core dumped)
There’s no traceback. And if you can’t get access to the core file, you’ll have a very hard time figuring out what caused the problem.
1. Tracebacks on segfaults with faulthandler
All you need to do is set the environment variable PYTHONFAULTHANDLER before running your code:
$ export PYTHONFAULTHANDLER=1
$ python3 crash.py
Fatal Python error: Segmentation fault

Current thread 0x00007f22a69da6c0 (most recent call first):
  File "/usr/lib64/python3.7/ctypes/__init__.py", line 500 in string_at
  File "crash.py", line 3 in crash
  File "crash.py", line 4 in <module>
Segmentation fault (core dumped)
Notice how now we get a traceback, which means it’s much easier to figure out which code caused the problem. The only caveat is that if the problem involved sufficiently bad memory corruption you won’t be able to get any useful output.
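If you can’t control the environment the interpreter starts in, the same handler can be switched on from Python code instead. Here is a minimal sketch using the standard library’s faulthandler module; the helper name enable_crash_tracebacks is made up for illustration, and it assumes the stream you pass has a real file descriptor (as sys.stderr normally does):

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=sys.stderr):
    """Hypothetical helper: turn on faulthandler unless it is already on.

    The stream must have a working fileno(), because faulthandler writes
    to the file descriptor directly when the process is crashing.
    """
    if not faulthandler.is_enabled():
        # all_threads=True dumps the traceback of every thread on a crash.
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Call this as early as possible, before importing any C extensions
# you suspect might crash.
enable_crash_tracebacks()
```

Calling it a second time is harmless, so it is safe to use even when PYTHONFAULTHANDLER is already set.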
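If you’re using py.test to run your tests you can alternatively just install the pytest-faulthandler package: it will enable faulthandler automatically when you use py.test to run tests. (As of pytest 5.0 this functionality is built into pytest itself, so the separate package is only needed on older versions.)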
2. Enable detailed reporting of which test is running
Many test runners don’t print which tests are being run by default: you just get a list of dots:
$ py.test
test_precalculate.py ......                                    [100%]
The problem is that if you crash, and the only thing you have access to is that output, you won’t know exactly where the crash happened: you’ll know the test module, but what if your module has 100 tests, or these are integration tests that can call lots of different codepaths?
So on CI at least, make sure you run tests with more detailed reporting, so you know exactly which tests ran. E.g. add the -v flag for py.test:
$ py.test -v
test_precalculate.py::test_created_in_threadpool PASSED
test_precalculate.py::test_destroyed_in_threadpool PASSED
test_precalculate.py::test_precreated PASSED
test_precalculate.py::test_new_create_on_get PASSED
If you crash you will then be able to see which test caused the problem—the last one printed, typically.
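If your tests use the standard library’s unittest runner rather than py.test, the equivalent setting is verbosity=2, which writes each test’s name to the stream before the test body runs. A small sketch; the test class and the run_verbosely helper are hypothetical stand-ins, not code from the project above:

```python
import unittest


class PrecalculateTests(unittest.TestCase):
    """Hypothetical stand-in for a real test module."""

    def test_precreated(self):
        self.assertEqual(1 + 1, 2)

    def test_new_create_on_get(self):
        self.assertIn("a", "abc")


def run_verbosely(stream=None):
    """Run the tests, printing each test name as it starts.

    With stream=None, TextTestRunner writes to sys.stderr.
    """
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrecalculateTests)
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    return runner.run(suite)


if __name__ == "__main__":
    run_verbosely()
```

From the command line, python -m unittest -v has the same effect.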
3. Dump the installed packages at the start of the CI run
If the crash is in a library, sometimes you’ll start getting crashes because of a minor change in the library version. If your local development machine has different package versions you won’t be able to reproduce the problem.
So unless you’re explicitly pinning specific package builds (with hashes for pip, or via conda env export for Conda), make sure to print out the packages you’ve installed at the start of each CI run.
That is, before you run your tests, run pip list (or conda env export if you use Conda) so you know exactly which packages were used in the CI run.
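If it’s more convenient to do this from inside the test process, for example in a setup hook, a similar listing can be produced with the standard library’s importlib.metadata (Python 3.8+). This is only a sketch: the installed_packages function is a hypothetical helper, and unlike conda env export it only sees Python distributions, not other Conda packages:

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for installed distributions."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            found.add((name, dist.metadata.get("Version", "unknown")))
    return sorted(found, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    # Print in a pip-style format so the log can be compared across runs.
    for name, version in installed_packages():
        print(f"{name}=={version}")
```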
catchsegv on Linux
catchsegv is a Linux utility that prints a bunch of helpful information when your program segfaults.
Again, it shouldn’t have much overhead, so just change your code to run like this:
$ catchsegv py.test
(Thanks to Glyph Lefkowitz for the suggestion.)
faulthandler in Docker
C crashes are painful not just in tests, but in production too.
In your Docker images, you can enable faulthandler by setting the PYTHONFAULTHANDLER environment variable, for example with an ENV instruction in the Dockerfile.
Failure is inevitable
Sooner or later something will go wrong—and with just a smidgen of bad luck it will happen in a way that makes it very hard to figure out what exactly crashed.
So don’t wait for crashes to occur before adding this debug output—do it today, and your future self (or coworkers) will thank you.