When C extensions crash: easier debugging for your Python application

It’s common to use C extensions in Python applications, either to access pre-existing libraries or for performance. But unlike Python, C and C++ are not memory-safe, so a bug in an extension can crash the whole process, and you’ll need to figure out what caused the crash.

This is extra fun when you get a silent crash half-way through a test run on your CI system:

  • You typically don’t have access to a core file.
  • Lacking good output, you might not even know which test caused the crash.

In production, you’ll also often lack a core file, especially if you’re using Docker where the filesystem is often ephemeral.

In this article I’ll cover some ways you can prepare for crashes in advance, so when they do occur you can quickly figure out which part of the codebase caused them:

  1. The standard library’s faulthandler.
  2. Verbose test runs.
  3. Package listing.
  4. catchsegv on Linux.
  5. Using faulthandler in Docker.

The problem

If you have buggy Python code, you’ll get a traceback when you run it:

$ python
>>> def f():
...     1 / 0
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
ZeroDivisionError: integer division or modulo by zero

That’s helpful in figuring out which code was responsible for the problem.

But some Python programs crash due to bugs in C code, and then you don’t get a traceback. Let’s create a file called crash.py:

import ctypes
def crash():
    ctypes.string_at(0)
print("About to crash...")
crash()
print("Or not?")

If we run it:

$ python3 crash.py
About to crash...
Segmentation fault (core dumped)

There’s no traceback. And if you can’t get access to the core file, you’ll have a very hard time figuring out what caused the problem.
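When the only evidence is an exit status, as is often the case on CI, it helps to at least confirm that the process died from a signal. The following is a minimal sketch, assuming a POSIX system where Python's subprocess module reports death by signal as a negative return code. The helper names describe_exit and run_and_describe are made up for illustration.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_and_describe(args):
    """Run a command, wait for it to finish, and report how it ended."""
    completed = subprocess.run(args)
    return describe_exit(completed.returncode)
```

For example, run_and_describe([sys.executable, "crash.py"]) should report a signal rather than an exit code for the script above. That tells you the process crashed, but still not where.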

1. Tracebacks on segfaults with faulthandler

The Python standard library has a handy module called faulthandler that can print a traceback when a segfault occurs—that is, when a C extension crashes (the documentation has a nice example).

All you need to do is set the environment variable PYTHONFAULTHANDLER before running your code:

$ export PYTHONFAULTHANDLER=1
$ python3 crash.py
About to crash...
Fatal Python error: Segmentation fault

Current thread 0x00007f22a69da6c0 (most recent call first):
  File "/usr/lib64/python3.7/ctypes/__init__.py", line 500 in string_at
  File "crash.py", line 3 in crash
  File "crash.py", line 5 in <module>
Segmentation fault (core dumped)

Notice how now we get a traceback, which means it’s much easier to figure out which code caused the problem. The only caveat is that if the problem involved sufficiently bad memory corruption you won’t be able to get any useful output.

If you run your tests with pytest 5.0 or later, faulthandler is already enabled during test runs. On older versions you can install the pytest-faulthandler package, which enables it automatically when you run tests.
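If you can’t control the environment variables of the process, you can also turn faulthandler on from code. Here is a minimal sketch, assuming it runs as early as possible in your program’s entry point and that stderr is a real file descriptor:

```python
import faulthandler
import sys

# Install handlers for fatal signals such as SIGSEGV, SIGFPE, SIGABRT and SIGBUS.
# The traceback is written to stderr; all_threads=True dumps every thread,
# not just the one that crashed.
faulthandler.enable(file=sys.stderr, all_threads=True)
```

Running python3 -X faulthandler crash.py has the same effect for a single invocation.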

2. Enable detailed reporting of which test is running

Many test runners don’t print which tests are being run by default: you just get a list of dots:

$ py.test
test_precalculate.py ......          [100%]

The problem is that if you crash, and the only thing you have access to is that output, you won’t know exactly where the crash happened: you’ll know the test module, but what if your module has 100 tests, or these are integration tests that can call lots of different codepaths?

So on CI, at least, make sure you run tests with more detailed reporting, so you know exactly which tests ran. For example, add the -v flag for py.test:

$ py.test -v
test_precalculate.py::test_created_in_threadpool PASSED
test_precalculate.py::test_destroyed_in_threadpool PASSED
test_precalculate.py::test_precreated PASSED
test_precalculate.py::test_new_create_on_get PASSED

If you crash you will then be able to see which test caused the problem—the last one printed, typically.
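If you are worried that buffered output might be lost when the process dies, one option is to log each test’s ID yourself before it starts. This is a sketch rather than a recommendation: it assumes pytest, whose pytest_runtest_logstart hook is called before each test runs, and the "STARTING" prefix is an arbitrary choice.

```python
# conftest.py
import sys


def pytest_runtest_logstart(nodeid, location):
    """Write the test ID to stderr just before the test runs."""
    # Flush explicitly so the line is already out if the test crashes hard.
    print(f"STARTING {nodeid}", file=sys.stderr, flush=True)
```

By default pytest only shows captured stderr for failing tests, so depending on your setup you may need -s or a real log file to see these lines on CI.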

3. Dump the installed packages at the start of the CI run

If the crash is in a library, sometimes you’ll start getting crashes because of a minor change in the library version. If your local development machine has different package versions you won’t be able to reproduce the problem.

So unless you’re explicitly pinning specific package builds (with hashes for pip, or via conda env export for Conda), make sure to print out the packages you’ve installed at the start of each CI run.

That is, before you run your tests, run pip list (or conda env export if you use Conda), so you know exactly which packages were used in the CI run.
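If you would rather record the versions from inside the test process itself, the standard library can list them too. The following is a rough sketch, assuming Python 3.8 or later (for importlib.metadata). The function name installed_packages is made up for illustration.

```python
from importlib import metadata


def installed_packages():
    """Return sorted "name==version" strings for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name recorded
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Print the list up front, flushing so it is not lost if a later test crashes.
for line in installed_packages():
    print(line, flush=True)
```

This only covers packages visible to the running interpreter, so it complements pip list rather than replacing it.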

4. Use catchsegv on Linux

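catchsegv is a Linux utility, shipped with glibc, that prints a bunch of helpful information when your program segfaults. It shouldn’t add much overhead, so you can just prefix your test command with it, as shown here. Note that glibc 2.35 removed catchsegv, so newer distributions may not include it.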

$ catchsegv py.test

(Thanks to Glyph Lefkowitz for the suggestion.)

5. Using faulthandler in Docker

C crashes are painful not just in tests, but in production too. In your Docker images, you can enable faulthandler by adding the following instruction to the Dockerfile:

ENV PYTHONFAULTHANDLER=1
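To check that the variable actually reaches the interpreter, you can ask a fresh Python process whether faulthandler is active. Here is a small sketch; the helper name faulthandler_enabled_in_child is hypothetical, and it launches a child interpreter rather than inspecting a real container.

```python
import os
import subprocess
import sys


def faulthandler_enabled_in_child(env_value="1"):
    """Start a new interpreter with PYTHONFAULTHANDLER set and ask it
    whether faulthandler is enabled."""
    env = dict(os.environ, PYTHONFAULTHANDLER=env_value)
    completed = subprocess.run(
        [sys.executable, "-c", "import faulthandler; print(faulthandler.is_enabled())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip() == "True"
```

Inside a running container, the equivalent check is to run the same one-liner (import faulthandler; print(faulthandler.is_enabled())) with the image’s Python.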

Failure is inevitable

Sooner or later something will go wrong—and with just a smidgen of bad luck it will happen in a way that makes it very hard to figure out what exactly crashed.

So don’t wait for crashes to occur before adding this debug output—do it today, and your future self (or coworkers) will thank you.



