Debugging out-of-memory crashes in Python
You run your program, and it crashes—it’s out of memory:
- If you’re lucky, you get a MemoryError traceback.
- If you’re less lucky, you get a core dump.
- If you’re having a bad day, your computer locks up and you need to restart it.
How do you figure out what is using up all your Python program’s memory?
One way to do that is with the Fil memory profiler, which specifically—albeit experimentally—supports debugging out-of-memory crashes. But before we see how that works, let’s consider why out-of-memory situations are so painful.
The unpleasant experience of running out of memory
If your program is maxing out your computer’s CPU, it will just run slower—it will eventually finish. Run out of memory, however, and you’re not going to be so lucky. When you’re out of memory, the program first becomes extremely slow, and then it’s going to exit—often badly.
Your operating system will often be configured to write out unused parts of memory to disk—"swapping" to disk—so other programs can get that memory. This is useful when you’re running multiple applications, like a browser and a text editor. If you’re not currently using the browser, the OS can write out its memory to disk, and only load it when you switch back.
Eventually even swapping isn’t enough, and there is no memory left whatsoever: memory allocation fails.
- If the failure reached Python, Python will try to raise a MemoryError exception. However, raising an exception, printing it out, and handling that error may all require allocating memory too! So this may well not work, leading to an even less informative crash.
- If it’s C code that doesn’t handle failed allocations very well, you might end up dereferencing an invalid address and segfaulting.
- In some cases the operating system, e.g. the Linux OOM killer, will kill the process.
- Whatever the lead-up to the crash, your operating system might end up trying to write out a core dump file of the crashed process. Writing out a core dump also takes memory. At this point swapping might get so bad that your computer is essentially dead, and you’ll need to manually reboot it.
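The first failure mode above can be seen in miniature: Python turns a failed allocation into a MemoryError you can catch, provided the interpreter still has enough memory left to raise and handle it. A minimal sketch (the try_alloc helper is mine, for illustration only):

```python
def try_alloc(n):
    """Attempt to allocate n bytes; report whether it failed cleanly."""
    try:
        buf = bytearray(n)  # a zero-filled buffer of n bytes
        return "allocated"
    except MemoryError:
        return "MemoryError"

# An absurdly large request fails cleanly with MemoryError, because
# Python caught the failed allocation before any memory was touched:
print(try_alloc(2**62))
```

In real out-of-memory situations the interpreter is under far more pressure than this, which is why even this clean failure path can itself fail.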
In the worst-case scenario, figuring out why your process used so much memory is extremely difficult.
But even if you get a nice MemoryError traceback, you’ll know what straw broke the camel’s back—but you won’t know where the rest of the stuff on the camel’s back came from.
When your memory runs out, it’s quite difficult to figure out where your program allocated all the memory up to that point. And that memory is the likely culprit.
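For comparison, the standard library’s tracemalloc module can report where memory was allocated while the program is still alive, though unlike Fil it won’t help once the process has been killed. A sketch (the allocation here is a stand-in, not the article’s example):

```python
import tracemalloc

tracemalloc.start()

# A stand-in allocation: roughly 5 MB spread across 50 objects.
data = [bytes(100_000) for _ in range(50)]

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[0]
# The top entry points at the list comprehension above:
print(top.size)
```

This only sees allocations made while tracing is on, and only Python-level ones it can track—one reason a dedicated profiler like Fil is more useful for debugging crashes.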
Using Fil to debug out-of-memory crashes
To help you debug these situations, the Fil memory profiler includes (experimental) support for dumping out current memory allocations at the time of a crash. Let’s see how this works.
Consider the following Python program:
```python
import numpy as np

ALLOCATIONS = []

def add1(x):
    ALLOCATIONS.append(np.ones((1024 * 1024 * x)))

def add2():
    add1(5)
    add1(2)

def main():
    while True:
        add2()
        add1(3)
        x = np.ones((1024 * 1024,))

main()
```
When I run this program the process is killed, likely by the Linux out-of-memory killer. No traceback is printed.
```
$ python oom.py
Killed
```
Now, in this case the program is simple enough that you can spot the ever-growing ALLOCATIONS list just by reading it, but real programs won’t be so easy. So what you want is a tool to help you debug the situation, a tool like the Fil memory profiler.
Let’s see how you use Fil to debug this.
First, we install Fil:
```
$ pip install filprofiler
...
```
Next, we want to make memory allocation fail a little bit earlier, before the process is terminated by the Linux OOM killer.
We can use the ulimit tool to limit how much memory can be allocated to the process.
We can run free to figure out how much memory is available—in this case about 6.3GB—and then set a corresponding limit on virtual memory:
```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.7Gi       1.1Gi       6.3Gi        50Mi       334Mi       6.3Gi
Swap:         3.9Gi       3.0Gi       871Mi
$ ulimit -Sv 6300000
```
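If you’d rather set the cap from inside Python than from the shell, the standard library’s resource module (Unix only) can apply the same address-space limit. A sketch mirroring the ulimit invocation above:

```python
import resource

# Mirror `ulimit -Sv 6300000` from inside Python.
# ulimit -Sv counts KiB; setrlimit counts bytes.
limit_bytes = 6_300_000 * 1024

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
print(resource.getrlimit(resource.RLIMIT_AS)[0])  # the new soft limit

# Restore the original limit so the rest of the process is unaffected:
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

Setting the limit in your script’s entry point means allocations fail with a catchable MemoryError well before the OOM killer steps in.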
If I run the program directly I now get a MemoryError, but that’s still not enough to know where all the memory allocations came from:
```
$ python oom.py
Traceback (most recent call last):
  File "oom.py", line 18, in <module>
    main()
  File "oom.py", line 14, in main
    add2()
  File "oom.py", line 10, in add2
    add1(2)
  File "oom.py", line 6, in add1
    ALLOCATIONS.append(np.ones((1024 * 1024 * x)))
  File "/home/itamarst/Devel/sandbox/oom/venv/lib64/python3.7/site-packages/numpy/core/numeric.py", line 207, in ones
    a = empty(shape, dtype, order)
MemoryError: Unable to allocate 16.0 MiB for an array with shape (2097152,) and data type float64
```
So now I run the program under Fil:
```
$ fil-profile run oom.py
...
=fil-profile= Wrote memory usage flamegraph to fil-result/2020-06-15T12:37:13.033/out-of-memory.svg
=fil-profile= Wrote memory usage flamegraph to fil-result/2020-06-15T12:37:13.033/out-of-memory-reversed.svg
```
out-of-memory.svg looks like:
As you can see, the flamegraph shows exactly where all the memory was allocated at the time the process ran out of memory, which gives you a starting point for reducing that memory usage.
Memory use too high? Try Fil
Fil can help you figure out where your crashing program is allocating its memory. But it can also help you with non-crashing programs, by measuring peak usage of your data processing program.
Once you’ve measured memory use and know where it’s coming from, you can start applying a variety of techniques to reduce memory usage.
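One of the simplest such techniques, sketched here in plain Python (this is a generic illustration, not from Fil): process data lazily with a generator instead of materializing a full list, so peak memory stays small.

```python
import sys

# Materializing every item keeps them all in memory at once:
squares_list = [i * i for i in range(1_000_000)]

# A generator produces one item at a time, so peak memory stays tiny:
squares_gen = (i * i for i in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes for the list alone
print(sys.getsizeof(squares_gen))   # a couple hundred bytes
print(sum(squares_gen) == sum(squares_list))  # same result either way
```

This works whenever you only need one pass over the data; if you need random access or multiple passes, other techniques (smaller dtypes, chunking, memory-mapped files) apply instead.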
Learn even more techniques for reducing memory usage—read the rest of the Small Big Data guide for Python.