Creating a better flamegraph visualization
How do you visualize performance data so you can easily spot bottlenecks? Brendan Gregg’s flamegraphs are a great solution, adopted by a large number of profilers and performance tools.
However, even great solutions can be improved. With a few small tweaks, you can make flamegraphs much easier to read.
To see what I mean, I’ll start with a default flamegraph, and then make it better step by step. Most of the improvements can be achieved by using the right tool and adding a couple of command-line options, so if you’re generating your own flamegraphs you’ll be able to benefit immediately.
Why you need flamegraphs
A flamegraph is a way to visualize resources used by a program, like CPU usage or memory allocations, and see which parts of your code were responsible. For example, consider the following program:
import numpy as np

def make_big_array():
    return np.zeros((1024, 1024, 50))

def make_two_arrays():
    arr1 = np.zeros((1024, 1024, 10))
    arr2 = np.ones((1024, 1024, 10))
    return arr1, arr2

def main():
    arr1, arr2 = make_two_arrays()
    another_arr = make_big_array()
    # → Peak memory usage is here ←

main()
Let’s say we want to find which allocations were responsible for peak memory usage. This is a trivial program, so we can figure out the sources manually:
- Importing numpy will increase memory usage by some unknown amount.
- main() -> make_two_arrays() -> np.zeros() will allocate 80MB (roughly 10 million 8-byte floats).
- main() -> make_two_arrays() -> np.ones() will allocate 80MB.
- main() -> make_big_array() -> np.zeros() will allocate 400MB.
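The numbers above can be checked with a little arithmetic. This is a sketch that assumes NumPy's default float64 dtype (8 bytes per element); the helper name array_mebibytes is made up for illustration.

```python
from math import prod

BYTES_PER_FLOAT64 = 8  # assumption: NumPy's default dtype, float64


def array_mebibytes(shape):
    """Size in MiB of a float64 array with the given shape."""
    return prod(shape) * BYTES_PER_FLOAT64 / (1024 * 1024)


small = array_mebibytes((1024, 1024, 10))  # each array in make_two_arrays()
big = array_mebibytes((1024, 1024, 50))    # the array in make_big_array()
print(small, big, 2 * small + big)
```

The two small arrays and the big one add up to 560, not counting whatever importing numpy costs.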
In a real program doing this by hand would be impossible; we need some tool to find peak memory and tell us the relevant stacktraces. The Fil memory profiler does this for Python.
Here’s a random stacktrace it recorded:
example.py:1 (<module>);<frozen importlib._bootstrap>:1007 (_find_and_load);<frozen importlib._bootstrap>:986 (_find_and_load_unlocked);<frozen importlib._bootstrap>:680 (_load_unlocked);<frozen importlib._bootstrap_external>:846 (exec_module);<frozen importlib._bootstrap_external>:978 (get_code);<frozen importlib._bootstrap_external>:647 (_compile_bytecode) 10183
Reading this sort of thing is difficult, and profiles of real programs can have thousands of recorded data points; what you want is a visualization of some sort.
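The recorded line has a simple shape: frames separated by semicolons, followed by a space and a number. Here is a minimal parsing sketch, assuming that layout holds for every line; the function name parse_folded_line is hypothetical and not part of Fil or flamegraph.pl, and the sample stack is a shortened version of the one above.

```python
def parse_folded_line(line):
    """Split 'frame1;frame2;... count' into (list of frames, count)."""
    # Frames may contain spaces, e.g. "example.py:1 (<module>)", so split
    # on the LAST space only to separate the stack from the count.
    stack, count = line.rstrip().rsplit(" ", 1)
    return stack.split(";"), int(count)


frames, count = parse_folded_line(
    "example.py:1 (<module>);"
    "<frozen importlib._bootstrap>:1007 (_find_and_load);"
    "<frozen importlib._bootstrap>:986 (_find_and_load_unlocked) 10183"
)
print(len(frames), frames[-1], count)
```

Even parsed, a list of frames and a number is only readable one stack at a time, which is why a visualization helps.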
Step 1: The original flamegraph
Brendan Gregg came up with flamegraphs as a way to summarize stacktraces like the above.
We can use his flamegraph.pl utility to get a visualization of the data recorded by Fil.
As a Fil user you wouldn’t actually have to do this, since it generates nice flamegraphs out of the box, but for many profiling tools you would need to generate the flamegraph yourself.
$ cat peak-memory.prof | perl flamegraph.pl > 01-default.svg
The rendering is a bit off when it’s embedded, so make it full screen to get a better sense of what it looks like:
The basic idea is that:
- Stacktraces are combined into stacked frames.
- The stacked frames’ width indicates how much of the resource is being used.
In this case, that means memory.
So the frame for make_big_array() is proportionally wider than the other frames because it is the bulk of the memory allocation (if you don’t see it, scroll right).
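To make the combining step concrete, here is a rough sketch of how stacks with a shared prefix can be merged into a tree whose node values determine frame widths. It is not the code flamegraph.pl uses; the function name build_tree, the simplified frame names, and the sample values are all made up for illustration.

```python
def build_tree(samples):
    """Merge (frames, value) samples into a nested tree of frames."""
    root = {"value": 0, "children": {}}
    for frames, value in samples:
        node = root
        node["value"] += value
        for frame in frames:
            # Stacks sharing a prefix reuse the same child node.
            node = node["children"].setdefault(
                frame, {"value": 0, "children": {}}
            )
            node["value"] += value
    return root


# Made-up sample values, mirroring the example program's allocations.
samples = [
    (["main", "make_two_arrays", "np.zeros"], 80),
    (["main", "make_two_arrays", "np.ones"], 80),
    (["main", "make_big_array", "np.zeros"], 400),
]
tree = build_tree(samples)
main_node = tree["children"]["main"]
for name, child in main_node["children"].items():
    # A frame's width is its share of the total.
    print(name, child["value"] / tree["value"])
```

With these numbers, make_big_array holds 400 of the 560 total, so its frame takes up most of the width.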
Tip: You can click on a frame to zoom in to that part of the diagram.
Step 2: Icicle graphs
The first problem with this visualization is that all the interesting data is at the bottom, so you have to scroll down to see it. In addition, these are Python stacktraces, which are traditionally written with the most specific function call at the bottom, whereas this is using the opposite order.
To solve this, we can use the icicle mode provided by flamegraph.pl, which flips the visualization’s orientation.
$ cat peak-memory.prof | perl flamegraph.pl -i > 02-icicles.svg
Here’s what the output looks like:
Step 3: Text alignment
If you look at the leftmost stacktrace in the graph above, it’s quite difficult to see which function is being called, or even which file is being referred to.
In particular, the frames’ text looks like path/to/myfile.py (functionname), but because the text is left-aligned, the more important information, file name and function name, is truncated if the frame is small.
Instead, when text doesn’t fit we want to right-align it, so that the less-interesting leftmost part gets truncated.
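The difference is easy to see with a small sketch that truncates a label to a fixed number of characters. This is only an illustration of the idea, not how any flamegraph tool renders SVG text; the function name fit_label and the 24-character limit are made up.

```python
def fit_label(text, max_chars, keep_end=True):
    """Truncate text to max_chars, marking the cut with '..'."""
    if len(text) <= max_chars:
        return text
    if max_chars <= 2:
        return ""  # too narrow to show anything useful
    if keep_end:
        # Drop the leading path, keep file name and function name.
        return ".." + text[-(max_chars - 2):]
    # Left-aligned behaviour: keep the start, lose the end.
    return text[: max_chars - 2] + ".."


label = "path/to/myfile.py (functionname)"
print(fit_label(label, 24, keep_end=False))  # path/to/myfile.py (fun..
print(fit_label(label, 24, keep_end=True))   # ..file.py (functionname)
```

Keeping the end of the label preserves the function name, which is usually what you are looking for.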
At this point we’re going to switch to Inferno, a reimplementation of flamegraph.pl in Rust.
Thanks to a patch I contributed, if text doesn’t fit in a frame, Inferno right-aligns it by default.
$ cat peak-memory.prof | inferno-flamegraph -i > 03-right-align.svg
The result now makes reading smaller frames much easier; you can see the function being called. In this example the benefit is more visible if you full-screen the SVG:
Step 4: Better colors
As we discussed above, when looking at a flamegraph you want to find the widest frames: these are the frames that are using the most resources, in this case memory. But in the image above, your eye is naturally drawn towards the stack on the left, because it has more red and stands out more. That stack is not the bottleneck, so it’s not the stacktrace you should be looking at.
So let’s switch to a different color scheme. Instead of randomly assigning colors, we can use a mode I contributed to Inferno that makes wider frames more saturated and red.
$ cat peak-memory.prof | inferno-flamegraph -i --colordiffusion > 04-better-colors.svg
Now your eyes are naturally drawn towards the wider frames:
An unimplemented idea: Flamegraphs sometimes use different colors for different categories of frames. The above technique could still be used, by varying saturation for the respective category’s color instead of just using red.
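Here is a rough sketch of what width-based coloring could look like, including the per-category variation. It is not Inferno's actual formula: the function name frame_color, the linear ramp, and the 0.15 saturation floor are all assumptions chosen for illustration.

```python
import colorsys


def frame_color(width_fraction, hue=0.0):
    """Return an '#rrggbb' color that is more saturated for wider frames.

    width_fraction: frame width divided by total width, from 0.0 to 1.0.
    hue: base hue from 0.0 to 1.0; 0.0 is red. Passing a different hue
    per category is the unimplemented idea, sketched hypothetically.
    """
    fraction = min(max(width_fraction, 0.0), 1.0)
    # Assumption: a simple linear ramp with a floor so narrow frames stay
    # visibly tinted. A real tool's formula may differ.
    saturation = 0.15 + 0.85 * fraction
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


print(frame_color(1.0))           # widest frame: fully saturated red
print(frame_color(0.05))          # narrow frame: pale
print(frame_color(1.0, hue=0.6))  # a hypothetical category hue
```

Because only saturation changes, each category would keep a recognizable hue while wide frames still stand out.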
Step 5: Source code
If you look at the current output, it looks quite similar to a Python traceback. The only thing missing is the source code Python includes by default.
So why not include source code in the flamegraph?
This won’t work in all flamegraph use cases; for compiled code this can be tricky. But for Python it’s pretty easy. This is what Fil does by default, as well as the Sciagraph performance and memory profiler.
Here’s what Fil’s output looks like:
Using these improvements yourself
As you saw above, you can get most of these improvements by using Inferno to generate your flamegraphs and using the correct command-line options. Including source code in the output is more involved; it requires a bunch of hacks given the current input format for these tools, which uses spaces as separators.
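To give a flavor of the kind of hack involved, here is one possible workaround, sketched under the assumption that the consuming tool splits frames on semicolons and treats regular spaces specially. It is not necessarily what Fil or Sciagraph do; the function names, the look-alike replacement characters, and the line number in the example are hypothetical.

```python
def escape_for_folded(source_line):
    """Make a line of source code safe to embed in a folded-stack frame.

    Assumption: ';' separates frames and regular spaces are significant
    to the parser, so both are swapped for look-alike characters.
    """
    return (
        source_line.strip()
        .replace(";", "\uff1b")   # fullwidth semicolon
        .replace(" ", "\u00a0")   # non-breaking space
    )


def frame_with_source(filename, lineno, function, source_line):
    """Build a 'file:line (function)' frame label followed by source."""
    code = escape_for_folded(source_line)
    return f"{filename}:{lineno} ({function})\u00a0{code}"


frame = frame_with_source(
    "example.py", 4, "make_big_array", "    return np.zeros((1024, 1024, 50))"
)
print(frame)
```

The rendered frame looks like it contains ordinary spaces, but the parser never sees a separator inside the source code.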
Inferno is also a Rust library, so if you’re writing Rust code you can use the appropriate arguments:
use inferno::flamegraph;

let mut options = flamegraph::Options::default();
options.color_diffusion = true;
options.direction = flamegraph::Direction::Inverted;