Speed up production Python data-processing jobs with always-on profiling

Why Sciagraph? Or, that time I almost spent 70% of revenue on cloud computing

You’re running a data-processing batch job written in Python, and it’s far too slow: your feedback loop when fixing bugs takes days, and getting results from real data isn’t much better. And then there are your computing costs, which are far too high. Now you have questions:

  • What are the performance bottlenecks you need to fix?
  • Why is your code using so much memory?

Here’s a real example: I once worked on a scientific image-processing pipeline that would take 8 hours to finish. This was too slow—and much too expensive. A quick back-of-the-envelope calculation showed we were going to spend 70% of the company’s projected revenue just on cloud computing.

Needless to say, I spent the next month optimizing the code! But it wasn’t easy—I needed to download data from production, use tools that didn’t give me the right information, and my computer was quite different from the production environment. And since the code was slow to begin with, checking results with real data took a long time too!

Profiling production jobs—in production

Is there an easier way to figure out why your production jobs are slow?

Here’s one idea: let’s pretend you have access to a time machine. Whenever you realize a batch job is slow, you can go back in time, enable some profilers, and then return to the present. When the batch job finishes, you’ll have an exact record of performance and memory usage, exactly as it ran in production.

Only problem is, time machines don’t exist.

But what you can do is run all your production jobs with profiling enabled from the start, by default. The profiler needs to be fast enough that it doesn’t slow down your code, and robust enough to run safely in production.
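Sciagraph handles this for you, but to make the idea concrete, here’s a minimal sketch of “profiling on by default” using only the standard library’s cProfile. The PROFILE_JOB environment variable and the job.prof output path are invented for this example, and cProfile’s deterministic tracing adds far more overhead than a production-grade, low-overhead profiler would:

    import cProfile
    import os
    import pstats

    def main():
        # Stand-in for your actual batch job.
        total = sum(i * i for i in range(10_000_000))
        print(total)

    if __name__ == "__main__":
        # Profiling is on unless explicitly disabled; the point is that
        # you never need to remember to turn it on.
        if os.environ.get("PROFILE_JOB", "1") == "1":
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                main()
            finally:
                profiler.disable()
                # Persist the profile so it can be inspected after the
                # production job finishes.
                profiler.dump_stats("job.prof")
                pstats.Stats("job.prof").sort_stats("cumulative").print_stats(10)
        else:
            main()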

Sciagraph™: an always-on, continuous profiler for data science and scientific computing

This is where the Sciagraph profiler can help.

  • Speed up your code: identifies where time is being spent, and whether that time is spent running or waiting.
  • Reduce memory usage: reports which allocations were responsible for peak memory usage, so you can optimize the bottlenecks that actually matter.
  • Low overhead: for many data-processing jobs the overhead should be negligible.
  • Robust: designed from the very start to be safe and reliable when running in the real world.
  • Designed for Python batch jobs: built for long-running data-processing jobs, like data pipelines or scientific computing.

Want to speed up your code? Try out Sciagraph today


Gain performance insight into your code, as it runs in production

Let’s take a look at some examples of what Sciagraph can tell you.

Here we can see two Python threads fighting over CPython’s Global Interpreter Lock, which prevents more than one Python thread from running at a time; wider and redder frames mean more time spent. Mouse over a frame to see its full details.

You’ll have an easier time viewing this on a computer, with the window maximized; the output is not designed for phones!
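For context, here’s a minimal example of the kind of code that produces this pattern: two threads doing pure-Python computation, so each spends much of its time waiting for the other to release the GIL. This sketch just reproduces the behavior; it isn’t taken from the profiled program.

    from threading import Thread

    def cpu_bound(n):
        # Pure-Python arithmetic holds the GIL while it runs, so two of
        # these threads cannot execute bytecode simultaneously.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        threads = [Thread(target=cpu_bound, args=(20_000_000,)) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Despite using two threads, the wall-clock time is roughly the
        # same as running both computations sequentially.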



And here we can see where peak memory usage is coming from in a different program; again, wider and redder means more memory usage. You can click on a frame to get a traceback.
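Sciagraph generates this memory report automatically. To get a rough feel for the underlying idea, the standard library’s tracemalloc can attribute allocations to the tracebacks that made them, though with significant runtime overhead; the load_data function below is a made-up workload:

    import tracemalloc

    tracemalloc.start(25)  # record up to 25 frames per allocation traceback

    def load_data():
        # Made-up workload: the second allocation dominates memory use.
        small = [0] * 1_000_000
        big = [0] * 50_000_000
        return small, big

    data = load_data()
    snapshot = tracemalloc.take_snapshot()

    # Group live allocations by traceback, largest first; print the top 3.
    for stat in snapshot.statistics("traceback")[:3]:
        print(f"{stat.size / 1e6:.1f} MB allocated at:")
        for line in stat.traceback.format():
            print(line)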




Want to speed up your code? Try out Sciagraph today