Finding performance problems: profiling or logging?

by Itamar Turner-Trauring
Last updated 06 Jan 2023, originally created 09 Aug 2022

When your software is too slow in production, how can you figure out the source of the problem?

One common starting point to improving production observability is logging, and ideally trace-based logging (tracing for short). For example, the OpenTelemetry standard and the libraries and backend services that work with it can help you collect metrics, logs, and traces. Tracing—both within and across processes—is the most general of these, immensely useful for identifying and debugging problems, including performance problems.

But there’s another approach to finding performance problems, using sampling-based profiling in production. While profiling is typically used offline during development, you can also use it in production to find performance bottlenecks, with a new generation of continuous profiling tools.

So what’s the difference between trace-based logging and profiling? Which should you use—do you need only one, or both? What’s the fastest way to find performance problems in production?

To jump ahead to our conclusion: trace-based logging and profiling actually give you different information. Once your software is complex enough, you will likely want both.

In this article you’ll learn:

What exactly trace-based logging and profiling mean.
The strengths of weaknesses of each.
Why you probably want both running in production.

A slow program

Imagine you have installed a third-party library, maintained by someone else, that provides a function that takes two lists, and only returns items that are in the first list but not the second:

>>> from thirdpartylibrary import items_only_in_first
>>> items_only_in_first([1, 2, 3, 4], [4, "a", 1])
[2, 3]

We have a little program that uses this third-party library:

from thirdpartylibrary import items_only_in_first
import sys

def task1():
    import time
    time.sleep(1)

def make_data(length):
    return list(range(length))

def task2(input_length):
    result = items_only_in_first(
        make_data(100_000), make_data(input_length)
    )
    return len(result)

def main():
    task1()
    task2(int(sys.argv[1]))

if __name__ == '__main__':
    main()

What is the slowest part of this program, and why?

Using statistical profiling to find performance problems

Statistical profiling takes a sample of your program’s running code at intervals, and tells you what the program is doing at that particular time. Basically, it runs a loop like this:

Sleep for a bit.
Record what the program is doing at this moment in time.
Go to step 1.

Whatever code is running slowly is more likely to be sampled, and therefore more likely to show up in the final trace. Critically, it’s the profiler’s job to record what code is running at any given moment, not yours: you don’t have to do any extra work to get this information.

With a profiler, we can find our example’s performance bottlenecks without changing the code at all. We’ll use the Sciagraph performance profiler to see what the program is doing; it’s designed to run both during development and in production. If you want to run this yourself, you’ll need to sign up for a free account to get the access keys.

The program is called twice, with two different input sizes:

$ pip install sciagraph
$ export SCIAGRAPH_ACCESS_KEY=...
$ export SCIAGRAPH_ACCESS_SECRET=...
$ python -m sciagraph run myprogram.py 1_000
...
The report was stored in sciagraph-result/2022-08-09T18-11-25.198672Z
...
$ python -m sciagraph run myprogram.py 10_000
...
The report was stored in sciagraph-result/2022-08-09T18-11-29.223568Z
...

The resulting profiling reports includes the following timeline graph for the first run:

And this graph for the second run:

Notice that the second run, with an input of 10,000, took much longer to run: 3.8 seconds instead of 1.2 seconds. And in this particular example, we know that the length input made the difference, since we gave different inputs on the command-line.

In real software, the input may be buried deep in an input file, an HTTP response, or the result of some other calculation, any of which won’t show up in the profiling report; profiling is typically focused on functions, not on inputs or variables. To understand the impact of inputs on performance, we need to turn to logging, and ideally tracing.

Using tracing to find performance problems

Logging gives us a different set of information than profiling: in particular, we can log function arguments, results, variables, and other runtime information. And the timestamps of log messages can be used to determine how long some code ran.

Like regular logging, trace-based logging or tracing can also be used to identify which parts of your code are slow. But tracing is much better at getting you this information than normal logging.

Instead of isolated statements of fact—”something happened!”, “something else happened!”—tracing records spans or actions that have a beginning and an end.
Spans build up a tree: they can contain other spans, and can be tracked across threads or processes, even across multiple computers. You can imagine tracing a request from the browser, to the web server, to the backend service, and finally to the database, and then all the way back.

The fact that traces have a beginning and an end is one reason why tracing is better than normal logging: the elapsed time of spans automatically gives you the run time for the relevant code. Let’s look at an example, which I’m going to demonstrate using the Eliot library, mostly because it is simpler to get going than OpenTelemetry, but the principles apply more generally.

To add trace-based logging using Eliot we’ll add:

Some boilerplate to tell it to output logs to out.log.
A log_call decorator that starts a span/action when it is called, logging the inputs, and finishes the span when the function returns. Eliot has other APIs, but this simple API will suffice for our example.

The result:

from eliot import to_file, log_call
from thirdpartylibrary import items_only_in_first
import sys

to_file(open("eliot.log", "a"))

@log_call
def task1():
    # ...

# ...

@log_call
def task2(input_length):
    # ...

@log_call
def main():
    # ...

# ...

Notice that unlike profiling, we have to manually add logging/tracing, and we have to decide which parts of the code to annotate. We know from the profiling above that make_data() is not the bottleneck, so there’s no need to add tracing to it, but lacking that information we have to make a judgment call.

We can run the program and then view the result logs with a program called eliot-tree.

$ python myprogram.py
$ ls eliot.log
eliot.log
$ eliot-tree eliot.log 10_000
9a6512cd-7c5b-46ba-bb44-e8b8d83e3b45
└── __main__.main/1 ⇒ started 15:40:21Z ⧖ 3.062s
    ├── __main__.task1/2/1 ⇒ started 15:40:21Z ⧖ 1.001s
    │   └── __main__.task1/2/2 ⇒ succeeded 15:40:22Z
    │       └── result: None
    ├── __main__.task2/3/1 ⇒ started 15:40:22Z ⧖ 2.060s
    │   ├── input_length: 10000
    │   └── __main__.task2/3/2 ⇒ succeeded 15:40:24Z
    │       └── result: 90000
    └── __main__.main/4 ⇒ succeeded 15:40:24Z
        └── result: None

This is in some ways that same information we got from the profiler—we can see functions calling other functions, and how much time elapsed. But there are some key differences:

We only have information about those functions we explicitly added tracing to. There’s no information about thirdpartylibrary, for example, even though we know from profiling that it’s slow.
On the positive side, we have information about inputs and outputs. Specifically, we can see that task2() was called with input_length: 10000, and returned 90000.

Regardless of where the input length came from—the command-line, some third-party input file, a HTTP response—it will end up recorded in the logs. And we can then correlate between input lengths and run time, ideally using a more sophisticated UI than eliot-tree, and discover that the bigger the input_length, the longer the run time is.

Profiling vs. tracing: a functional comparison

Let’s summarize what we’ve learned so far, and make more explicit the limitations and strengths of each approach.

How much work do you need to do?

Profiling: Just run your program under a profiler, that’s it.
Tracing: You need to instrument every function and code block that you want to end up in the logs.

Can you get information about third-party libraries?

Profiling: Yes, they’re no different than functions in your code; everything relevant gets recorded.
Tracing: The need for manual instrumentation means getting information about third-party can be tricky, especially for internal details and in less dynamic languages.

OpenTelemetry and other tracing libraries tend to come with auto-instrumentation support for common libraries, with a focus on things like web frameworks, databases, and the like. In some domains this may suffice; in others, like scientific computing or data science, it’s unlikely that you will get much improvement in coverage.

How often do functions get recorded?

Profiling: Any function that runs long enough will get recorded, but because of the use of sampling, functions that run quickly and infrequently will not show up in the profiling report.
Tracing: All functions calls can be recorded… so long as you’ve instrumented them. In practice, once you scale up (e.g. large web applications) sampling is also used, because otherwise the data is overwhelming.

Are variables, inputs, and return values recorded?

Profiling: No.
Tracing: Yes, if you chose to record them.

Finding and fixing our performance problem

Back to our example, from tracing we know that input size impacts the speed of task1(). Looking at the profiler graphs above, we can see that the slow code is in the third-party library, and in particular it’s this line:

    return [item for item in first if item not in second]

A little thought will show that this function is quadratic if the second argument is a list; it will run in O(F * S) time, where F and S are the lengths of first and second respectively.

Changing the code to use a set makes the function run much faster when input lengths are big, by running in O(F + S) time:

    second = set(second)
    return [item for item in first if item not in second]

(This example is based on a real, accidental bug in the Mercurial version control system.)

Note: Whether or not any particular tool or technique will speed things up depends on where the bottlenecks are in your software.

Need to identify the performance and memory bottlenecks in your own Python data processing code? Try the Sciagraph profiler, with support for profiling both in development and production on macOS and Linux, and with built-in Jupyter support.

Profiling or tracing? Both!

If you want to find performance problems, profiling in production is extremely helpful. Logging or tracing can only tell you about code that was instrumented; profiling inspects all running code, whether its yours or from a third-party.

Is tracing actually necessary to find performance problems? Seeing the input size in the logs is a useful hint, but in practice, you probably could figure out the performance problem just from the profiling. In other cases, however, performance problems may be harder to identify, and tracing will be helpful. On the other hand, tracing gives you debugging information that is impossible to get from profiling: if you have sufficient logging coverage, you can find logic bugs and other problems.

In short, you want both tracing and profiling in your production environment: they give you different information, sometimes complementary, sometimes useful all on its own.

If you want tracing:

Eliot is useful for small Python applications; I wouldn’t suggest using it for anything large-scale.
OpenTelemetry is supported by a huge ecosystem, as well as many visualization and storage backends, like Honeycomb, LightStep, and the open-source Jaegar.

For profiling in production:

Generic tools like Pyroscope and Prodfiler, which tend to focus more on web applications.
Sciagraph focuses specifically on profiling Python data pipelines and batch jobs; you can also use it during development.

Find performance and memory bottlenecks in your data processing code with the Sciagraph profiler

Slow-running jobs waste your time during development, impede your users, and increase your compute costs. Speed up your code and you’ll iterate faster, have happier users, and stick to your budget—but first you need to identify the cause of the problem.

Find performance bottlenecks and memory hogs in your data science Python jobs with the Sciagraph profiler. Profile in development and production, with multiprocessing support, on macOS and Linux, with built-in support for Jupyter notebooks.

Speed up your Python code and learn skills you can use at your job

Join over 8000 Python developers and data scientists learning practical tools and techniques every week, from Python performance to Docker packaging, by signing up for my newsletter.