Good old-fashioned code optimization never goes out of style

Sometimes, making your Python data processing software faster doesn’t require libraries like NumPy or Pandas, or specialized techniques like vectorization. In fact, if you’re doing string processing, libraries like Pandas won’t help.

Pushing calculation down to a faster implementation is just one way to speed up software. Another way to get faster results is to remove code that is redundant, repetitive, superfluous, needless, or otherwise does unnecessary work. The fastest software, after all, is software that doesn’t run at all.

In short, sometimes all you need is some good old-fashioned speed optimization.

An example

Let’s say you’re reading in some data that has a large number of street names, which can be written in a variety of inconsistent ways. “Brattle St” might also be written as “Brattle Street”, or “Brattle St.”, or “brattle st”. You want to standardize a given name, matching it a list of street names that have the specific formatting you want:

for address in get_addresses():
    address.street = standardize_name(address.street)
    # ... do so something with the address ...

Here’s our first pass at implementing standardize_name():

def standardize_name(street_name):
    standard_street_names = [
        "Brattle St",
        "Mount Auburn St",
        "Massachusetts Ave",
        "Cardinal Medeiros Ave",
        "Hampshire Street",
        "Beacon St",
        "Blake St",
        "Beech St",
        "Garden St",

    # Exact match:
    if street_name in standard_street_names:
        return street_name

    # Different case:
    lower_name = street_name.lower()
    for street in standard_street_names:
        if lower_name == street.lower():
            return street

    # "Ave." and "Avenue" are possible synonyms of "Ave":
    parts = street_name.split()
    if parts[-1].lower() in ("ave.", "avenue"):
        parts[-1] = "Ave"
        fixed_street_name = " ".join(parts)
        return standardize_name(fixed_street_name)

    # "St." and "Street" are possible synonyms of "St":
    if parts[-1].lower() in ("st.", "street"):
        parts[-1] = "St"
        fixed_street_name = " ".join(parts)
        return standardize_name(fixed_street_name)

    raise ValueError(f"Unknown street {street_name}")

This function goes through four phases, stopping as soon as a match happens:

  1. Match the name directly againt a known standardized name.
  2. Compare in a case-neutral way.
  3. Check for variations of “Ave”.
  4. Check for variations of “St”.

Can you see some inefficiencies you might fix?

Measuring our first attempt

If I run some microbenchmarks of different inputs on CPython 3.10, I get the following result:

'Garden St' took 105 nanoseconds
'garden st' took 476 nanoseconds
'Garden street' took 805 nanoseconds
'Brattle St' took 72 nanoseconds

Note: In practice this may well be fast enough. However, as we’ll see, if the number of different streets names we’re standardizing grows, this code will run proportionally slower. In any case, since this is just an example, for purposes of this article we’ll just assume we want it to be faster.

Notice there’s a large spread of elapsed time based on input: the slowest input ("Garden street") is 10× the elapsed time of the fastest input ("Brattle St"). Why the difference?

  • "Garden St" and "Brattle St" are both matched in phase 1 of the four phases listed above. However, to do the match, we’re iterating over the list of street names, and "Brattle St" is earlier in the list than "Garden St". So it seems we’re using an O(S) algorithm in phase 1, where S is the number of street names we know how to standardize.
  • "garden st" is caught by phase 2, which is O(S) again, except we’re doing more work: we’re lowercasing each standardized name.
  • "Garden street" has a different spelling for "St", so it goes through phase 1, phase 2, phase 3, and then phase 4, which recursively starts the algorithm from the start. So this takes almost twice as long as "garden st".

A first attempt

How we can make this faster?

In terms of data structures: we’re running two O(S) loops, one more expensive than the other, because we’re checking for membership in a Python list. If we switch to a dict or set lookup, we can switch to an O(1) algorithm:

def standardize_name_v2(street_name):
    # We switched to a set...
    standard_street_names = {
        "Brattle St",
        "Mount Auburn St",
        "Massachusetts Ave",
        "Cardinal Medeiros Ave",
        "Hampshire Street",
        "Beacon St",
        "Blake St",
        "Beech St",
        "Garden St",

    # ... so this will be faster:
    if street_name in standard_street_names:
        return street_name

For the case-insensitive check in phase 2, we can construct a dictionary mapping lower-cased names to standardized names:

    # Construct mapping from lowercase to standardized:
    lower_to_standardized = {
        s.lower(): s
        for s in standard_street_names
    if street_name.lower() in lower_to_standardized:
       return lower_to_standardized[street_name]

The problem is that constructing lower_to_standardized is O(S) too! So it’ll still get slower as we increase S, i.e. increase the number of streets we know how to standardize. Can we find a different and better approach?

Writing efficient code

Looking at our attempt to making a mapping from lowercase street names to standardized names, we can notice something useful: that dictionary is the same every time we call the function. And we expect to call that function many times with many street names. In fact, if we go back and look at the original code, we’re doing lots of computation in every call to standardize_name() that always gives the same results.

Doing repetitive work that gives the same result on every function call is a waste. Instead, we can precompute as much as possible, in advance, and the function can use that.

From a big picture perspective, we’re going to go from:

for street_name in get_street_names():
    street_name = expensive_standardize_name(street_name)


precalculated_data = precalculate_standardization_data()
for street_name in get_street_names():
    street_name = cheap_standardize_name(
        street_name, precalculated_data

A new implementation

Essentially we want to have a fast mapping from the original street name to the standardized street name. That suggests a dictionary.

In fact, we could create a dictionary mapping from every case combination of every name variation to the standardized name:

    "brattle st": "Brattle St",
    "Brattle st": "Brattle St",
    "bRattle st": "Brattle St",
    ... continue with every capitalization variant ...

But that will start using a lot of memory, especially if we added more streets. Instead, we can create a mapping from the lower-cased street name to its standardized name, and do that for every variant. That means just a handful of entries per street name we know about.

For example, here are the entries for Brattle St:

    "brattle st": "Brattle St",
    "brattle st.": "Brattle St",
    "brattle street": "Brattle St",

Given we have this dictionary, standardizing a name is as easy as doing a dictionary lookup with the lower-cased original street name.

Now, we could write out the whole dictionary by hand. But writing some code to create it will scale better as we add more street names, and reduces our worries about typos.

Based on the above, here’s our new implementation:

# Called just once, at import time:
def _make_street_names_mapping():
    standard_street_names = [
        "Brattle St",
        "Mount Auburn St",
        "Massachusetts Ave",
        "Cardinal Medeiros Ave",
        "Hampshire Street",
        "Beacon St",
        "Blake St",
        "Beech St",
        "Garden St",

    # Create mapping from lower case to standard case,
    # including for street name variants like
    # st/st./street.
    map_to_standard_name = {}
    for name in standard_street_names:
        lower_name = name.lower()
        map_to_standard_name[lower_name] = name

        parts = lower_name.split()
        street_type = parts[-1]
        if street_type == "ave":
            replacements = ["ave.", "avenue"]
        elif street_type == "st":
            replacements = ["st.", "street"]
            replacements = []

        for replacement in replacements:
            parts[-1] = replacement
            map_to_standard_name[" ".join(parts)] = name

    return map_to_standard_name

# Map a lowercase non-standard name to the standardized
# version:
_LOWER_TO_STANDARD_NAME = _make_street_names_mapping()

def standardize_name_v3(street_name):
        return _LOWER_TO_STANDARD_NAME[street_name.lower()]
    except KeyError:
        raise ValueError(f"Unknown street {street_name}")

Note: Whether or not any particular tool or technique will speed things up depends on where the bottlenecks are in your software.

Need to identify the performance and memory bottlenecks in your own Python data processing code? Try the Sciagraph profiler, with support for profiling both in development and production on macOS and Linux, and with built-in Jupyter support.

A performance timeline created by Sciagraph, showing both CPU and I/O as bottlenecks
A memory profile created by Sciagraph, showing a list comprehension is responsible for most memory usage

The performance results

Here’s how fast this new version is:

'Garden St' took 80 nanoseconds
'garden st' took 79 nanoseconds
'Garden street' took 78 nanoseconds
'Brattle St' took 80 nanoseconds

Comparing this to the old version:

Input Old elapsed (ns) New elapsed (ns) Speedup
“Garden St” 105 80 1.3×
“garden st” 476 79 6.0×
“Garden street” 805 78 10.0×
“Brattle St” 72 80 0.9×

For one input, "Brattle St", the new version is actually slightly slower, though this may just be noise. But instead of having a massive spread between the slowest and fastest inputs like we used to, we now get much more consistent results, not surprising given our new function is doing a fixed amount of work. Overall, the new results are consistently on the fast end of the old algorithm, a pretty good result.

Efficiency that never goes out of style

Speeding up software can happen at a variety of different levels. For example, we can use the model used in the book Writing Efficient Programs, by Jon Bentley. Here are the levels he suggests, and how they might apply to our example; I’ve modernized his terminology slightly:

  1. Software Architecture. Not really relevant given the scope of the problem was constrained for educational purposes, but one could imagine doing the standardization in some other place (perhaps inside a SQL query?), or maybe you can ensure your data is in the correct format from the start.
  2. Data structures and algorithms.
  3. Writing Efficient Code: This is the bulk of what the book covers, and he describes it as tweaks that compilers (or in our case, interpreters) aren’t smart enough to do on their own.
  4. Translation to Machine Code: In our case, we could switch to a faster Python interpreter, like PyPy. That’ll help, but the optimizations from steps 2 and 3 will help there too. We could also switch programming languages.
  5. Operating system: I’m running on Linux, and while Linux does improve over time, that’s probably not relevant here.
  6. Hardware: I’m running on a quite-fast computer for the year 2022. If you ran this on an slower CPU, it would indeed run more slowly; at the time of writing, a faster computer would give you a 20-30% speedup, no more. In many situations, hardware only gets you so far.

In our case we sped up the code at two of the levels described above:

  • Data structures and algorithms: We used a dictionary instead of a list to go from O(S) to O(1) operations.
  • Writing Efficient Code: We used some of the optimization rules described in the book to avoid redundantly repetitive calculations, specifically a combination of the “Store precomputed result” and “Code motion out of loops” rules.

This book was published in 1982, back when I was an adorable toddler. The information wasn’t novel back then, it was well-understood knowledge that the book summarized for new programmers. And much of it is still useful today: pick a good data structure or algorithm, don’t do repetitive work, don’t run expensive code inside your inner loop, and more.

There are many ways to make software faster. Some will become less relevant as software and hardware changes, but “avoid unnecessary work” will never grow old.