Transgressive Programming: the magic of breaking abstractions

In programming as in social life, there are boundaries we try not violate: we build software with abstractions, boundaries between the complexity beneath and the utility we want to achieve. You can transgress these boundaries; you can bypass abstractions, or rely on implementation details. The accepted wisdom, however, is that good programming sticks to abstraction boundaries.

But programming requires varying mindsets for varying situations and goals, and good habits in one situation might not suffice in another. Performance is a case in point: if you want to write fast code, sometimes you have to write code that violates abstraction boundaries.

For problems like performance and debugging, the usual good habits of programming—sticking to abstraction boundaries, not making assumptions about implementation—are often not sufficient. In these situations, you might need to deliberately violate abstraction boundaries, deliberately writing what feels like bad code.

For lack of a better term, I’m going to call this transgressive programming. The term was inspired by Siderea’s classic essay The Asshole Filter, which points out that being an asshole is about being transgressive, about violating social boundaries and rules.

I deliberately chose a term with negative connotations because there’s a reason this isn’t the default: breaking boundaries is not a thing to do lightly. But sometimes it is necessary. Some of what is called “systems programming” is trangressive programming, but so is writing a browser polyfill, or testing tricky code by monkeypatching.

Let’s see why “good programming” involves sticking to abstraction boundaries, and some of the situations where that’s not sufficient.

Why we stick to abstractions

In order to see why we usually try hard to stick to abstraction boundaries, let’s consider two kinds of transgression.

Transgression #1: Relying on implementation details

Let’s compare two numbers in Python:

>>> x = 12
>>> y = 12
>>> x == y
True

As expected.

There’s another way we can compare these numbers: using the is operator, which compares identity, i.e. whether two objects are pointing to same location in memory.

>>> x is y
True

Does that mean we can use is instead of ==? Let’s try another number.

>>> x = 123456
>>> y = 123456
>>> x == y
True
>>> x is y
False

As expected x == y, but x is y worked differently than it did for a small number.

Here’s why: whenever you create a number in Python, it allocates a whole new object. This is slow and has memory overhead, so the default implementation of Python, known as CPython, has an optimization: small integers are cached and reused. Every 12 in Python is the same object as every other 12, but that’s not true for large integers.

This is why you shouldn’t use is for logical equality: == compares whether the two objects have the same value, even if they’re in different places in memory. In contrast, is checks whether two objects point to the same address in memory, which for integers at least can vary based on their value.

Using is for equality of arbitrary objects violates an abstraction boundary by relying on implementation details: stable memory addresses may or may not be an implementation detail.

Transgression #2: Crossing boundaries

Another way we could compare two objects is by calling the Python C API that is the implementation underlying the == operator. Here’s my first attempt:

>>> x = 123
>>> y = 123
>>> z = 456
>>> import ctypes
>>> pyapi = ctypes.PyDLL(None)  # load symbols from the executable
>>> Py_EQ = 2  # copy/pasted from CPython headers
>>> pyapi.PyObject_RichCompareBool(id(x), id(y), Py_EQ)
1
>>> pyapi.PyObject_RichCompareBool(id(x), id(z), Py_EQ)
Segmentation fault (core dumped)

I could spend a bit more time figuring out how to use this API the correct way, but the failure mode is educational too: using C APIs is inherently more dangerous than Python, and pure Python x == y isn’t going to cause segfault the vast majority of the time.

Moreover, I am again relying on implementation-specific details; in other implementations of Python (PyPy, Jython) this wouldn’t work.

When transgression is necessary

As we’ve seen, abstraction boundaries are there for a reason, and in general we try not to transgress them:

  1. We avoid relying on implementation details.
  2. We try not to cross abstraction boundaries unnecessarily.

For programming that involves writing business logic, these are fine rules. But not all programming is about business logic.

Performance

In order to write fast software, you eventually need to understand not just the guarantees provided by an abstraction, but how it actually works.

Some examples:

Debugging and introspection

When things go wrong, sometimes understanding implementation details, or crossing abstraction boundaries, is the only way to figure out the problem. Some examples:

Testing

Setting arbitrary attributes on arbitrary Python objects—modules, classes, functions, even builtins—is possible, but not something you usually want to do in real code. Here, for example, I am overriding the builtin int() object:

>>> class myint(int):
...     def __repr__(self):
...         return "I am definitely not an int"
... 
>>> __builtins__.int = myint
>>> int(123) + 1 == 124
True
>>> int(123)
I am definitely not an int
>>> 

When you’re writing tests, however, this is might be easiest or sometimes only way to test some code. This is useful enough, in fact, that Python provides an API for this specific use case.

Into the unknown

Normal programming stays within abstraction boundaries, and for good reasons: it’s safer, it’s less brittle, it’s where you write the business logic that ultimately makes software useful for people who aren’t programmers. But sometimes that’s not enough, sometimes you need to break those boundaries, sometimes you need to switch to transgressive programming. But crossing boundaries can be disquieting, and sometimes intimidating.

The flip side is that learning how things work underneath is a superpower. Suddenly you can do things that would have otherwise been impossible—though of course, sometimes they weren’t possible for very good reasons. Breaking across abstraction boundaries feels like you’ve learned how to do magic.

A lovely representation of both of these feelings is the song “Into the Unknown” from Frozen 2. So give it a listen, and if you’d like to learn more about what lies underneath in Python, read some more of my articles on performance and memory usage.