Why Pylint is both useful and unusable, and how you can actually use it
This is a story about a tool that caught a production-impacting bug the day before we released the code. This is also the story of a tool no one uses, and for good reason.
By the time you’re done reading you’ll see why this tool is useful, why it’s unusable, and how you can actually use it with your Python project.
Pylint saves the day
If you’re coding in Haskell the compiler’s got your back. If you’re coding in Java the compiler will usually lend a helping hand. But if you’re coding in a dynamic language like Python you’re on your own: you don’t have a compiler to catch bugs for you.
The next best thing is a lint tool that uses heuristics to catch bugs in your code. One such tool is Pylint, and here’s how I started using it.
A few years ago, at a previous job, we realized our builds had been failing consistently for several days, and it wasn’t the usual intermittent failures caused by integration tests. After some investigating, my colleague Tom Prince discovered the problem. It was Python code that looked something like this:
```python
for volume in get_volumes():
    do_something(volume)

for volme in get_other_volumes():
    do_something_else(volume)
```
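Notice the typo in the second for loop: the loop variable is volme, but the body still uses volume. Because Python variables aren’t scoped to blocks, volume was still defined after the first loop ended, so the last value of volume from the first for loop was used for every iteration of the second loop.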
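To make the leak concrete, here is a small runnable sketch of the same shape. The get_volumes() and get_other_volumes() functions are hypothetical stand-ins returning made-up values; the first loop's body is reduced to a no-op so the only thing that matters is what volume is bound to afterwards.

```python
def get_volumes():
    # Hypothetical stand-in for the real function.
    return ["vol-a", "vol-b"]


def get_other_volumes():
    # Hypothetical stand-in for the real function.
    return ["other-1", "other-2", "other-3"]


seen = []

for volume in get_volumes():
    pass  # after this loop, `volume` is still bound to its last value

for volme in get_other_volumes():  # typo: the loop variable is `volme`
    seen.append(volume)  # silently reuses the first loop's leftover value

print(seen)  # ['vol-b', 'vol-b', 'vol-b']
```

Nothing here raises an exception, which is exactly why this kind of bug can slip through until a build or a user notices the wrong behavior.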
To see if we could prevent these problems in the future I tried Pylint, re-introduced the bug… and indeed it caught the problem. I then looked at the rest of the output to see what else it had found.
What it had found was a serious bug. It was in code I had written a few days earlier, and the bug completely broke an important feature we were going to ship to users the very next day. Here’s a heavily simplified minimal reproducer for the bug:
```python
list_of_printers = []
for i in [1, 2, 3]:
    def printer():
        print(i)
    list_of_printers.append(printer)

for func in list_of_printers:
    func()
```
The intended result of this reproducer is to print:
```
1
2
3
```
But what will actually get printed with this code is:
```
3
3
3
```
When you define a nested function in Python that refers to a variable in an enclosing scope, it binds not the value of the variable but the variable itself, and looks up its value only when the function is called. In this case all three functions run after the loop has finished, so the i inside printer() always gets the last value of the variable i in the for loop.
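The article doesn’t show how the bug was fixed, so here is a minimal sketch of two common fixes, not necessarily the one used at the time. A default argument is evaluated when the function is defined, and functools.partial stores the value it is given, so both capture the current value of i rather than the variable. The functions return values instead of printing them so the results are easy to compare; all the function names are made up for illustration. Pylint’s cell-var-from-loop check is aimed at the pattern in make_printers_buggy().

```python
from functools import partial


def make_printers_buggy():
    printers = []
    for i in [1, 2, 3]:
        def printer():
            return i  # looks up `i` when called, after the loop has ended
        printers.append(printer)
    return printers


def make_printers_default_arg():
    printers = []
    for i in [1, 2, 3]:
        def printer(i=i):  # the default is evaluated at definition time
            return i
        printers.append(printer)
    return printers


def show(value):
    return value


def make_printers_partial():
    # partial() stores the current value of `i` in each new callable.
    return [partial(show, i) for i in [1, 2, 3]]


print([f() for f in make_printers_buggy()])        # [3, 3, 3]
print([f() for f in make_printers_default_arg()])  # [1, 2, 3]
print([f() for f in make_printers_partial()])      # [1, 2, 3]
```

The default-argument trick is compact but exposes i as an optional parameter that callers could override; partial() avoids that at the cost of a separate helper function.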
And luckily Pylint caught that bug before it shipped; pretty great, right?
Why no one uses Pylint
Pylint is useful, but many projects don’t use it. For example, I went and checked just now, and neither Twisted nor Django nor Flask nor Sphinx seem to use Pylint. Why wouldn’t these large, sophisticated Python projects use a tool that would automatically catch bugs for them?
My guess: it’s the default output, which has a terrible signal to noise ratio.
Here’s what I mean: I ran pylint on a checkout of Twisted, and the result was 28,000 lines of output (at which point pylint crashed, but I’ll assume that’s fixed in newer releases).
Let me say that again: 28,000 errors or warnings.
To be fair, Twisted has a coding standard that doesn’t match the Python mainstream, but I’ve seen massive amounts of noise with other projects as well.
Pylint has a lot of useful errors and warnings… but also a whole lot of highly opinionated assumptions about how your code should look. And fundamentally it treats opinions and objective problems the same way. There’s a distinction between warnings and errors, but both useful and useless checks end up in the warning category. For example:
W:675, 0: Class has no __init__ method (no-init)
That’s not a useful warning. Now imagine a few thousand of those.
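When the output runs to thousands of lines, one way to see where the noise comes from is to tally messages per check. This is a minimal sketch, assuming each message line ends with the check name in parentheses, as in the warning above; count_by_symbol() is a hypothetical helper, and the sample lines are made up to mimic that format.

```python
import re
from collections import Counter

# Matches a trailing "(check-name)", as in
# "W:675, 0: Class has no __init__ method (no-init)".
SYMBOL_AT_END = re.compile(r"\(([a-z0-9-]+)\)\s*$")


def count_by_symbol(lines):
    """Count Pylint messages per check name, skipping lines without one."""
    counts = Counter()
    for line in lines:
        match = SYMBOL_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample output for illustration.
sample_output = [
    "************* Module example",
    "W:675, 0: Class has no __init__ method (no-init)",
    "W:702, 0: Class has no __init__ method (no-init)",
    "W: 12, 8: Cell variable i defined in loop (cell-var-from-loop)",
]

for symbol, count in count_by_symbol(sample_output).most_common():
    print(f"{count:6d}  {symbol}")
```

The checks at the top of the list are usually the first candidates to ignore; the rare ones near the bottom are often where the real bugs are.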
You can run Pylint in a mode where it only complains about obvious errors (pylint --errors-only, or -E for short), but then you’ll miss out on some of the important warnings that make Pylint so useful.
How you should use Pylint
So here we have a tool that is potentially useful, but unused in practice. What to do? Luckily Pylint has some functionality that can help: you can configure it with a whitelist of lint checks.
First, set up Pylint to do nothing:
- Make a list of all the checks you plausibly want to enable from the Pylint docs, and configure .pylintrc to whitelist them.
- Comment them all out.
At this point Pylint will do no checks. Next:
- Uncomment a small batch of checks, and run pylint.
- If the resulting errors are real problems, fix them. If the errors are utter garbage, delete those checks from the configuration.
At this point you have a small number of probably useful checks that are passing: you can run pylint and you will only be told about new problems.
In other words, you have a useful tool.
Repeat this process a few times, or once a week, enabling a new batch of checks each time until you run out of patience or you run out of Pylint checks to enable.
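The process above comments and uncomments lines in .pylintrc by hand. As a variation, here is a minimal sketch that generates the file from an ordered list of candidate checks: disable=all turns everything off, and enable= turns on only the first N candidates, so enabling the next batch means bumping one number. The candidate list and build_pylintrc() are hypothetical choices for illustration; the [MESSAGES CONTROL] section with disable and enable options is the standard way to whitelist checks in a .pylintrc.

```python
import configparser
import io

# Hypothetical candidate list, in the order you plan to turn checks on.
CANDIDATE_CHECKS = [
    "undefined-loop-variable",
    "cell-var-from-loop",
    "unused-variable",
    "unused-import",
    "dangerous-default-value",
]


def build_pylintrc(candidates, enabled_count):
    """Return .pylintrc text that disables every check, then enables
    only the first `enabled_count` candidates."""
    config = configparser.ConfigParser()
    config["MESSAGES CONTROL"] = {
        "disable": "all",
        "enable": ",".join(candidates[:enabled_count]),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(build_pylintrc(CANDIDATE_CHECKS, 2))
```

When a batch turns out to be noise, delete those names from the candidate list rather than lowering the count, so they stay off for good.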
The end result will be something like this configuration; that project is open source under the Apache 2.0 license, so you can probably use it as a starting point.
Go forth and lint
Here’s my challenge to you: go set up Pylint on a project today. It’ll take an hour to get some minimal checks going, and one day it will save you from a production-impacting bug.
And, even better, it can speed up your development cycle by shortening your feedback loop. It can speed up local development if you configure your editor to use it, and you can also use it to speed up the feedback loop of your test suite.
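One way to use Pylint ahead of a test suite is to run it first and skip the slower tests when it reports problems. Pylint’s exit status is a bit mask (1 fatal, 2 error, 4 warning, 8 refactor, 16 convention, 32 usage error), so a wrapper can report which kinds of messages were issued. This is a minimal sketch; decode_pylint_status() and lint_then_test() are hypothetical helpers, and the test command you pass in is up to you.

```python
import subprocess
import sys

# Bit flags in Pylint's documented exit status.
PYLINT_EXIT_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_pylint_status(status):
    """Return the message categories encoded in a Pylint exit status."""
    return [name for bit, name in PYLINT_EXIT_BITS.items() if status & bit]


def lint_then_test(package, test_command):
    """Hypothetical helper: run Pylint first, and only run the (slower)
    test suite if Pylint reported nothing."""
    lint = subprocess.run([sys.executable, "-m", "pylint", package])
    problems = decode_pylint_status(lint.returncode)
    if problems:
        print("pylint reported:", ", ".join(problems))
        return lint.returncode
    return subprocess.run(test_command).returncode


print(decode_pylint_status(0))  # []
print(decode_pylint_status(6))  # ['error', 'warning']
```

For example, lint_then_test("mypackage", [sys.executable, "-m", "pytest"]) would lint a hypothetical mypackage and then run pytest only if the lint pass was clean.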