Goodbye to Flake8 and PyLint: faster linting with Ruff

Flake8 and PyLint are commonly used, and very useful, linting tools: they can help you find potential bugs and other problems with your code, aka “lints”. But they can also be slow. And even if they’re fast on your computer, they may still be slow in your CI system (GitHub Actions, GitLab, or whatever else you use).

Happily, there’s a new linter available, Ruff, which is much faster. And it supports many of the same lints, including those from many of Flake8’s plugins.

In this article we’ll get a sense of why Ruff’s extra linting speed is so useful compared to the alternatives. Specifically, we’ll cover:

  • A useful lint that isn’t in standard Flake8.
  • A speed comparison to Ruff’s implementation (preview: it’s much faster!).
  • Why seemingly fast linting on your computer might still be slow in CI.
  • The worst case for Ruff (preview: it’s ridiculously fast).
  • Why Ruff may not work for you.
  • A bonus speed-up for linting by changing tox’s configuration.

Why Flake8 isn’t enough: an example

Flake8 on its own is a great linter for catching common, basic bugs. For example:

def f(myvar):
    return myva * 2
$ flake8
F821 undefined name 'myva'

It doesn’t catch all problems, however. Consider the following program:

def make_three_adders():
    result = []
    for i in [10, 20, 30]:
        def add(x):
            return x + i
        result.append(add)
    return result

for adder in make_three_adders():
    print(adder(7))

What do you think this program will produce? Naively, we’d expect it to print 17, 27, and 37. In fact:

$ python
37
37
37

The functions we created don’t capture the current value of the variable in the for loop, they end up using the last value.

This is a great opportunity for a linter, but by default Flake8 won’t catch this:

$ flake8

There are three alternatives you can use at this point. First, Flake8 has a plugin called Bugbear that adds many additional checks, including one that will identify this bug:

$ pip install flake8-bugbear
$ flake8
B023 Function definition does not bind loop variable 'i'.

Second, PyLint can also identify the problem:

$ pylint
... W0640: Cell variable i defined in loop (cell-var-from-loop)

Finally, you can use the Ruff linter.

Given I’ve seen this cause real-world bugs, I think it’s important to check for it with one of these linters, at least.
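Once a linter flags this, the fix is usually small. Here is a minimal sketch of the buggy pattern next to one common remedy, binding the loop variable as a default argument. The function names make_late_bound_adders and make_early_bound_adders are made up for this illustration.

```python
def make_late_bound_adders():
    """Buggy version: every closure looks up `i` when called, not when defined."""
    result = []
    for i in [10, 20, 30]:
        def add(x):
            return x + i  # `i` is read at call time; by then the loop is done
        result.append(add)
    return result


def make_early_bound_adders():
    """One common fix: freeze the current value of `i` as a default argument."""
    result = []
    for i in [10, 20, 30]:
        def add(x, i=i):  # the default is evaluated once, at definition time
            return x + i
        result.append(add)
    return result


late = [adder(7) for adder in make_late_bound_adders()]
early = [adder(7) for adder in make_early_bound_adders()]
print(late)   # [37, 37, 37]
print(early)  # [17, 27, 37]
```

Using functools.partial to bind the value is another option; which one reads better depends on the code in question.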

Benchmarking three alternatives for linting closure variable capture

To get a sense of how these three linters compare in terms of speed, I measured the time to run this particular lint on a real project with 120,000 lines of Python code. I used Python 3.11 on my i7-12700K machine, with hyperthreading and Turbo Boost disabled, and I ran only this one lint:

$ time pylint --disable=all --enable=cell-var-from-loop src
$ time flake8 --select B023 src
$ time ruff --select B023 src

Here’s a summary of wallclock time, and CPU usage:

Tool                             Wallclock seconds   CPU seconds
Flake8 6.0.0 w/bugbear 23.3.23   1.7                 13.1
PyLint 2.17.3                    14.0                14.0
Ruff 0.0.263                     0.2                 1.0

Notice that both Ruff and Flake8 have a CPU time that’s higher than wallclock elapsed time: that means they’re taking advantage of the fact my computer has multiple cores.
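If you want to collect both numbers yourself, the following is a rough sketch of roughly what the shell’s time command reports, using only the Python standard library. The helper name measure is made up for this example, and the child CPU counters from os.times() are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def measure(command):
    """Run `command` and return (wallclock_seconds, cpu_seconds)."""
    before = os.times()
    start = time.perf_counter()
    # check=False: linters exit non-zero when they find problems.
    subprocess.run(command, check=False)
    wall = time.perf_counter() - start
    after = os.times()
    # children_user/children_system accumulate CPU time of finished children,
    # summed across however many cores they used.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


# Stand-in workload; swap in a real linter command such as
# ["ruff", "--select", "B023", "src"] to reproduce the comparison.
wall, cpu = measure([sys.executable, "-c", "sum(range(1_000_000))"])
print(f"wallclock: {wall:.2f}s, CPU: {cpu:.2f}s")
```

When the CPU figure comes out larger than the wallclock figure, the command kept more than one core busy.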

Now, it’s clear that Ruff is much faster in terms of CPU used, but elapsed time isn’t that much different between Flake8 and Ruff, given Flake8 can use all my computer’s cores. So is it really worth switching?

Why CI is so much slower than my computer (and probably yours as well)

Unfortunately, when running this lint in CI, the Flake8+Bugbear version will take much longer than on my computer when measured in elapsed wallclock time. This is for two reasons:

  • Faster cores: Most of the cores on my computer are “performance” cores that are probably twice as fast as the cores in the cloud virtual machines most CI systems use.
  • More cores: With hyperthreading disabled, my computer has 12 cores available, assuming I’m not using them for something else. Meanwhile, the cloud VM used in CI often only has 2 vCPUs, with “vCPU” being a polite way of saying hyperthreading. Which is to say, the CPU pretends it’s 2 cores but really it’s one core being shared. You’ll get a little parallelism if you’re lucky, but far less than you would get from two normal cores.

(For more on what hyperthreading means, see here.)

Essentially, we have to assume that the CI runner effectively has no parallelism, and that its CPU is pretty slow. If the table above shows the linter taking 14 CPU seconds on my computer, as a first approximation we can guess it will take 30 wallclock seconds in CI.
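That back-of-the-envelope estimate can be written down as a tiny sketch. The function name and both default factors are assumptions for illustration: a CI core that is about half as fast as a local one, and effectively one core’s worth of parallelism.

```python
def estimate_ci_wallclock(cpu_seconds, core_slowdown=2.0, effective_cores=1.0):
    """Rough guess at CI wallclock time from locally measured CPU seconds.

    core_slowdown: how many times slower a CI core is than a local one (assumed).
    effective_cores: how much real parallelism the CI runner offers (assumed).
    """
    return cpu_seconds * core_slowdown / effective_cores


# CPU seconds taken from the benchmark table above.
for tool, cpu in [("Flake8 + Bugbear", 13.1), ("PyLint", 14.0), ("Ruff", 1.0)]:
    print(f"{tool}: ~{estimate_ci_wallclock(cpu):.0f} wallclock seconds in CI")
```

With these assumptions the 14 CPU seconds become about 28 wallclock seconds, which is where the “roughly 30 seconds” guess comes from. Real runners will vary, so treat the factors as knobs to adjust, not measurements.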

Your personal computer may or may not be as fast as mine, but even slower personal computers are likely to be noticeably faster than a 2 vCPU cloud VM.

Switching to Ruff: a real example

Given slow CI runners, Ruff’s huge reduction in CPU time becomes more meaningful. Recently I switched a project to Ruff; previously it was using Flake8 plus PyLint with the cell-var-from-loop check. Essentially it ran:

flake8 src
pylint --disable=all --enable=cell-var-from-loop src

The linters run as a CircleCI job with a “Medium” runner, which has 2 vCPUs. Here’s the timing:

Configuration          Time to run
Original               100 sec
Ruff                   43 sec
Ruff + tox.ini tweak   19 sec

Those 19 seconds are basically all overhead at this point: every time the job runs it needs to check out the code, set up a virtualenv or two, install tox and ruff, and so on. Running ruff is so fast it doesn’t really affect the elapsed time.

A bonus speed-up: changing the tox configuration for linting

So what’s that tox.ini tweak that shaved off another 24 seconds (from 43 down to 19)? By default tox will install the package when it creates a new environment, in this case for linting. This can result in downloading dependencies, and it just takes a while to write out lots of small files.

But for linting, we don’t need to install the code at all. So we can use tox’s skip_install option to skip installing the package code, which saves a little bit of time.
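As a rough sketch, a dedicated lint environment in tox.ini might look like the following. The environment name lint and the src path are assumptions for illustration; skip_install is the tox option mentioned above. The command mirrors the invocation style used earlier in this article; recent Ruff releases spell it ruff check src instead.

```ini
# Hypothetical lint environment; adjust the name and paths to your project.
[testenv:lint]
# Don't build and install the project itself: a linter only reads source files.
skip_install = true
# Only the linter is needed inside this environment.
deps = ruff
commands = ruff src
```

With this in place, tox -e lint creates an environment containing just the linter, so none of the project’s own dependencies need to be downloaded or written to disk.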

How slow will Ruff get?

Notice that beyond the relatively fast base Flake8 setup, we only had one extra lint configured. In an ideal world we would like to add more lints; Ruff has many lints disabled by default (whether or not they’re useful depends on your situation).

So how much time will Ruff take if we enable all of its lints? In practice, many of these lints are not necessarily things everyone would want to enable, but it still gives us an upper bound on speed:

$ time ruff --select=ALL src/
Found 55840 errors.
[*] 17570 potentially fixable with the --fix option.

real    0m0.566s
user    0m1.418s
sys     0m0.172s

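Even if we enabled every single lint provided by Ruff, it would still use roughly an eighth of the CPU time Flake8 needed for the single Bugbear lint benchmarked above (about 1.6 versus 13.1 CPU seconds), and even less compared to PyLint.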

Why Ruff might not work for you (yet)

Ruff implements a very long list of linting rules, directly copying rules from Flake8, Flake8’s plugins, PyLint, and other tools. The PyLint compatibility is still at an earlier stage, with many lints missing; on the other hand, using PyLint is difficult and slow enough that most lints are probably disabled by default anyway. That being said, it’s progressing very rapidly.

For a new project, I’d just immediately start with Ruff; for existing projects, I would strongly recommend trying it as soon as you start getting annoyed about how long linting is taking in CI (or even worse, on your computer).