NumPy 2 is coming: preventing breakage, updating your code

If you’re writing scientific or data science code with Python, there’s a good chance you’re using NumPy, directly or indirectly. Pandas, Scikit-Image, SciPy, Scikit-Learn, AstroPy… these and many other packages depend on NumPy.

NumPy 2 is a new major release, with a release candidate coming out February 1st 2024, and a final release a month or two later. Importantly, it’s backwards incompatible; not in a major way, but enough that some work might be required to upgrade. And that means you need to make sure your application doesn’t break when NumPy 2 comes out.

In this article we’ll cover:

  • The different ways the new release might break your application.
  • A quick reminder about the importance of pinning packages.
  • How to ensure your application doesn’t install NumPy 2 until you’re ready.
  • How to easily upgrade your code to support NumPy 2.

How NumPy 2 might break your application

There are three ways a new, incompatible dependency can break your application:

  1. Your code: If you’re using NumPy APIs directly in your application code, your code might break.
  2. Direct dependencies: The libraries you directly use in your code might be incompatible with NumPy 2.
  3. Indirect/transitive dependencies: The dependencies of the libraries you use in your code might be incompatible.

Fixing your code should be easy enough, but the libraries you depend on directly or indirectly are not under your control, and are often maintained by volunteers.

To give an example, as of Jan 9 2024, scikit-image:

  1. Is incompatible with NumPy 2.
  2. Declares in its packaging metadata that it works with numpy>=1.22.

When NumPy 2 comes out, all existing scikit-image releases will claim to work with it, because 2.0 satisfies numpy>=1.22, but they will actually be at least partially broken. With any luck the maintainers will have a compatible release out by then, but again, these are volunteers; they might not hit the deadline. And this is just one of many libraries.
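To make the metadata problem concrete, here is a rough sketch of how a requirement with only a lower bound treats a future 2.0.0 release. It is a deliberately simplified stand-in for real version matching: it only understands >= and < on plain dotted versions (no release candidates), and the function names are made up for illustration.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version like '1.22' into (1, 22)."""
    return tuple(int(part) for part in version.strip().split("."))


def compare(left: Tuple[int, ...], right: Tuple[int, ...]) -> int:
    """Return -1, 0 or 1, treating missing components as zero (2 == 2.0.0)."""
    width = max(len(left), len(right))
    left += (0,) * (width - len(left))
    right += (0,) * (width - len(right))
    return (left > right) - (left < right)


def satisfies(candidate: str, specifier: str) -> bool:
    """Check a version against comma-separated '>=' and '<' clauses."""
    have = release_tuple(candidate)
    for clause in specifier.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if compare(have, release_tuple(clause[2:])) < 0:
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if compare(have, release_tuple(clause[1:])) >= 0:
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# A lower bound alone happily accepts the new major version.
print(satisfies("2.0.0", ">=1.22"))
# Adding an upper bound is what keeps it out.
print(satisfies("2.0.0", ">=1.22,<2"))
```

The first call prints True and the second prints False: nothing in numpy>=1.22 says "stop at 2", so an installer is free to pick NumPy 2 for a library that was never tested against it.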

Recap: Why you need to pin dependencies

These sorts of problems are one of the many reasons you want to “pin” your application’s dependencies: make sure you only install a specific, fixed set of dependencies. Without reproducible dependencies, as soon as NumPy 2 comes out your application might break when it gets installed with new dependencies.

The really short version is that you have two sets of dependency configurations:

  1. A direct dependency list: A list of libraries you directly import in your code, loosely restricted. This is the list of dependencies you put in pyproject.toml or setup.py.
  2. A lock file: A list of all dependencies you rely on, direct or indirect (dependencies of dependencies), pinned to specific versions. This might be a requirements.txt, or some other file, depending on which tool you’re using.

At appropriate intervals you update the lock file based on the direct dependency list.

I’ve written multiple articles on the topic, in case you’re not familiar with the relevant tools.

Step 1. Ensuring NumPy 2 doesn’t get installed

Because your dependencies may take a little time to become compatible with NumPy 2, initially you probably want to stick to NumPy 1.x. That means ensuring NumPy 2 doesn’t get installed. So whether you’re using NumPy directly or indirectly, make sure you have a restrictive dependency of numpy<2 in your dependency list.

For example, if you’re using a pyproject.toml file to configure setuptools, it might look like this:

# ...

[project]
dependencies = [
    "pandas",
    # For now, make sure NumPy 2 is not installed
    "numpy<2",
]

If you’re using a setup.py, it might look like this:

from setuptools import setup

setup(
    # ...,
    install_requires=[
        "pandas",
        # For now, make sure NumPy 2 is not installed
        "numpy<2",
    ],
)
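If you want an extra safety net on top of the version restriction, one option is a small check in your test suite or startup code that fails loudly if NumPy 2 ends up installed anyway. Here is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes ordinary version strings such as 1.26.4.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Extract the leading major number from a version string."""
    return int(version.split(".")[0])


def installed_major(package: str) -> Optional[int]:
    """Return the installed major version, or None if not installed."""
    try:
        return major_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def check_numpy_is_1x() -> None:
    """Raise if the environment contains NumPy 2 or later."""
    major = installed_major("numpy")
    if major is not None and major >= 2:
        raise RuntimeError(
            f"NumPy {major}.x is installed, but this application "
            "has only been validated against NumPy 1.x"
        )
```

Calling check_numpy_is_1x() from a test means a broken pin shows up as one clear failure rather than as scattered errors deep inside a dependency.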

Step 2. Wait for dependencies to support NumPy 2

Eventually all the libraries you depend on will work with NumPy 2. Remember you’ll need to validate not just direct dependencies, but also indirect dependencies: go through the list of libraries in your lock file.
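To get a starting list of what to validate, you could ask the current environment which installed packages declare a dependency on NumPy. The following is a rough sketch using the standard library's importlib.metadata; the helper names are invented for this example, and the requirement parsing is a simplified regular expression rather than a full parser.

```python
import re
from importlib import metadata
from typing import Dict, List, Optional, Tuple

# Matches "name", optional "[extras]", then the specifier up to any ";" marker.
_REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*([^;]*)"
)


def split_requirement(requirement: str) -> Optional[Tuple[str, str]]:
    """Split 'numpy (>=1.22) ; marker' into ('numpy', '>=1.22')."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        return None
    name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
    specifier = match.group(2).strip().strip("()").strip()
    return name, specifier


def numpy_dependents(
    requirements_by_package: Dict[str, List[str]]
) -> Dict[str, str]:
    """Map each package that requires numpy to its declared specifier."""
    found = {}
    for package, requirements in requirements_by_package.items():
        for requirement in requirements:
            parsed = split_requirement(requirement)
            if parsed is not None and parsed[0] == "numpy":
                found[package] = parsed[1] or "(any version)"
    return found


def installed_requirements() -> Dict[str, List[str]]:
    """Collect declared requirements for every installed distribution."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[name] = list(dist.requires or [])
    return result


if __name__ == "__main__":
    dependents = numpy_dependents(installed_requirements())
    for package, specifier in sorted(dependents.items()):
        print(f"{package}: numpy {specifier}")
```

Treat the output as a to-do list, not a verdict: as the scikit-image example shows, declared metadata can claim compatibility that the code doesn't have, so each listed package still needs its release notes or issue tracker checked.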

Step 3. Upgrade your code and dependencies

First, remove the restriction on numpy<2 you added in step 1 from your dependencies, as it will no longer be necessary.

Second, if you use NumPy directly you will need to update some of your code, as documented in the NumPy 2 migration guide.

Upgrading your code with Ruff

The migration guide explains that upgrading your code to support NumPy 2 can be automated using the Ruff linter. If you don’t use Ruff already, you probably should: it’s a much faster alternative to Flake8, PyLint, and many other tools. To install it, pip install ruff or conda install conda-forge::ruff.

Now, let’s say we have the following module:

import numpy as np

arr1 = np.array([1 + 3j, 2], dtype=np.cfloat)
arr2 = np.array([2.0, 3.0], dtype=np.float_)

We can use ruff to find incompatibilities with NumPy 2:

$ ruff check --preview --select NPY201 example.py
example.py:3:36: NPY201 [*] `np.cfloat` will be removed in NumPy 2.0. Use `numpy.complex128` instead.
example.py:4:35: NPY201 [*] `np.float_` will be removed in NumPy 2.0. Use `numpy.float64` instead.
Found 2 errors.
[*] 2 fixable with the `--fix` option.

The --preview option is necessary because this is still an unstable feature in Ruff. By the time you’re ready to migrate, a few months after this article was written, this lint rule will hopefully be stable.

We can add the --fix flag to have Ruff fix the problems for us:

$ ruff check --preview --fix --select NPY201 example.py
Found 2 errors (2 fixed, 0 remaining).

And now example.py looks like this:

import numpy as np

arr1 = np.array([1 + 3j, 2], dtype=np.complex128)
arr2 = np.array([2.0, 3.0], dtype=np.float64)

Be prepared!

NumPy has stayed on a backwards-compatible 1.x series for a long time, but backwards-incompatible changes eventually happen to all libraries. This is why you should:

  1. Use a lock file to pin all dependencies, direct or indirect (“transitive”), with tools like pip-tools, pipenv, poetry or conda-lock.
  2. For libraries that use semantic versioning, i.e. where the major version is bumped for incompatible changes, consider adding a pre-emptive version limit like numpy<2.
  3. Make sure you update your dependencies regularly so you don’t fall behind.