Why you really need to upgrade pip

New software releases can bring bug fixes, new features, and faster performance. For example, NumPy 1.20 added type annotations and improved performance by using SIMD where possible. If you’re installing NumPy, you probably want the newest version.

Unfortunately, if you’re using an old version of pip, installing the latest version of a Python package might fail—or install in a slower, more complex way.

Why? The combination of glibc versioning, the CentOS end-of-life schedule, and how pip installs packages.

Let’s see what the problem is exactly, how to solve it, and finally—if you’re sufficiently interested—what causes it.

The problem with old pip

Let’s start out with an Ubuntu 18.04 Docker image. Released in April 2018, this version of Ubuntu has Python version 3.6, and pip version 9.

[itamarst@blake dev]$ docker run -it ubuntu:18.04
root@1a43d55f0524:/# apt-get update
...
root@1a43d55f0524:/# apt-get install --no-install-recommends python3 python3-pip
...
root@1a43d55f0524:/# pip3 --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

So far, so good.

Failure #1: Compiling from source

Next, let’s install the cryptography package, which is one of the most downloaded Python packages on PyPI, with millions of downloads a month (usually as an indirect dependency).

root@1a43d55f0524:/# pip3 install cryptography
Collecting cryptography
  Downloading https://files.pythonhosted.org/packages/fa/2d/2154d8cb773064570f48ec0b60258a4522490fcb115a6c7c9423482ca993/cryptography-3.4.6.tar.gz (546kB)
    100% |################################| 552kB 1.4MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6jesygn0/cryptography/

That didn’t go well.

That error means pip downloaded the source package (the .tar.gz file) and tried to build it, which failed because setuptools isn’t installed. The build will work if we install setuptools, a compiler, and the Python development toolchain, but it will be quite slow.

Of course, this isn’t just one package. The same problem occurs with PyArrow, for example:

root@1a43d55f0524:/# pip3 install pyarrow
Collecting pyarrow
  Downloading https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz (682kB)
    100% |################################| 686kB 1.1MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-heq6zwd7/pyarrow/

Why is pip trying to compile these packages from scratch? Why aren’t we getting a binary, pre-compiled package?

We’ll see the answer in a bit, after we consider the second failure mode.

Failure #2: Old versions

Next, let’s install Fil, my memory profiler for Python.

root@1a43d55f0524:/# pip3 install filprofiler
Collecting filprofiler
  Downloading https://files.pythonhosted.org/packages/e3/a2/843e7b5f1aba27effb0146c7e564e2592bfc9344a8c8ef0d55245bd47508/filprofiler-0.7.2-cp36-cp36m-manylinux1_x86_64.whl (565kB)
    100% |################################| 573kB 1.8MB/s 
Installing collected packages: filprofiler
Successfully installed filprofiler-0.7.2

That worked! Except if you visit the PyPI page for Fil you’ll see that 0.7.2 is quite old. As I’m writing this, the latest version of Fil is 0.14.1.

Why did an old version get installed?

pip and manylinux wheels

Many packages, from NumPy to cryptography, require compiling code written in C, C++, Cython, Rust, and so on. To save you from compiling everything from scratch, maintainers can upload precompiled versions of the code, called “wheels”, to the Python Package Index. If pip sees a wheel that will work for your specific version of Python and your operating system, it will download that instead of the source code.

For Linux, there are multiple wheel variants: manylinux1, manylinux2010, and manylinux2014. You can see which variant is being used in the filename of the wheel you’re downloading.
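If you’d rather check the variant programmatically, here’s a minimal sketch that pulls the platform tag out of a wheel filename. It assumes the standard wheel naming scheme, and it only handles the three variant names discussed here; the helper names parse_wheel_filename and manylinux_variant are made up for illustration.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its components.

    Wheel filenames have the form
    {name}-{version}[-{build}]-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    # The last three fields are always the Python, ABI, and platform tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def manylinux_variant(filename: str) -> str:
    """Return e.g. 'manylinux1' for a manylinux wheel, or '' otherwise."""
    platform_tag = parse_wheel_filename(filename).platform_tag
    if not platform_tag.startswith("manylinux"):
        return ""
    # 'manylinux2014_x86_64' -> 'manylinux2014'
    return platform_tag.split("_", 1)[0]


if __name__ == "__main__":
    print(manylinux_variant("filprofiler-0.7.2-cp36-cp36m-manylinux1_x86_64.whl"))
```

Running it on the two wheel filenames that appear in this article should report manylinux1 for the old Fil wheel and manylinux2014 for the cryptography wheel.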

Here’s the problem: old versions of pip don’t support manylinux2010 (which requires pip 19.0 or later), and certainly not manylinux2014 (which requires pip 19.3 or later). The pip 9.0.1 in Ubuntu 18.04 is too old, so it only knows about manylinux1. This explains the two problems we saw:

  1. If you check the available files listings for PyArrow 3.0.0 on PyPI, you’ll see that there are only manylinux2010 and manylinux2014 wheels. pip therefore decided to fall back to the source code package, which needs compilation.
  2. If you check the PyPI files for Fil, you’ll see there are manylinux2010 wheels, and no source packages at all; because building from source is a little tricky, I only distribute compiled packages. That means pip keeps going back to older versions of the package until it finds one that has a manylinux1 wheel available.
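To make that fallback logic concrete, here’s a toy model of the choice pip makes. It is a deliberate simplification, not pip’s actual resolver: the release listings are illustrative stand-ins based on the two cases above rather than real PyPI data, and pick is a hypothetical helper.

```python
# Variants an old pip (such as 9.0.1) accepts, versus a recent pip.
OLD_PIP = {"manylinux1"}
NEW_PIP = {"manylinux1", "manylinux2010", "manylinux2014"}

# Illustrative release listings, newest first (an assumption, not real
# PyPI data). "sdist" marks a source package that needs compiling.
PYARROW = [("3.0.0", ["manylinux2010", "manylinux2014", "sdist"])]
FIL = [
    ("0.14.1", ["manylinux2010"]),  # wheels only, no source package
    ("0.7.2", ["manylinux1"]),
]


def pick(releases, supported):
    """Return (version, kind) for the newest release this pip can use.

    kind is "wheel" if a compatible wheel exists, "sdist" if pip has to
    fall back to compiling from source. Returns None if nothing fits.
    """
    for version, files in releases:
        if any(f in supported for f in files):
            return version, "wheel"
        if "sdist" in files:
            return version, "sdist"
        # Nothing usable for this version: try the next older one.
    return None


if __name__ == "__main__":
    print(pick(PYARROW, OLD_PIP))  # source build of the latest version
    print(pick(FIL, OLD_PIP))      # wheel, but for an old version
```

With the old pip, the model picks a source build for PyArrow and an old wheel for Fil, matching the two failures; with the new pip, both resolve to a wheel for the latest version.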

The solution: upgrading pip

In order to get the latest and greatest packages, without compilation, you need to upgrade to a recent version of pip. How you do it depends on your environment.

In general, you can do pip install --upgrade pip and call it a day.

However, in some environments that can cause problems. For example, if you look above at how we set up Python in Ubuntu 18.04, we installed pip from a system package.

The problem is that overwriting random files from a system package is a bad idea. Unless you’re running inside an environment you’re happy rebuilding from scratch when necessary—like a Docker image—you should never run pip install as root or with sudo to modify your system packages.

So instead, on Ubuntu 18.04 you might get pip via download:

$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
$ python3 get-pip.py

Or you can create a virtualenv, and then upgrade its pip by doing pip install --upgrade pip:

root@1a43d55f0524:/# python3 -m venv myvenv
root@1a43d55f0524:/# . myvenv/bin/activate
(myvenv) root@1a43d55f0524:/# pip --version
pip 9.0.1 from /myvenv/lib/python3.6/site-packages (python 3.6)
(myvenv) root@1a43d55f0524:/# pip install --upgrade pip
Collecting pip
  Using cached https://files.pythonhosted.org/packages/fe/ef/60d7ba03b5c442309ef42e7d69959f73aacccd0d86008362a681c4698e83/pip-21.0.1-py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-21.0.1
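If you need to script the same steps, here’s a rough sketch using the standard library’s venv module. The helper names are hypothetical, and it assumes the venv module and its pip bootstrapping are available for your interpreter. It runs pip as python -m pip, so the upgrade applies to the virtualenv’s interpreter rather than whichever pip happens to be first on your PATH.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtualenv (Windows vs. POSIX layout)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def upgrade_pip_command(python: Path) -> list:
    """Build the command that upgrades pip for one specific interpreter."""
    return [str(python), "-m", "pip", "install", "--upgrade", "pip"]


def create_venv_with_new_pip(env_dir: Path) -> None:
    """Create a virtualenv, then upgrade the pip inside it."""
    venv.create(env_dir, with_pip=True)  # like `python3 -m venv env_dir`
    subprocess.run(upgrade_pip_command(venv_python(env_dir)), check=True)


if __name__ == "__main__":
    # Needs network access to download the new pip.
    create_venv_with_new_pip(Path("myvenv"))
```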

Now that we have a newer pip, we can easily install the latest versions of cryptography and filprofiler:

(myvenv) root@1a43d55f0524:/# pip install cryptography filprofiler
Collecting cryptography
  Downloading cryptography-3.4.6-cp36-abi3-manylinux2014_x86_64.whl (3.2 MB)
     |################################| 3.2 MB 4.5 MB/s 
...
Installing collected packages: pycparser, threadpoolctl, cffi, filprofiler, cryptography
Successfully installed cffi-1.14.5 cryptography-3.4.6 filprofiler-0.14.1 pycparser-2.20 threadpoolctl-2.1.0

Notice we downloaded a manylinux2014 package for cryptography.

Why do these manylinux variations exist?

Compiled Python extensions on Linux link against the standard C library; wheels in particular link against the GNU C Library, also known as glibc. You can see which libraries an executable or shared library links against by using the ldd utility:

root@1a43d55f0524:/# cd myvenv/lib/python3.6/site-packages
root@1a43d55f0524:/# ldd cryptography/hazmat/bindings/_openssl.abi3.so 
        linux-vdso.so.1 (0x00007ffdbea7b000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fba7b1bf000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fba7adce000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fba7b7b0000)

Notice that the compiled Python extension depends on, among other libraries, /lib/x86_64-linux-gnu/libc.so.6, which is to say glibc.

If you compile your code against a newer version of glibc, it might require new APIs or symbols that aren’t available in older versions. And that means your code won’t run against older versions of glibc, i.e. on older Linux distributions.

There are a number of different solutions to this problem. Conda solves it by compiling all its packages against an older version of the glibc headers, which it includes itself; basically it has a custom compilation setup designed to work on a broad range of Linux releases.

PyPI binary wheels solve this by compiling on old versions of Linux, which have correspondingly old versions of glibc. Because each wheel is compiled against an old version of glibc, it will work with any newer version as well.

  • manylinux1 packages are built on CentOS 5.
  • manylinux2010 packages are built on CentOS 6.
  • manylinux2014 packages are built on CentOS 7.
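As a rough illustration of what that means for your own machine, the sketch below compares a glibc version against the baseline each variant targets: glibc 2.5 for manylinux1, 2.12 for manylinux2010, and 2.17 for manylinux2014, as listed in the corresponding PEPs. The function names are hypothetical, and this is not how pip itself performs the check.

```python
import platform

# Minimum glibc version each variant targets (per PEP 513, 571, and 599).
MANYLINUX_GLIBC = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def parse_glibc_version(text):
    """Turn a string like '2.27' into a comparable tuple (2, 27)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def compatible_variants(glibc_version):
    """List the variants whose glibc baseline this system satisfies."""
    return [
        name
        for name, minimum in MANYLINUX_GLIBC.items()
        if glibc_version >= minimum
    ]


if __name__ == "__main__":
    # libc_ver() reports ('glibc', '<version>') on glibc-based Linux and
    # empty strings on platforms where it can't tell.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(compatible_variants(parse_glibc_version(version)))
    else:
        print("Not a glibc-based system, so manylinux wheels don't apply.")
```

Any reasonably modern distribution satisfies all three baselines, which is the point: the limiting factor in this article is pip, not glibc.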

The motivation for each new variant is the end-of-life of the CentOS version the previous variant was built on. And each new variant requires a corresponding new release of pip. You can learn more in PEP 513 (manylinux1), PEP 571 (manylinux2010), and PEP 599 (manylinux2014).
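Here’s a small sketch of that pip-to-variant relationship, assuming the commonly documented minimum pip versions: 8.1 for manylinux1, 19.0 for manylinux2010, and 19.3 for manylinux2014. The helper names are made up for illustration; you’d feed it the version number printed by pip --version.

```python
import re

# Minimum pip version that understands each variant.
MIN_PIP = {
    "manylinux1": (8, 1),
    "manylinux2010": (19, 0),
    "manylinux2014": (19, 3),
}


def pip_version_tuple(version):
    """Extract (major, minor) from a version string such as '21.0.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized pip version: {version}")
    return int(match.group(1)), int(match.group(2))


def supported_variants(version):
    """List the manylinux variants this pip version can install."""
    current = pip_version_tuple(version)
    return [name for name, minimum in MIN_PIP.items() if current >= minimum]


if __name__ == "__main__":
    for v in ("9.0.1", "21.0.1"):
        print(v, supported_variants(v))
```

For the two pip versions seen in this article, 9.0.1 only gets manylinux1, while 21.0.1 gets all three.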

Upgrade your pip!

Whether you’re setting up a development environment or writing your Dockerfile, make sure you upgrade pip. Otherwise you’ll have a much harder time installing packages.