Why you should upgrade pip, and how to do it
New software releases can bring bug fixes, new features, and faster performance. For example, NumPy 1.20 added type annotations, and improved performance by using SIMD when possible. If you’re installing NumPy, you might want to install the newest version.
Unfortunately, if you’re using an old version of
pip, installing the latest version of a Python package might fail—or install in a slower, more complex way.
The cause is the combination of glibc versioning, the Red Hat/CentOS end-of-life schedule, and how pip installs packages.
Let’s see some symptoms so you can identify the problem, how to solve it by upgrading
pip, and finally—if you’re sufficiently interested—what causes it.
The problem with old pip
Let’s start out with an Ubuntu 18.04 Docker image.
Released in April 2018, this version of Ubuntu has Python version 3.6, and
pip version 9.
[itamarst@blake dev]$ docker run -it ubuntu:18.04
root@1a43d55f0524:/# apt-get update
...
root@1a43d55f0524:/# apt-get install --no-install-recommends python3 python3-pip
...
root@1a43d55f0524:/# pip3 --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
So far, so good.
Failure #1: Compiling from source
Next, let’s install the
cryptography package, which is one of the most downloaded Python packages on PyPI, with millions of downloads a month (usually as an indirect dependency).
root@1a43d55f0524:/# pip3 install cryptography
Collecting cryptography
  Downloading https://files.pythonhosted.org/packages/fa/2d/2154d8cb773064570f48ec0b60258a4522490fcb115a6c7c9423482ca993/cryptography-3.4.6.tar.gz (546kB)
    100% |################################| 552kB 1.4MB/s
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6jesygn0/cryptography/
That didn’t go well.
That error means pip wants us to compile the package from source; this will work if we install setuptools, a compiler, and the Python development toolchain, but that’ll be quite slow.
Of course, this isn’t just one package. The same problem occurs with PyArrow, for example:
root@1a43d55f0524:/# pip3 install pyarrow
Collecting pyarrow
  Downloading https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz (682kB)
    100% |################################| 686kB 1.1MB/s
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-heq6zwd7/pyarrow/
Why is pip trying to compile these packages from scratch?
Why aren’t we getting a binary, pre-compiled package?
We’ll see the answer in a bit, after we consider the second failure mode.
Failure #2: Old versions
Next, let’s install Fil, my memory profiler for Python.
root@1a43d55f0524:/# pip3 install filprofiler
Collecting filprofiler
  Downloading https://files.pythonhosted.org/packages/e3/a2/843e7b5f1aba27effb0146c7e564e2592bfc9344a8c8ef0d55245bd47508/filprofiler-0.7.2-cp36-cp36m-manylinux1_x86_64.whl (565kB)
    100% |################################| 573kB 1.8MB/s
Installing collected packages: filprofiler
Successfully installed filprofiler-0.7.2
That worked! Except if you visit the PyPI page for Fil you’ll see that 0.7.2 is quite old. As I’m writing this, the latest version of Fil is 0.14.1.
Why did an old version get installed?
Many packages—from NumPy to Cryptography—require compiling some code in C/C++/Cython/Rust/etc. to work.
In order to save you the need to compile everything from scratch, maintainers can upload compiled versions of the code, called “wheels”, to the Python Package Index.
If pip sees a wheel that will work for your specific version of Python and operating system, it will download it instead of the source code.
For Linux, there are multiple wheel variants: manylinux1, manylinux2010, and manylinux2014.
You can see which variant is being used in the filename of the wheel you’re downloading.
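As a rough sketch of how to read the variant out of a filename: a wheel name is a series of dash-separated fields, and the platform tag is the last one before the `.whl` extension. The helper name `platform_tag` is my own, not something pip provides, and the two filenames are the ones that appear in the terminal output in this article.

```python
def platform_tag(wheel_filename: str) -> str:
    """Return the platform tag, i.e. the last dash-separated field of a wheel name."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # Wheel names look like: name-version[-build]-python-abi-platform
    return stem.rsplit("-", 1)[-1]


for name in (
    "filprofiler-0.7.2-cp36-cp36m-manylinux1_x86_64.whl",
    "cryptography-3.4.6-cp36-abi3-manylinux2014_x86_64.whl",
):
    print(name, "->", platform_tag(name))
```

The first filename yields `manylinux1_x86_64` and the second `manylinux2014_x86_64`. Some wheels list several platform tags joined by dots in that last field; this sketch returns the whole field without splitting it.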
Here’s the problem: old versions of pip don’t support manylinux2010, and certainly not manylinux2014. The pip in Ubuntu 18.04 is too old, so it only knows about manylinux1.
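To make that concrete, here is a small sketch that maps a `pip --version` line to the manylinux variants that pip understands. The thresholds are an assumption on my part: as far as I recall from pip’s release notes, manylinux1 support arrived in pip 8.1, manylinux2010 in 19.0, and manylinux2014 in 19.3, so check the changelog before relying on them. The function names are hypothetical helpers, not part of pip.

```python
import re

# Assumed minimum pip versions per variant (recalled from pip's changelog;
# verify against the release notes before relying on them).
MIN_PIP_FOR_VARIANT = {
    "manylinux1": (8, 1),
    "manylinux2010": (19, 0),
    "manylinux2014": (19, 3),
}


def parse_pip_version(version_output: str) -> tuple:
    """Extract the numeric version from the output of `pip --version`."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError(f"unrecognized pip version output: {version_output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supported_variants(pip_version: tuple) -> list:
    """List the manylinux variants a pip of this version should recognize."""
    return [
        variant
        for variant, minimum in MIN_PIP_FOR_VARIANT.items()
        if pip_version >= minimum
    ]


old = parse_pip_version("pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)")
print(old, supported_variants(old))
print(supported_variants(parse_pip_version("pip 21.0.1")))
```

Comparing versions as tuples of integers rather than strings matters here: as strings, "9.0.1" would sort after "19.0".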
This explains the two problems we saw:
- If you check the available file listings for PyArrow 3.0.0 on PyPI, you’ll see that the only Linux wheels are for newer manylinux variants, with none for manylinux1. pip therefore decided to fall back to the source code package, which needs compilation.
- If you check the PyPI files for Fil, you’ll see there are manylinux2010 wheels, and no source packages at all; because building from source is a little tricky, I only distribute compiled packages. That means pip keeps going back to older versions of the package until it finds one that has a manylinux1 wheel.
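Both behaviors fall out of one simple rule, which can be modeled in a few lines. This is a toy model, not pip’s actual resolver: it walks versions from newest to oldest and takes the first one that has any usable file, preferring a wheel over a source package. The file listings are simplified and partly hypothetical (only the `filprofiler-0.7.2` wheel and the `pyarrow-3.0.0.tar.gz` names come from the output above), and `pick` and `is_usable` are names I made up.

```python
from typing import Dict, List, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '0.14.1' into (0, 14, 1) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def is_usable(filename: str, known_variants: List[str]) -> bool:
    """A source package is always a candidate; a wheel needs a known platform tag."""
    if filename.endswith(".tar.gz"):
        return True
    platform = filename[: -len(".whl")].rsplit("-", 1)[-1]
    return any(platform.startswith(variant + "_") for variant in known_variants)


def pick(releases: Dict[str, List[str]], known_variants: List[str]) -> Optional[Tuple[str, str]]:
    """Return (version, filename) for the newest release with a usable file."""
    for version in sorted(releases, key=version_key, reverse=True):
        usable = [f for f in releases[version] if is_usable(f, known_variants)]
        if usable:
            usable.sort(key=lambda f: not f.endswith(".whl"))  # wheels first
            return version, usable[0]
    return None


# Simplified listings; the manylinux2010/2014 filenames are hypothetical.
FIL = {
    "0.7.2": ["filprofiler-0.7.2-cp36-cp36m-manylinux1_x86_64.whl"],
    "0.14.1": ["filprofiler-0.14.1-cp36-cp36m-manylinux2010_x86_64.whl"],
}
PYARROW = {
    "3.0.0": ["pyarrow-3.0.0-cp36-cp36m-manylinux2014_x86_64.whl", "pyarrow-3.0.0.tar.gz"],
}

OLD_PIP = ["manylinux1"]
NEW_PIP = ["manylinux1", "manylinux2010", "manylinux2014"]

print(pick(FIL, OLD_PIP))      # falls back to the old 0.7.2 wheel
print(pick(PYARROW, OLD_PIP))  # falls back to the source package
print(pick(FIL, NEW_PIP))      # gets the 0.14.1 wheel
```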
The solution: upgrading pip
In order to get the latest and greatest packages, without compilation, you need to upgrade to a recent version of pip.
How you do it depends on your environment.
In general, you can do
pip install --upgrade pip and call it a day.
However, in some environments that can have issues.
For example, if you look above at how we set up Python in Ubuntu 18.04, we installed
pip from a system package.
The problem is that overwriting random files from a system package is a bad idea.
Unless you’re running inside an environment you’re happy rebuilding from scratch when necessary—like a Docker image—you should never run
pip install as root or with
sudo to modify your system packages.
So instead, on Ubuntu 18.04 you might get
pip via download:
$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
$ python3 get-pip.py
Or you can create a virtualenv, and then upgrade its
pip by doing
pip install --upgrade pip:
root@1a43d55f0524:/# python3 -m venv myvenv
root@1a43d55f0524:/# . myvenv/bin/activate
(myvenv) root@1a43d55f0524:/# pip --version
pip 9.0.1 from /myvenv/lib/python3.6/site-packages (python 3.6)
(myvenv) root@1a43d55f0524:/# pip install --upgrade pip
Collecting pip
  Using cached https://files.pythonhosted.org/packages/fe/ef/60d7ba03b5c442309ef42e7d69959f73aacccd0d86008362a681c4698e83/pip-21.0.1-py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-21.0.1
Now that we have a newer pip, we can easily install the latest versions of cryptography and filprofiler:
(myvenv) root@1a43d55f0524:/# pip install cryptography filprofiler
Collecting cryptography
  Downloading cryptography-3.4.6-cp36-abi3-manylinux2014_x86_64.whl (3.2 MB)
     |################################| 3.2 MB 4.5 MB/s
...
Installing collected packages: pycparser, threadpoolctl, cffi, filprofiler, cryptography
Successfully installed cffi-1.14.5 cryptography-3.4.6 filprofiler-0.14.1 pycparser-2.20 threadpoolctl-2.1.0
Notice we downloaded a manylinux2014 package for cryptography.
Why do these
manylinux variations exist?
Compiled Python extensions on Linux link against the standard C library, and in wheels in particular they link against the GNU Libc, aka glibc.
You can see which libraries an executable or shared library links against by using the ldd command:
root@1a43d55f0524:/# cd myvenv/lib/python3.6/site-packages
root@1a43d55f0524:/# ldd cryptography/hazmat/bindings/_openssl.abi3.so
        linux-vdso.so.1 (0x00007ffdbea7b000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fba7b1bf000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fba7adce000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fba7b7b0000)
Notice that the compiled Python extension relies, among others, on
/lib/x86_64-linux-gnu/libc.so.6, which is to say glibc.
If you compile your code against a newer version of glibc, it might require new APIs or symbols that aren’t available in older versions. And that means your code won’t run against older versions of glibc, i.e. on older Linux distributions.
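If you want to know which glibc version your own system provides, Python’s standard library can report it. This is a minimal sketch using `platform.libc_ver()`; the wrapper name `glibc_version` is my own, and on systems that don’t use glibc (macOS, or musl-based distributions like Alpine) I’m assuming the call reports something other than glibc, in which case the function returns `None`.

```python
import platform
import re


def glibc_version():
    """Return the glibc version as a tuple of ints, or None if glibc isn't detected."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    # Keep only the numeric components, e.g. "2.27" -> (2, 27)
    return tuple(int(part) for part in re.findall(r"\d+", version))


detected = glibc_version()
if detected is None:
    print("glibc not detected on this system")
else:
    print("glibc version:", ".".join(str(part) for part in detected))
```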
There are a number of different solutions to this problem. Conda solves it by compiling all its packages against an older version of the glibc headers that it includes; basically it has a custom compilation setup designed to work on a broad range of Linux releases.
PyPI binary wheels solve this by compiling on old versions of Linux, which have correspondingly old versions of glibc. Since the code is compiled against an old version of glibc, it will work with any newer version as well.
- manylinux1 packages are built on CentOS 5.
- manylinux2010 packages are built on CentOS 6.
- manylinux2014 packages are built on CentOS 7.
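Each of those build platforms pins a baseline glibc version, which the manylinux specifications (PEPs 513, 571, and 599) give as 2.5, 2.12, and 2.17 respectively. Here is a sketch of the resulting compatibility check; `runnable_variants` is a hypothetical helper, and the Ubuntu 18.04 figure of glibc 2.27 is from my own knowledge of that release rather than from the output above.

```python
# Baseline glibc version for each manylinux variant (PEP 513, PEP 571, PEP 599).
MANYLINUX_GLIBC = {
    "manylinux1": (2, 5),      # CentOS 5
    "manylinux2010": (2, 12),  # CentOS 6
    "manylinux2014": (2, 17),  # CentOS 7
}


def runnable_variants(system_glibc: tuple) -> list:
    """List the variants whose baseline glibc is no newer than the system's glibc."""
    return [
        variant
        for variant, baseline in MANYLINUX_GLIBC.items()
        if system_glibc >= baseline
    ]


print(runnable_variants((2, 27)))  # Ubuntu 18.04: all three variants
print(runnable_variants((2, 12)))  # a CentOS 6 era system: manylinux1 and manylinux2010
```

So Ubuntu 18.04 itself can run wheels of all three variants; the only thing standing in the way is the old pip that doesn’t recognize the newer tags.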
Whether you’re setting up a development environment or writing your Dockerfile, make sure you upgrade pip.
Otherwise you’ll have a much harder time installing packages.