Some reasons to avoid Cython

by Itamar Turner-Trauring
Last updated 22 Jan 2025, originally created 18 Jan 2023

If you need to speed up Python, Cython is a very useful tool. It lets you seamlessly merge Python syntax with calls into C or C++ code, making it easy to write high-performance extensions with rich Python interfaces.

That being said, Cython is not the best tool in all circumstances. So in this article I’ll go over some of the limitations and problems with Cython, and suggest some alternatives.

A quick overview of Cython

In case you’re not familiar with Cython, here’s a quick example; technically Cython has pre-defined malloc and free but I included them explicitly for clarity:

cdef extern from "stdlib.h":
    void *malloc(size_t size);
    void free(void *ptr);

cdef struct Point:
    double x, y

cdef class PointVec:
    cdef Point* vec
    cdef int length

    def __init__(self, points: list[tuple[float, float]]):
        self.vec = <Point*>malloc(
            sizeof(Point) * len(points))
        self.length = len(points)

        for i, (x, y) in enumerate(points):
           self.vec[i].x = x
           self.vec[i].y = y

    def __repr__(self):
        result = []
        for i in range(self.length):
            p = self.vec[i]
            result.append("({}, {})".format(p.x, p.y))
        return "PointVec([{}])".format(", ".join(result))

    def __setitem__(
        self, index, point: tuple[float, float]
    ):
        x, y = point
        if index > self.length - 1:
            raise IndexError("Index too large")
        self.vec[index].x = x
        self.vec[index].y = y

    def __getitem__(self, index):
        cdef Point p
        if index > self.length - 1:
            raise IndexError("Index too large")
        p = self.vec[index]
        return (p.x, p.y)

    def __dealloc__(self):
        free(self.vec)

We’re writing Python, but at any point we can call into C code and interact with C variables, C pointers, and other C features. When you’re not interacting with Python objects, it’s straight C code, with all the corresponding speed.

Typically you’d add compilation to your setup.py, but for testing purposes we can just use the cythonize tool:

$ cythonize -i pointvec.pyx
...
$ python
>>> from pointvec import PointVec
>>> pv = PointVec([(1, 2), (3.5, 4)])
>>> pv
PointVec([(1.0, 2.0), (3.5, 4.0)])
>>> pv[1] = (3, 5)
>>> pv[1]
(3.0, 5.0)
>>> pv
PointVec([(1.0, 2.0), (3.0, 5.0)])

How Cython works

Cython compiles the pyx file to C, or C++, which then gets compiled normally to a Python extension. In this case, it generates 197KB of C code!

As you can imagine, reading the resulting C code is not fun; here’s a tiny excerpt:

  /* "pointvec.pyx":16
 *         self.length = len(points)
 * 
 *         for i, (x, y) in enumerate(points):             # <<<<<<<<<<<<<<
 *            self.vec[i].x = x
 *            self.vec[i].y = y
 */
  __Pyx_INCREF(__pyx_int_0);
  __pyx_t_2 = __pyx_int_0;
  if (likely(PyList_CheckExact(__pyx_v_points)) || PyTuple_CheckExact(__pyx_v_points)) {
    __pyx_t_3 = __pyx_v_points; __Pyx_INCREF(__pyx_t_3); __pyx_t_1 = 0;
    __pyx_t_4 = NULL;
  } else {
    __pyx_t_1 = -1; __pyx_t_3 = PyObject_GetIter(__pyx_v_points); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 16, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = Py_TYPE(__pyx_t_3)->tp_iternext; if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 16, __pyx_L1_error)
  }

This part isn’t too bad. Later parts of the code are much harder to read.

You can, however, use the annotation option available in various parts of the Cython toolchain (for example, cythonize -a) to get an HTML file that maps Cython code to generated C code, to help match up the input and output.

Why Cython is so attractive

As our example shows, creating a small extension for Python is very easy with Cython. You get to use Python syntax to interact with Python, but you can also write code that compiles one-to-one to C or C++, so you can have fast code easily interoperating with Python.

Some downsides to Cython

Unfortunately, since Cython is in the end just a thin layer over C or C++, it inherits all the problems that those languages suffer from. And then it adds some more problems of its own.

Problem #1: Memory unsafety

Go look over the PointVec example above. Can you spot the memory safety bug?

While the __setitem__ and __getitem__ methods check for indexes that are too high, they don’t check for negative numbers. We can therefore write to (and read from) memory addresses outside the allocated memory:

>>> pv[-200000] = (1, 2)
Segmentation fault (core dumped)

This would likely allow an attacker to take over the process if they could feed in the right inputs.
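
To see why the existing check lets negative indexes through, it can help to model the arithmetic in plain Python. This is only an illustrative sketch: buggy_check, strict_check, and byte_offset are hypothetical helpers rather than part of the extension, and POINT_SIZE assumes that a Point is two 8-byte doubles.

```python
POINT_SIZE = 16  # assumed: sizeof(Point) for two 8-byte doubles


def buggy_check(index: int, length: int) -> None:
    """Mirror the check in the Cython example: only the upper bound is tested."""
    if index > length - 1:
        raise IndexError("Index too large")


def strict_check(index: int, length: int) -> None:
    """Reject anything outside the valid range 0 <= index < length."""
    if not 0 <= index < length:
        raise IndexError("Index out of range")


def byte_offset(index: int) -> int:
    """Offset from the start of the allocation that vec[index] would address."""
    return index * POINT_SIZE


# The buggy check accepts a negative index without complaint...
buggy_check(-200000, 2)
# ...so the write would land well before the start of the allocation.
offset = byte_offset(-200000)  # -3200000 bytes
```

A check along the lines of strict_check is what __setitem__ and __getitem__ would need to run before touching self.vec.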

Most security bugs in the wild are due to memory unsafety, and using C and C++ makes it far too easy to introduce these bugs. Cython inherits this problem, which means it’s very difficult to write secure code with Cython. And even if security isn’t a concern, memory corruption bugs are a pain to debug.

Bonus bug: The previous bug was intentional, but Alex Gaynor pointed out there’s another bug I introduced accidentally. Can you spot it?

sizeof(Point) * len(points) can overflow. This is probably harder to exploit, but the fact it’s so easy to introduce security bugs is really not good.
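
The wraparound can be modelled with ordinary Python integers. The following is a sketch under stated assumptions: POINT_SIZE is taken to be 16 bytes, size_t is modelled as an unsigned integer of size_t_bits bits, and both function names are hypothetical. The guarded variant shows the kind of check that would be needed before calling malloc.

```python
POINT_SIZE = 16  # assumed: sizeof(Point) for two 8-byte doubles


def wrapped_alloc_size(count: int, size_t_bits: int = 64) -> int:
    """Byte count C would compute for POINT_SIZE * count in unsigned size_t."""
    return (POINT_SIZE * count) % (2 ** size_t_bits)


def checked_alloc_size(count: int, size_t_bits: int = 64) -> int:
    """Same multiplication, but refuse counts whose product cannot fit."""
    size_max = 2 ** size_t_bits - 1
    if count < 0 or count > size_max // POINT_SIZE:
        raise OverflowError("allocation size does not fit in size_t")
    return POINT_SIZE * count


# With a 64-bit size_t, 2**60 points wrap around to a zero-byte request.
wrapped_64 = wrapped_alloc_size(2 ** 60)          # 0
# With a 32-bit size_t, the count needed is far smaller.
wrapped_32 = wrapped_alloc_size(2 ** 28 + 1, 32)  # 16
```

In both wrapped cases malloc would be asked for a tiny buffer, while the loop in __init__ would still try to write every point.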

Problem #2: Two compiler passes

When you compile a Cython extension, it first gets compiled to C or C++, and then a second pass of compilation happens with a C or C++ compiler. Some bugs will only get caught in the second compilation pass, after Cython has generated thousands of lines of hard-to-decipher code. The resulting errors can be annoying:

Here’s data.h:

#include <stdio.h>

struct X {
    double* myvalue;
};

static inline void print_x(struct X x) {
    printf("%f\n", *x.myvalue);
}

Here’s typo.pyx, which has a typo (can you spot it?):

cdef extern from "data.h":
    cdef struct X:
        double myvalue
    void print_x(X x)

def go():
    x = X()
    x.myvalue = 123
    print_x(x)

When I compile typo.pyx, I get the following error:

typo.c: In function ‘__pyx_pf_4typo_go’:
typo.c:2174:23: error: incompatible types when assigning to type ‘double *’ from type ‘double’
 2174 |   __pyx_v_x.myvalue = 123.0;
      |                       ^~~~~

Notice there’s no reference to the original location in the .pyx source code. In this case it’s pretty clear what’s going on, since our example only has the one assignment. With more complex code it takes more work; we can use the annotated report mentioned above, but that goes in the opposite direction so it’s going to take some clicking/searching. With C++ this can get even more frustrating, since the language is more complex and therefore has more ways to fail.

For more experienced developers, this is less of an issue, but a significant benefit of good compiler errors is helping new developers.
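
One way to go from a C compiler error back to the Cython source is to use the marker comments visible in the generated C excerpt earlier (the /* "pointvec.pyx":16 line). The sketch below assumes the generated file contains such markers in that format, which may vary between Cython versions; find_pyx_location is a hypothetical helper, not part of Cython. It walks backwards from the line the C compiler reported to the nearest marker.

```python
import re

# Matches marker comments such as:  /* "pointvec.pyx":16
MARKER = re.compile(r'/\*\s*"([^"]+)":(\d+)')


def find_pyx_location(c_source: str, c_line: int):
    """Return (source_filename, source_line) for the nearest marker at or above c_line.

    c_line is 1-based, as reported by the C compiler. Returns None if no
    marker comment appears at or before that line.
    """
    lines = c_source.splitlines()
    # Walk backwards from the failing line to the closest marker comment.
    for text in reversed(lines[:c_line]):
        match = MARKER.search(text)
        if match:
            return match.group(1), int(match.group(2))
    return None
```

For the error above, this could be called as find_pyx_location(open("typo.c").read(), 2174). The nearest marker is a reasonable first guess at the offending Cython line, not a guarantee.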

Problem #3: No standardized package or build system for dependencies

Once your Cython code base gets big enough, you might want to add some functionality without having to write it yourself. If you’re using C++, you have access to the C++ standard library, including its data structures. Beyond that, you’re in the land of C and C++, which for practical purposes has no package manager for libraries.

With Python, you can pip install a dependency, or add it to your dependency file with poetry or pipenv. With Rust, you can cargo add a dependency. With C and C++ you’ve got no language-specific tooling.

That means on Linux you can get your Linux distribution’s version of popular libraries… but there’s apt and dnf and more. macOS has Homebrew, and Windows has its own, much smaller repositories like Chocolatey. But every platform is different, and many libraries simply won’t be packaged for you. And then once you’ve gotten your C or C++ library downloaded, you might be dealing with a custom build system.

In short, unless you’re just wrapping an existing library, all the incentives push you to write everything from scratch in Cython, rather than reuse preexisting libraries.

Problem #4: Lack of tooling (partially solved by Cython 3)

Because of the small user base, and the complexity of how Cython works, it doesn’t have as much tooling as other languages. For example, most editors these days can use LSP language servers to get syntax checking and other IDE functionality, but there is no such language server for Cython as far as I know, though some editors do have plugins.

Cython 3 has a new, Python-compatible syntax, which means you can apply Python tooling more readily. And there is a Cython linter available.

Problem #5: Python-only

Using Cython locks you in to a Python-only world: any code you write is only really helpful to someone writing Python. This is a shame, because people in other ecosystems might benefit from this code as well. For example, the Polars DataFrame library can be used from Python, but also from Rust (the language it’s written in), JavaScript, and work is in progress for R.

Alternatives to Cython

So what can you use instead of Cython?

If you’re wrapping an existing C library, Cython is still a good choice. Mostly you just need to interface C to Python, exactly what Cython excels at—and you’re already dealing with a memory-unsafe language.
If you’re wrapping an existing C++ library, a native C++/Python library like pybind11 or the faster nanobind may give a more pleasant development experience.
If you are writing a small, standalone extension, and you are certain security will never be a concern, Cython may still be a reasonable choice if you already know how to use it.

On the other hand, if you expect you’ll be writing extensive amounts of new code, you’ll want something better. My suggestion: Rust.

Note: Whether or not any particular tool or technique will speed things up depends on where the bottlenecks are in your software.

Need to identify the performance and memory bottlenecks in your own Python data processing code? Try the Sciagraph profiler, with support for profiling both in development and production on macOS and Linux, and with built-in Jupyter support.

Rust as an alternative

Rust is a memory-safe, high-performance language, and allows you to easily write Python extensions with PyO3. For simple cases, packaging is extra easy with Maturin, otherwise you can use setuptools-rust. You can also easily work with NumPy arrays.

Additionally, Rust overcomes all the other Cython problems mentioned above:

Memory safety: Rust is designed to be memory-safe by default, while still having the same performance as C or C++.
One compiler pass: Unlike Cython, there’s just the one compiler.
Integrated package repository and build system: Rust has a growing ecosystem of libraries, and a package and build manager called Cargo. Adding dependencies is quick, easy, and reproducible.
Lots of tooling: Rust has a linter called clippy, an excellent LSP server, an autoformatter, and so on.
Cross-language: A Rust library can be wrapped in Python, but you can also interoperate with other languages.

The downsides, compared to Cython:

You can’t use Python syntax inline, so interfacing with Python is more work.
It’s a much more complex language than C, so it takes much longer to learn, though it’s no worse than C++.

Some real-world examples

Polars: We’ve already mentioned Polars is written in Rust; the Python version is just a wrapper around a generic Rust library.

Py-Spy: The py-spy profiler is written in Rust, and by its nature is very Python-specific. However, it shares some generic dependencies with the rb-spy Ruby profiler, which uses the same operating-system mechanisms. And beyond that it uses many other pre-existing Rust libraries—that’s the benefit of using a language with a package manager and an active open source ecosystem.

Reworking our example in Rust

So what does Rust look like compared to Cython?

I rewrote the PointVec example in Rust, occasionally using slightly less idiomatic code for a bit more clarity. When formatted with Rust’s autoformatter, the result is 55 lines of code, compared to 42 for Cython:

use pyo3::exceptions::PyIndexError;
use pyo3::prelude::*;

struct Point {
    x: f64,
    y: f64,
}

#[pyclass]
struct PointVec {
    vec: Vec<Point>,
}

#[pymethods]
impl PointVec {
    #[new]
    fn new(points: Vec<(f64, f64)>) -> Self {
        Self {
            vec: points.into_iter().map(
                |(x, y)| Point { x, y }).collect(),
        }
    }

    fn __getitem__(
        &self, index: usize
    ) -> PyResult<(f64, f64)> {
        if self.vec.len() <= index {
            return Err(PyIndexError::new_err(
                "Index out of bounds"));
        }
        return Ok((self.vec[index].x, self.vec[index].y));
    }

    fn __setitem__(
        &mut self, index: usize, t: (f64, f64)
    ) -> PyResult<()> {
        let (x, y) = t;
        if self.vec.len() <= index {
            return Err(PyIndexError::new_err(
                "Index out of bounds"));
        }
        self.vec[index] = Point { x, y };
        return Ok(());
    }

    fn __repr__(&self) -> String {
        return format!(
            "PointVec[{}]",
            self.vec
                .iter()
                .map(|t| format!("({}, {})", t.x, t.y))
                .collect::<Vec<String>>()
                .join(", ")
        );
    }
}

#[pymodule]
fn rust_pointvec(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<PointVec>()?;
    return Ok(())
}

The new version has the same functionality as the Cython one, but without the memory safety bug; the requirement for explicit typing forces us to notice that non-negative integers are likely what we want, and usize cannot represent anything else.

>>> from rust_pointvec import PointVec
>>> pv = PointVec([(1, 2), (3.5, 4)])
>>> pv
PointVec[(1, 2), (3.5, 4)]
>>> pv[0] = (17, 18)
>>> pv[0]
(17.0, 18.0)
>>> pv
PointVec[(17, 18), (3.5, 4)]
>>> pv[-200000] = (12, 15)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: can't convert negative int to unsigned

What happens if we omit the bounds check, like this?

    fn __setitem__(
        &mut self, index: usize, t: (f64, f64)
    ) -> PyResult<()> {
        let (x, y) = t;
        // if self.vec.len() <= index {
        //     return Err(PyIndexError::new_err(
        //         "Index out of bounds"));
        // }
        self.vec[index] = Point { x, y };
        return Ok(());
    }

Rust still protects us:

>>> from rust_pointvec import PointVec
>>> pv = PointVec([(1, 2), (3.5, 4)])
>>> pv[200000] = (12, 15)
thread '<unnamed>' panicked at 'index out of bounds: the len is 2 but the index is 200000', src/lib.rs:35:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: index out of bounds: the len is 2 but the index is 200000

Don’t back yourself into a corner

If you’re writing a small extension and security is not a concern, Cython may be a fine choice. Nonetheless, it’s worth looking ahead and thinking about the scope of your project.

If you expect your codebase to grow significantly, it’s probably worth the investment to use a better language from the start. You don’t want to start hitting the limits of Cython after you’ve written a whole pile of code.

Learning Rust will take more work. But in return your code will be more maintainable, because you will have access to a wide variety of libraries, far better tooling, and far fewer security concerns.

Finally, it’s worth noting that the original author of Polars didn’t write the JavaScript bindings; someone else did. If you’re writing an open source library, using a non-Python-specific language for the core implementation allows non-Python programmers access to the code, without necessarily adding extra work on your part.
