The problem with float32: you only get 16 million values

by Itamar Turner-Trauring
Last updated 01 Feb 2023, originally created 27 Jan 2023

Libraries like NumPy and Pandas let you switch data types, which allows you to reduce memory usage. Switching from numpy.float64 (“double-precision” or 64-bit floats) to numpy.float32 (“single-precision” or 32-bit floats) cuts memory usage in half. But it does so at a cost: float32 can only store a much smaller range of numbers, with less precision.
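As a quick check of the memory claim, here is a minimal sketch. The array size is arbitrary and the variable names are only for illustration.

```python
import numpy as np

# One million zeros; NumPy defaults to float64 (8 bytes per value).
as_float64 = np.zeros(1_000_000)
# The same data stored as float32 (4 bytes per value).
as_float32 = as_float64.astype(np.float32)

print(as_float64.nbytes)  # 8000000
print(as_float32.nbytes)  # 4000000
```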

So if you want to save memory, how do you use float32 without distorting your results? Let’s find out!

In particular, we will:

Explore the surprisingly low limits on the range of values that float32 lets you express.
Discuss a couple of different ways to solve the problem using basic arithmetic.
Suggest a different solution to reducing memory, which gives you an even bigger range than float32.

What value range does a float give you?

A floating point number with a given number of bits has three parts:

One bit determines whether it’s positive or negative.
Most of the bits (the “significand” or “mantissa”) allow you to express a range of values at a specific precision level.
The remaining bits (the “exponent”) determine the smallest expressible difference between two consecutive mantissa values.

For a 32-bit float, we have 1 sign bit, 23 bits used to determine how many distinct values you have for a given level of precision, and 8 bits for the exponent. In practice a little trickery in the encoding is used to give 24 bits of range. That means that for a given level of precision, 32-bit floats only give you 2²⁴ = 16777216 positive values, and the same number of negative values, with 0 at the center.
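If you want to see the three parts for yourself, the sketch below reinterprets a float32's bytes as an integer and pulls the fields apart. The helper name `float32_fields` is made up for this illustration. I'm assuming the "trickery" refers to the implicit leading 1 bit of the standard encoding, which is why 23 stored bits behave like 24.

```python
import numpy as np

def float32_fields(value):
    """Split a float32 into its sign, exponent, and mantissa bit fields."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    bits = int(np.array([value], dtype=np.float32).view(np.uint32)[0])
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits; the leading 1 is implicit
    return sign, exponent, mantissa

print(float32_fields(1.0))         # (0, 127, 0)
print(float32_fields(-2.0))        # (1, 128, 0)
print(float32_fields(16777215.0))  # (0, 150, 8388607)
```

The last line shows the largest whole number before gaps appear: every one of the 23 stored mantissa bits is set.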

For example, if you want to be able to express as many integers as possible, with a precision of 1, you can express the numbers -16777215 to 16777215:

>>> import numpy as np
>>> arr = np.arange(0, 16777216, dtype=np.int64)
>>> arr[-4:]
array([16777212, 16777213, 16777214, 16777215])
>>> arr[-4:].astype(np.float32)
array([16777212., 16777213., 16777214., 16777215.], dtype=float32)

You can’t express fractions in between 16777215.0 and 16777214.0 though:

>>> np.float32(16777214.5)
16777214.0

And if you go higher, you don’t even have the ability to express all the whole numbers:

>>> np.float32(16777217)
16777216.0

What if you want to be able to express both whole numbers and half numbers? You’re still limited to 2²⁴ positive values and the same number of negative values, centered around 0. The number of values at a given level of precision cannot be changed! As a result, the highest value is cut in half to 2²³, to make room for having twice as many values (wholes and halves).

Within this range, wholes and halves are expressible:

>>> arr = np.arange(0, 8388608, 0.5, dtype=np.float64)
>>> arr[-4:]
array([8388606. , 8388606.5, 8388607. , 8388607.5])
>>> arr[-4:].astype(np.float32)
array([8388606. , 8388606.5, 8388607. , 8388607.5], dtype=float32)

If we go outside that range we can no longer reliably get half-values:

>>> np.float32(8388608.5)
8388608.0

If you want to be able to express quarters, halves, and whole numbers, you’re limited to a range between around 4 million and -4 million. If you add eighths, you’re limited to ~2 million to ~-2 million. And so on, until you hit the smallest level of precision expressible by the exponent.
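One way to see this halving pattern directly is `np.spacing`, which reports the distance from a value to the next representable one. This is a small sketch, and the magnitudes are just the powers of two where the gap changes:

```python
import numpy as np

# np.spacing(x) is the distance from x to the next larger representable value.
gaps = {m: float(np.spacing(np.float32(m))) for m in [2**20, 2**21, 2**22, 2**23, 2**24]}
for magnitude, gap in gaps.items():
    print(magnitude, gap)
# 1048576 0.125
# 2097152 0.25
# 4194304 0.5
# 8388608 1.0
# 16777216 2.0
```

Each time the magnitude doubles, the gap doubles too: eighths stop being expressible at about 2 million, quarters at about 4 million, halves at about 8 million, and whole numbers at about 16 million.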

Note: My examples above have some off-by-one errors on ranges because the next value up can be expressed with a different exponent. I’m not bothering to be that accurate since you probably don’t want to rely on exact values at the top of the range, since it’ll be easy to slip over and start losing precision. If you want the details and math involved, here’s the Wikipedia page.

A heuristic: floats as a range

The short version of the above is that 32-bit floats at a given level of precision can express 16 million positive and 16 million negative values, centered around zero. You pick your level of precision (whole numbers, halves, quarters, thousands, and so on), and that gives you the maximum value you can accurately express.

In contrast, 64-bit floats give you 2⁵³ = ~9,000,000,000,000,000 values. This is a much larger number than 16 million.
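You don't have to memorize these numbers, since NumPy can report them. The sketch below uses `np.finfo`; the dictionary name is just for illustration.

```python
import numpy as np

values_per_precision = {}
for dtype in (np.float32, np.float64):
    # nmant is the number of stored mantissa bits; the implicit leading bit adds one.
    significant_bits = int(np.finfo(dtype).nmant) + 1
    values_per_precision[dtype.__name__] = 2**significant_bits

print(values_per_precision)
# {'float32': 16777216, 'float64': 9007199254740992}
```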

So how do you fit float64s into float32s without losing precision? By transforming the data so it has a range of at most 16 million (centered around zero!) for a given level of precision.

Unfortunately, mathematical transformations will lose an increasing amount of information when you have values at the top of the range. For example, the average of 2 and 1 as float32s is 1.5:

>>> (np.float32(2) + np.float32(1)) / np.float32(2)
1.5

But the average of 16000002 and 16000001 as float32s is not 16000001.5:

>>> (np.float32(16_000_002) + np.float32(16_000_001)) / (
...     np.float32(2)
... )
16000002.0

Thus in practice you might want to use a range of much less than 16 million for the data itself. This will ensure any calculations you do give reasonable results.
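As an aside, and only as a sketch of one possible workaround rather than something the rest of this article relies on: some NumPy reductions let you keep the data in float32 while doing the arithmetic in float64. Here I'm using the `dtype` argument of `mean()`.

```python
import numpy as np

data = np.array([16_000_001, 16_000_002], dtype=np.float32)

# Accumulate in float64 even though the data is stored as float32.
mean64 = data.mean(dtype=np.float64)
print(mean64)  # 16000001.5
```

This only helps with the calculation itself. The stored values still have float32 precision, so the data has to fit the range in the first place.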

How do we shrink the range of the data? Often, subtraction and division are enough to do the trick.

Centering the data with subtraction

Consider a series of timestamps, starting right about the time I wrote this article; we only care about millisecond resolution. I’m using list() to make reading the numbers a little easier:

>>> timestamps = np.array([1674837025.812, 1674837027.298, 1674837028.45])
>>> list(timestamps)
[1674837025.812, 1674837027.298, 1674837028.45]

These numbers are much larger than 16 million, so if we store them in a float32 we will lose quite a lot of precision:

>>> list(np.array(timestamps, dtype=np.float32))
[1674837000.0, 1674837000.0, 1674837000.0]

So how do we limit the data range? For many timeseries use cases, we don’t care about the absolute time, we care about the time relative to the start. So we can just subtract the starting time, and we still have millisecond precision while fitting in a float32.

>>> timestamps -= timestamps[0]
>>> list(np.array(timestamps, dtype=np.float32))
[0.0, 1.4860001, 2.638]
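The order of operations matters here: the subtraction has to happen while the data is still float64. The sketch below compares the two orders, using the same three timestamps; the variable names `centered` and `too_late` are mine.

```python
import numpy as np

timestamps = np.array([1674837025.812, 1674837027.298, 1674837028.45])

# Subtract in float64 first, then shrink to float32.
centered = (timestamps - timestamps[0]).astype(np.float32)

# Converting first throws away the small differences before we can subtract.
too_late = timestamps.astype(np.float32)
too_late = too_late - too_late[0]

print(centered.tolist())  # roughly [0.0, 1.486, 2.638]
print(too_late.tolist())  # [0.0, 0.0, 0.0]
```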

What’s the largest value we can store? As always, we can only store about 16 million positive numbers at a given precision. With millisecond precision, the largest value we can express is about 16000 seconds, or at most 4.4 hours later than the start time of 0.0.
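To check where millisecond resolution runs out, we can again ask NumPy for the gap between adjacent float32 values at a few offsets. This is a rough sketch, and the sample offsets are arbitrary.

```python
import numpy as np

# Gap between adjacent float32 values, in milliseconds, at each offset in seconds.
gaps_ms = {
    seconds: float(np.spacing(np.float32(seconds))) * 1000
    for seconds in [1_000, 16_000, 20_000, 100_000]
}
for seconds, gap in gaps_ms.items():
    print(f"{seconds:>7} s: adjacent float32 values are {gap:.3f} ms apart")
```

At 16,000 seconds the gap is still just under a millisecond. By 20,000 seconds it is almost 2 ms, so millisecond timestamps start to blur together.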

Shrinking the data with division

Let’s look at a different example: financial data for companies in the S&P 500. The smallest company by market capitalization, News Corp, had revenue of about US$10,361,000,000 in the last 12 months. And the largest company, Apple, had $394,328,000,000.

Neither of those numbers will fit in a float32 if we want a precision of $1: we only have 16 million values at that precision. So we can just divide by a million, and then just keep in mind that the values we’re manipulating are millions:

>>> np.float32(10_361_000_000 / 1_000_000)
10361.0

Will our data fit? We only have 16 million values at a $1,000,000 precision, so the maximum value we can represent accurately at this level of precision is $16 trillion, roughly two-thirds of the 2022 US GDP of about $25 trillion. No company is likely to have a number anywhere near that large on its balance sheet or income statement, so we should be fine.
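Here is a minimal sketch of the same idea applied to a whole array, with a round-trip check at the end. The array holds the two revenue figures mentioned above, and the variable names are just for illustration.

```python
import numpy as np

# Revenue figures from the text, in dollars (stored as 64-bit integers).
revenue_dollars = np.array([10_361_000_000, 394_328_000_000], dtype=np.int64)

# Store the values in millions of dollars as float32.
revenue_millions = (revenue_dollars / 1_000_000).astype(np.float32)
print(revenue_millions.tolist())  # [10361.0, 394328.0]

# Scale back up in float64 to confirm nothing was lost at $1,000,000 precision.
restored = revenue_millions.astype(np.float64) * 1_000_000
print(bool((restored == revenue_dollars).all()))  # True
```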

Just using `float32`

Continuing the same example, we only have 16 million values at a precision of $1. But our data isn’t at a precision of $1. Looking at Apple’s annual report, for example, the financial data is only given at a resolution of $1,000,000. And as it turns out, float32s can represent 16 million different values at a precision of $1,000,000 just fine:

>>> values = np.arange(0, 16_000_000_000_000, step=1_000_000)
>>> values[-5:]
array([15999995000000, 15999996000000, 15999997000000,
       15999998000000, 15999999000000])
>>> values[-5:].astype(np.float32)
array([1.5999995e+13, 1.5999996e+13, 1.5999997e+13,
       1.5999998e+13, 1.5999999e+13], dtype=float32)

So in this case we don’t have to do anything special at all: float32 works just fine.
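It's still worth checking how close the stored values are when your data sits near the top of the range. In the sketch below (variable names are mine), the gap between adjacent float32 values just under $16 trillion turns out to be 2²⁰ = 1,048,576, a little more than $1 million. So the stored numbers are the nearest representable values, not exact copies, and each can be off by up to half the gap.

```python
import numpy as np

# The last few $1,000,000 steps below $16 trillion.
values = np.arange(15_999_995_000_000, 16_000_000_000_000, 1_000_000, dtype=np.int64)
as_float32 = values.astype(np.float32)

# Gap between adjacent float32 values at this magnitude, in dollars.
gap = float(np.spacing(as_float32[0]))
# How far each stored value landed from the original, in dollars.
error = np.abs(as_float32.astype(np.float64) - values)

print(gap)  # 1048576.0
print(error.max() <= gap / 2)  # True
```

If an error of up to about half a million dollars on a figure this large matters for your use case, dividing first (as in the previous section) or staying well below the top of the range avoids it.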

A different approach: `int32`

For a given level of precision, float32 limits us to 16 million positive values, and the same number of negative values. Sometimes this is fine, sometimes it’s too restrictive. Consider our timestamp example: we were limited to at most 4.4 hours of timestamps.

Floats are nice when you want to be able to store data at very different scales in the same datatype: you can store 0.125, but also 7 * 2²⁴. However, our timestamps don’t span different scales, they are all at the same scale. For that sort of data, int32 might be a better option.

Where float32 can store up to 16 million different positive values for a given precision, an int32 can store up to 2147 million different positive values. To make it easier to remember, we can just say 2,000 million. The number is so much larger because there are no bits spent on an exponent to adjust the scale, so we have more bits to express values.
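For the exact numbers, `np.iinfo` reports the limits of an integer type. A tiny sketch comparing the two:

```python
import numpy as np

float32_values = 2**24                   # values per precision level
int32_max = int(np.iinfo(np.int32).max)  # largest positive int32

print(float32_values)               # 16777216
print(int32_max)                    # 2147483647
print(int32_max // float32_values)  # 127
```

So int32 gives you roughly 128 times as many positive values as float32 does at any one level of precision.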

Doing math with integers is a little different than with floats, but for many use cases it won’t matter. For our timestamp example, we can say each integer unit represents 1/10th of a millisecond since the start. That gives us about 200 million milliseconds at most; we can express timestamps as high as 55 hours after the start before we run out of range in the datatype:

>>> timestamps -= timestamps[0]
>>> list(np.array((timestamps) * 10_000, dtype=np.int32))
[0, 14860, 26380]

If we’re OK making it impossible to express anything below a millisecond, we can actually express a timestamp of as much as 550 hours after the start before hitting the limits of int32:

>>> timestamps -= timestamps[0]
>>> list(np.array((timestamps) * 1_000, dtype=np.int32))
[0, 1486, 2638]
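Two things to watch when converting: casting a float to an integer truncates rather than rounds, so a value like 26379.9999 would become 26379, and an out-of-range value is not reliably reported as an error. The sketch below guards against both. `to_int32_ticks` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

def to_int32_ticks(seconds, ticks_per_second):
    """Convert offsets in seconds to int32 ticks (hypothetical helper)."""
    # Round to the nearest tick instead of truncating toward zero.
    ticks = np.round(np.asarray(seconds, dtype=np.float64) * ticks_per_second)
    limits = np.iinfo(np.int32)
    # Refuse to convert values that int32 cannot hold.
    if ticks.min() < limits.min or ticks.max() > limits.max:
        raise OverflowError("offsets do not fit in int32 at this resolution")
    return ticks.astype(np.int32)

offsets = np.array([0.0, 1.486, 2.638])
print(to_int32_ticks(offsets, 1_000).tolist())   # [0, 1486, 2638]
print(to_int32_ticks(offsets, 10_000).tolist())  # [0, 14860, 26380]
```

With millisecond ticks, an offset of 600 hours (2,160,000 seconds) would exceed the int32 limit, so the helper raises instead of returning a wrong number.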

The three things to remember

float32s let you express 16 million positive values (and the same number of negative values) at a given level of precision, centered around zero. Precision levels might be whole numbers, 1/32nds, thousands, or millions.
If you’re using the full 16 million value range, you won’t be able to express higher precision values. If you’re doing additional mathematical transformations you’ll want your data range to be sufficiently smaller to compensate. For example, to get results down to 1/16th of your input data’s precision, your data range has to be 1 million positive values when using float32.
int32 lets you express 2000 million positive values, though it has no concept of precision. If you prep your data appropriately, this can let you express a wider range of values than float32, as long as they are at the same scale.

Learn even more techniques for reducing memory usage—read the rest of the Larger-than-memory datasets guide for Python.

