uint64 converted silently to float64 when adding an int #5745
Hmm, 0x123...def is only about 2**61, which seems to fit into a regular int:
so 64 bits should be enough whether the number is signed or not. At least for me, a regular integer can represent all positive values that are supposed to fit into it (on x86_64):
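The snippet that followed that colon was lost in extraction; a minimal sketch of the arithmetic being claimed (a value around 2**61 fits comfortably in a signed 64-bit integer):

```python
import numpy as np

x = 2**61
# int64 holds values up to 2**63 - 1, so 2**61 fits with room to spare:
print(x <= np.iinfo(np.int64).max)  # True
print(np.int64(x))                  # exact, no overflow
```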
This behavior also seems odd:
I guess 0x7f...f rounds to the same value as …
Even better demonstration that numpy is doing something strange:
Looks like a bug in uint64, as int64 works correctly:
With respect to converting to floating point, I'd have preferred int64's behavior:
even though it shouldn't be needed for uint64 above. It seems to always want to convert to floating point:
I'm using Python 3.4.3 (default, Mar 10 2015, 14:38:54) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin, and numpy.version.version is '1.9.2' from MacPorts.
The difference is that in the first example both numbers are signed. For mixed signed and unsigned types the result is the smallest signed type that can hold both. For example:
If you need a uint64 result, do:
One can argue that numpy should check the actual values, but then the result becomes less predictable.
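The workaround snippet in that comment was lost in extraction; a minimal sketch of the presumable fix, which is to cast the other operand to uint64 explicitly so no mixed-sign promotion happens:

```python
import numpy as np

a = np.uint64(2**63)  # a value too big for int64
b = 1                 # a plain Python int

# uint64 + uint64 stays uint64, so cast the second operand explicitly:
result = a + np.uint64(b)
print(result.dtype)  # uint64
print(result)        # 9223372036854775809
```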
Kind of makes sense, but it still doesn't feel consistent that uint64 "increases" the size of the variable even when there's no need, while int64 doesn't even when there is. Especially as uint64's current behavior can lead to subtle bugs that might be difficult to find: #5746. If I recall correctly, signed integer overflow is undefined in C but unsigned integer overflow is well-defined and widely used, which makes numpy's behavior even more confusing.
Which is not to say I don't agree with you that a float result is weird; I was never happy with that myself. ISTR that at some point the result of adding arrays of uint64 and int64 was of object type, but I may be mistaken there.
None of numpy's conversion rules look at the actual values, only at their dtypes. Silently casting uint64 to float64 is pretty weird though -- it breaks all … I guess we could stretch the scalar kind rule to say that positive python ints …
We had a similar problem in our code, which we work around by converting the value to a Python int, adding the extra number, and converting back to uint64:

```python
import numpy as np

ts = np.uint64(1400000002000000600)
print(ts + 1)
# 1.4000000020000005e+18
print(np.uint64(int(ts) + 1))
# 1400000002000000601
```
@njsmith:
Here's an example of the NumPy type promotion rules:
The input types are the same in all these cases (…).
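The example snippets in that comment were lost in extraction; a sketch of how the dtype-only promotion rules behave, using `np.promote_types` (which never inspects values, only types):

```python
import numpy as np

# No 64-bit integer type holds both the uint64 and int64 ranges,
# so the promotion falls through to floating point:
print(np.promote_types(np.uint64, np.int64))   # float64

# When a signed type large enough exists, it is chosen instead:
print(np.promote_types(np.uint32, np.int64))   # int64

# Same-signedness promotion stays integral:
print(np.promote_types(np.uint64, np.uint64))  # uint64
```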
@njsmith's statement is mostly but not entirely correct: scalar types are treated specially. But it certainly is true that for arrays (excluding 0-dimensional arrays), NumPy does not look at actual values. |
See #7126 for another extended discussion on this |
Fast forward a few years from the last comment on this issue, and I'm curious if there is now any interest from the NumPy core developers in considering a change to the overall type casting behavior of uint64.

Fundamentally, there are two differently problematic approaches to dealing with typing given a fixed maximum integer size (assuming you don't want to deal with value-dependent typing):

1. Keep the result an integer type and let values wrap around on overflow.
2. Promote the result to float64, which covers the combined range but silently loses precision for large integers.

Option 2 is what NumPy does (and implicitly JavaScript, which doesn't distinguish between ints and floats with its Number type), and Option 1 is what C, C++ and many other languages do. (It is unfortunate that signed overflow is undefined in C, but it is well-defined in many other languages, like Java, Rust and Julia.)

Beyond Option 1 being more intuitive for anyone familiar with integer arithmetic in other languages, I think the fact that NumPy no longer accepts floats as array indices (with good reason!) makes Option 2 even more problematic, as users can accidentally compute a float index when combining two integer indices. This problem also pops up in Numba (where we attempt to mimic NumPy casting rules), making it unwise for us to use unsigned ints for things like ranges, because they will accidentally create floats when combined with other integer values and cause compilation errors. Other efforts to compile user code written against the NumPy API will also face this question.

Additionally, the current approach seems kind of ad-hoc and inconsistent, given that the far more common int64 case is allowed to overflow without any widening. Given how long this behavior has been in place, it may be impractical to change, but if it were in principle possible, would people be in favor of it?
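The two options above can be sketched side by side; this is an illustrative example, not code from the thread (arrays with explicit dtypes are used so the behavior is stable across NumPy versions):

```python
import numpy as np

# Option 1 (C-style, what NumPy already does for same-signedness types):
# keep the integer type and wrap around modulo 2**64 on overflow.
u = np.array([np.iinfo(np.uint64).max], dtype=np.uint64)
wrapped = u + np.uint64(1)
print(wrapped, wrapped.dtype)  # wraps to 0, dtype stays uint64

# Option 2 (what NumPy does for mixed uint64/int64): promote to float64,
# which silently loses precision for integers above 2**53.
mixed = np.array([1], dtype=np.uint64) + np.array([1], dtype=np.int64)
print(mixed.dtype)  # float64
```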
@seibert you may be interested in this: https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76 The problem is quite clear: the way numpy does this type of promotion is fraught with issues (the other thing is value-based promotion). But these things have been around since 1.7, so probably for around 8 years, and it is a bit hard to say whether a change is reasonably possible or not (say we do increase the major version for this). (EDIT: Part of it was probably around since the beginning of time.) So yes, given the choice, I think we would probably go with option 1, or at least something more like it...
That is an excellent proposal! I would like to see it become a NEP to see if we can get some consensus around it. |
Might be a bit of a radical suggestion, but what about dropping support for uint64 altogether? Who needs whole numbers between 2**63 and 2**64 anyway? This would mean that accumulative type inflation of integers would converge to int64 instead of float64. Information would not be lost silently (nor would there be a need for an error).
I realize I am probably a corner case, but this bug makes interfacing to C++ code via Cython a bit precarious. I use a lot of … Now, every time I want to use values pulled from C++ headers (which convert …):
The call …
@colinww isn't that the behaviour of cython and not numpy? I am a bit surprised cython uses numpy casting rules here, although maybe they wanted to stay intentionally close.
@seberg sorry for not being clear. In cython, this is done "correctly", in that the intermediate generated cpp code (from the pyx source) does not do any implicit casting. That same code, in pure Python, has the behavior we're discussing. The reason I mention it's jarring is that the numpy interface generally gives the sense that it is underpinned by C. For example, there are structured arrays, you can access pointers to the raw data structures, there are tables for type conversions like this: https://numpy.org/devdocs/user/basics.types.html, etc.
This is because NumPy contains a booby trap where if you multiply an unsigned integer with a plain Python 'int', you will get back a float: numpy/numpy#5745
Just discovered this issue today... This issue is relevant for me because I work with large integers on embedded systems, and this type coercion caused a valid …
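The tail of that comment was lost, but the failure mode it describes is easy to show: float64 has a 53-bit mantissa, so routing a full-width 64-bit value through it silently corrupts the low bits. An illustrative sketch (not the commenter's code):

```python
import numpy as np

ts = np.uint64(2**63 + 1)      # a value that needs all 64 bits
as_float = np.float64(ts)      # float64 has only 53 mantissa bits

# The conversion rounds to the nearest representable double,
# silently dropping the low bit:
print(int(as_float) == int(ts))  # False
print(int(as_float) - int(ts))   # -1
```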
Closing in favor of the above linked gh-20905. It is a common issue unfortunately; hopefully we can fix it. But having the old links of all the closed duplicates go back to the open issue should be enough to give weight to it :).
This will cost someone a massive amount of money one day. Effectively this makes [u]int64 useless in numpy. What is even the point of having types at all? Let's make everything a double-int-alike à la JavaScript and call it a day. 1 + "1" = "11".
Also convert to python int from uint64 to keep things as int: numpy/numpy#5745
This code:
prints:
which was a big surprise for me. Why would adding an integer to a uint64 result in a floating point value?
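The original code sample and its output were lost in extraction; a sketch reproducing the surprise, using the 0x123...def-style value mentioned earlier in the thread and explicit array dtypes (scalar-with-Python-int behavior changed in NumPy 2.0, so the array form is shown):

```python
import numpy as np

a = np.array([0x123456789ABCDEF], dtype=np.uint64)
s = a + np.array([1], dtype=np.int64)
print(s.dtype)  # float64: the sum of two integer arrays is floating point
```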