uint64 converted silently to float64 when adding an int #5745


Closed

nasailja opened this issue Apr 4, 2015 · 23 comments

Comments

@nasailja

nasailja commented Apr 4, 2015

This code

import numpy
a = numpy.zeros(1, dtype=numpy.uint64)[0]
print(type(a))
i = 1
print(type(i))
a += i
print(type(a))

prints

<class 'numpy.uint64'>
<class 'int'>
<class 'numpy.float64'>

which was a big surprise to me. Why would adding an integer to a uint64 result in a floating-point value?

@charris
Member

charris commented Apr 4, 2015

Because the Python integer is signed and uint64 is not, the common type needs more range than uint64 provides. Using float64 is admittedly a compromise, since some precision is still lost. Note also that the += operator here acts on two (immutable) scalars, so it is not actually done in place; Python adds the two numbers and assigns the result to a.
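
A minimal check of that rule (a sketch; it assumes NumPy imported as np, and the promotions shown are the ones described in this thread):

import numpy as np

# No signed integer dtype is wide enough to hold both int64 and uint64,
# so NumPy falls back to float64 as the common type:
print(np.result_type(np.uint64, np.int64))  # float64

# And += on NumPy scalars is not done in place: scalars are immutable,
# so Python computes a + 1 and rebinds the name to a new object.
a = np.uint64(0)
before = id(a)
a += 1
print(id(a) == before)  # False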

@nasailja
Author

nasailja commented Apr 4, 2015

Hmm, 0x123...def is only about 2**61, which seems to fit into a regular int:

>>> 0x1234567890abcdef
1311768467294899695
>>> 2**60
1152921504606846976
>>> 2**61
2305843009213693952
>>> 2**62
4611686018427387904
>>> 2**63
9223372036854775808L

so 64 bits should be enough whether the number is signed or not. At least for me, a regular integer can represent all positive values that are supposed to fit into it (on x86_64):

>>> 0x7fffffffffffffff
9223372036854775807
>>> 0x8000000000000000
9223372036854775808L

This behavior also seems odd:

>>> numpy.uint64(0x7ffffffffffffffe) - 2**63
-2L
>>> numpy.uint64(0x7ffffffffffffffe) - 2**62
4.6116860184273879e+18
>>> numpy.uint64(0x7ffffffffffffffe) - 2.0**63
0.0

I guess 0x7f...f rounds to the same value as 2**63 when converted to floating point, so the case with a standard floating-point value makes sense, but what is happening with 2**62, and how can that not be considered a bug?

@nasailja
Author

nasailja commented Apr 4, 2015

An even better demonstration that numpy is doing something strange:

>>> 2**62+(2**62-1)
9223372036854775807
>>> numpy.uint64(2**62)+(2**62-1)
9.2233720368547758e+18

@nasailja
Author

nasailja commented Apr 4, 2015

Looks like a bug in uint64, as int64 works correctly:

>>> numpy.int64(2**62)+(2**62-1)
9223372036854775807
>>> numpy.uint64(2**62)+(2**62-1)
9.2233720368547758e+18

@nasailja
Author

nasailja commented Apr 4, 2015

With respect to converting to floating point, I'd have preferred int64's behavior:

>>> numpy.int64(2**62)+(2**62)
__main__:1: RuntimeWarning: overflow encountered in long_scalars
-9223372036854775808
>>> numpy.uint64(2**62)+(2**62)
9.2233720368547758e+18

even though it shouldn't be needed for uint64 above. It seems to always want to convert to floating point:

>>> numpy.uint64(1) + 1
2.0

I'm using Python 3.4.3 (default, Mar 10 2015, 14:38:54) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin, with numpy.version.version '1.9.2' from MacPorts.

@charris
Member

charris commented Apr 4, 2015

The difference is that in the first example both numbers are signed. For mixed signed and unsigned types, the result is the smallest signed type that can hold both. For example:

In [1]: type(uint32(1) + 1)
Out[1]: numpy.int64

If you need a uint64 result, do

In [5]: type(uint64(2**62)+ uint64((2**62-1)))
Out[5]: numpy.uint64

One can argue that numpy should check the actual values, but then the result becomes less predictable.
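
For reference, a sketch of that "smallest signed type that can hold both" ladder, checked with np.result_type on dtypes rather than values:

import numpy as np

# Mixed signed/unsigned promotion climbs to the smallest signed type
# that can represent both operands' full ranges:
print(np.result_type(np.uint8, np.int8))    # int16
print(np.result_type(np.uint16, np.int16))  # int32
print(np.result_type(np.uint32, np.int32))  # int64
# At 64 bits there is no wider signed integer, so the ladder falls
# off the integer types entirely:
print(np.result_type(np.uint64, np.int64))  # float64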

@nasailja
Author

nasailja commented Apr 4, 2015

Kind of makes sense, but it still doesn't feel consistent that uint64 "increases" the size of the variable even when there's no need, while int64 doesn't even when there is. Especially as uint64's current behavior can lead to subtle bugs that might be difficult to find: #5746. If I recall correctly, signed integer overflow is undefined in C but unsigned integer overflow is defined and widely used, which makes numpy's behavior even more confusing.

@charris
Member

charris commented Apr 4, 2015

Which is not to say I don't agree with you that a float result is weird; I was never happy with that myself. ISTR that at some point the result of adding arrays of uint64 and int64 was of object type, but I may be mistaken there.

@njsmith
Member

njsmith commented Apr 4, 2015

None of numpy's conversion rules look at the actual values, only at their types. This can be confusing in these kinds of micro tests, but it is an important property for making sure that code works consistently, and that what you get when trying things at the prompt like this matches what you'll get with larger data later, where you aren't checking every value.

Silently casting uint64 to float64 is pretty weird though -- it breaks all the usual rules for which casts are allowed. (It changes kinds and loses precision, so by the rules we use everywhere else it's actually worse than e.g. silently casting float64 to float32.) Bleh.

I guess we could stretch the scalar kind rule to say that positive Python ints are ambiguous between unsigned and signed, and sort of meld two weird special cases into one? ints are already very weird, since in py3 they're now always infinite precision, but we still treat them as int32 or int64 depending on platform. This wouldn't fix all the weirdness -- int64 + uint64 would still be weird -- but at least uint64 + int would be less surprising?


@153957
Contributor

153957 commented Jul 21, 2015

We had a similar problem in our code, which we work around by converting the value to a Python int, adding the extra number, and converting back to uint64.

import numpy as np

ts = np.uint64(1400000002000000600)
print(ts + 1)
# 1.4000000020000005e+18
print(np.uint64(int(ts) + 1))
# 1400000002000000601

@zed

zed commented Dec 23, 2016

@njsmith :

None of numpy's conversion rules look at the actual values, only at their types.

Here's an example of the NumPy type promotion rules:

>>> np.result_type(np.int32(0), 0) 
dtype('int64') # np.int_
>>> np.result_type(np.int32(0), 1<<62)
dtype('int64')
>>> np.result_type(np.int32(0), 1<<63)
dtype('float64')
>>> np.result_type(np.int32(0), 1<<64)
dtype('O') # Python object

The input types are the same in all these cases (np.int32, int), but the result types are different.

@shoyer
Member

shoyer commented Dec 24, 2016

The input types are the same in all these cases (np.int32, int), but the result types are different.

@njsmith's statement is mostly but not entirely correct: scalar types are treated specially. But it certainly is true that for arrays (excluding 0-dimensional arrays), NumPy does not look at actual values.
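
A small sketch of that distinction, under the value-based rules discussed in this thread (scalar promotion changed in later NumPy versions):

import numpy as np

a = np.zeros(3, dtype=np.int32)

# Array with array: only the dtypes matter, never the element values.
b = np.zeros(3, dtype=np.int64)
print((a + b).dtype)      # int64, whatever the values in a and b are

# Array with Python scalar: the scalar's value participates in promotion.
print((a + 7).dtype)      # int32 -- 7 fits in int32, the array dtype wins
print((a + 2**40).dtype)  # int64 -- the value forces a wider type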

@eric-wieser
Member

See #7126 for another extended discussion of this.

@seibert
Contributor

seibert commented Aug 19, 2019

Fast-forward a few years from the last comment on this issue: I'm curious whether there is now any interest from the NumPy core developers in considering a change to the overall type-casting behavior of uint64 + int64 => float64.

Fundamentally, there are two differently problematic approaches to typing arithmetic on a fixed maximum integer size (assuming you don't want value-dependent typing):

  1. Accept potential overflow when adding uint64 to int64 by making the result int64.
  2. Accept potential loss of precision when adding uint64 to int64 by making the result float64.

Option 2 is what NumPy does (and, implicitly, JavaScript, which doesn't distinguish between ints and floats with its Number type), and Option 1 is what C, C++, and many other languages do. (It is unfortunate that signed overflow is undefined in C, but it is well defined in many other languages, like Java, Rust, and Julia.)

Beyond Option 1 being more intuitive for anyone familiar with integer arithmetic in other languages, I think the fact that NumPy no longer accepts floats as array indices (with good reason!) makes Option 2 even more problematic as users can accidentally compute a float index when combining two integer indices. This problem also pops up in Numba (where we attempt to mimic NumPy casting rules), making it unwise for us to use unsigned ints for things like ranges because they will accidentally create floats when combined with other integer values and cause compilation errors. Other efforts to compile user code written against the NumPy API will also face this question.
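
A sketch of that indexing failure mode under the promotion rules in effect at the time (the variable names here are illustrative):

import numpy as np

data = np.arange(10)
start = np.uint64(2)  # e.g. an index that arrived from C code as uint64
offset = 3            # a plain Python int

idx = start + offset  # silently promoted to float64 under these rules
print(type(idx))      # <class 'numpy.float64'>
try:
    data[idx]
except IndexError as err:
    print(err)        # floats are not valid array indices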

Additionally, the current approach seems kind of ad hoc and inconsistent, given that the far more common int64 + int64 has a similar overflow issue, yet that case is not cast to float64.

Given how long this behavior has been in place, it may be impractical to change, but if it were in principle possible, would people be in favor of uint64 + int64 => int64?

@seberg
Member

seberg commented Aug 19, 2019

@seibert you may be interested in this: https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76

But the problem is quite clear: the way numpy does this type of promotion is fraught with issues (the other problem being value-based promotion). These rules have been around since 1.7, so for roughly 8 years, and it is a bit hard to say whether a change is reasonably possible (say, if we increase the major version for this). (EDIT: Part of it was probably around since the beginning of time.)

So yes, given the choice, I think we would probably go with option 1, or at least something more like it...

@seibert
Contributor

seibert commented Aug 20, 2019

That is an excellent proposal! I would like to see it become a NEP so we can try to build consensus around it.

@bergkvist

Might be a bit of a radical suggestion, but what about dropping support for uint64 altogether? Who needs whole numbers between 2**63 and 2**64 anyway?

This would mean that cumulative type inflation of integers would converge to int64 instead of float64. Information would not be lost silently (nor would there be any need for an error).

@colinww

colinww commented Jun 1, 2020

I realize I am probably a corner case, but this bug makes interfacing with C++ code via Cython a bit precarious. I use a lot of std::size_t variables to define array dimensions, indices, etc. In general I think the interface that numpy exposes makes a lot of sense, so this is a departure.

Now, every time I want to use values pulled from C++ headers (which convert std::size_t -> np.uint64 on my machine), I have to remember to add int(), say, to populate a vector:

bin_wts = np.ones(par.dspng.fcafe.tx.dem.NSEG)
for ii in range(int(par.dspng.fcafe.tx.dem.NSEG) - 1):
    bin_wts[ii + 1] = 2 * bin_wts[ii]

The call np.ones(par.dspng.fcafe.tx.dem.NSEG) works fine, because par.dspng.fcafe.tx.dem.NSEG is still np.uint64. But par.dspng.fcafe.tx.dem.NSEG - 1 silently casts to np.float64, so I first cast it to a Python int.

@seberg
Member

seberg commented Jun 1, 2020

@colinww isn't that the behaviour of cython and not numpy? I am a bit surprised cython uses numpy casting rules here, although maybe they wanted to stay intentionally close.

@colinww

colinww commented Jun 1, 2020

@seberg sorry for not being clear. In cython, this is done "correctly" in that the intermediate generated cpp code (from the pyx source) does not do any sort of implicit casting. That same code, in pure Python, has this behavior we're discussing.

The reason I mention it's jarring is that the numpy interface generally gives the sense that it is underpinned by C. For example, there are structured arrays, you can access pointers to the raw data structures, there are tables like this: https://numpy.org/devdocs/user/basics.types.html for type conversions, etc.

athas added a commit to diku-dk/futhark that referenced this issue Mar 26, 2021
This is because NumPy contains a booby trap where if you multiply an
unsigned integer with a plain Python 'int', you will get back a float:
numpy/numpy#5745
philass pushed a commit to diku-dk/futhark that referenced this issue Apr 10, 2021
This is because NumPy contains a booby trap where if you multiply an
unsigned integer with a plain Python 'int', you will get back a float:
numpy/numpy#5745
@paigeweber13

Just discovered this issue today. It is relevant for me because I work with large integers on embedded systems, and this type coercion caused a valid uint64 to be rounded outside the uint64 range, because it lost precision when it was coerced to float64. This later caused an overflow when I tried to store it in a C uint64_t.

@seberg
Member

seberg commented May 12, 2022

Closing in favor of the above-linked gh-20905. It is unfortunately a common issue; hopefully we can fix it. But having the old links from all the closed duplicates point back to the open issue should be enough to give weight to it :).

@alexpyattaev

This will cost someone a massive amount of money one day. Effectively, this makes [u]int64 useless in numpy. What is even the point of having types at all? Let's make everything a double-int-alike a la JavaScript and call it a day. 1 + "1" = "11".
