uint64 converted silently to float64 when adding an int #5745
Hmm, 0x123...def is only about 2**61, which seems to fit into a regular int:
so 64 bits should be enough whether the number is signed or not. At least for me, a regular integer can represent all positive values that are supposed to fit into it (on x86_64):
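The snippet that followed that colon was lost in extraction; a minimal sketch of the arithmetic being claimed (a value around 2**61 fits comfortably in a signed 64-bit integer):

```python
import numpy as np

x = 2**61
# int64 holds values up to 2**63 - 1, so 2**61 fits with room to spare:
print(x <= np.iinfo(np.int64).max)  # True
print(np.int64(x))                  # exact, no overflow
```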
This behavior also seems odd:
I guess 0x7f...f rounds to the same value as …
Even better demonstration that numpy is doing something strange:
Looks like a bug in uint64, as int64 works correctly:
With respect to converting to floating point, I'd have preferred int64's behavior:
even though it shouldn't be needed for uint64 above. It seems to always want to convert to floating point:
I'm using Python 3.4.3 (default, Mar 10 2015, 14:38:54) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin, and numpy.version.version is '1.9.2' from MacPorts.
The difference is that in the first example both numbers are signed. For mixed signed and unsigned types the result is the smallest signed type that can hold both. For example:
If you need a uint64 result, do:
One can argue that numpy should check the actual values, but then the result becomes less predictable.
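The workaround snippet in that comment was lost in extraction; a minimal sketch of the presumable fix, which is to cast the other operand to uint64 explicitly so no mixed-sign promotion happens:

```python
import numpy as np

a = np.uint64(2**63)  # a value too big for int64
b = 1                 # a plain Python int

# uint64 + uint64 stays uint64, so cast the second operand explicitly:
result = a + np.uint64(b)
print(result.dtype)  # uint64
print(result)        # 9223372036854775809
```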
Kind of makes sense, but it still doesn't feel consistent that uint64 "increases" the size of the variable even when there's no need, while int64 doesn't even when there is. Especially as uint64's current behavior can lead to subtle bugs that might be difficult to find: #5746. If I recall correctly, signed integer overflow is undefined in C but unsigned integer overflow is well-defined and widely used, which makes numpy's behavior even more confusing.
Which is not to say I don't agree with you that a float result is weird; I was never happy with that myself. ISTR that at some point the result of adding arrays of uint64 and int64 was of object type, but I may be mistaken there.
None of numpy's conversion rules look at the actual values, only at their dtypes. Silently casting uint64 to float64 is pretty weird though -- it breaks all … I guess we could stretch the scalar kind rule to say that positive python ints …
We had a similar problem in our code, which we work around by converting the value to a Python int, adding the extra number, and converting back to uint64:

```python
import numpy as np

ts = np.uint64(1400000002000000600)
print(ts + 1)
# 1.4000000020000005e+18
print(np.uint64(int(ts) + 1))
# 1400000002000000601
```
@njsmith:
Here's an example of the NumPy type promotion rules:
The input types are the same in all these cases (…).
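The example snippets in that comment were lost in extraction; a sketch of how the dtype-only promotion rules behave, using `np.promote_types` (which never inspects values, only types):

```python
import numpy as np

# No 64-bit integer type holds both the uint64 and int64 ranges,
# so the promotion falls through to floating point:
print(np.promote_types(np.uint64, np.int64))   # float64

# When a signed type large enough exists, it is chosen instead:
print(np.promote_types(np.uint32, np.int64))   # int64

# Same-signedness promotion stays integral:
print(np.promote_types(np.uint64, np.uint64))  # uint64
```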
@njsmith's statement is mostly but not entirely correct: scalar types are treated specially. But it certainly is true that for arrays (excluding 0-dimensional arrays), NumPy does not look at actual values. |
See #7126 for another extended discussion on this |
Fast forward a few years from the last comment on this issue, and I'm curious if there is now any interest from the NumPy core developers in considering a change to the overall type casting behavior of uint64.

Fundamentally, there are two differently problematic approaches to dealing with typing given a fixed maximum integer size (assuming you don't want to deal with value-dependent typing):

1. Keep the result an integer type and let values wrap around on overflow.
2. Promote the result to float64, which covers the combined range but silently loses precision for large integers.

Option 2 is what NumPy does (and implicitly JavaScript, which doesn't distinguish between ints and floats with its Number type), and Option 1 is what C, C++ and many other languages do. (It is unfortunate that signed overflow is undefined in C, but it is well-defined in many other languages, like Java, Rust and Julia.)

Beyond Option 1 being more intuitive for anyone familiar with integer arithmetic in other languages, I think the fact that NumPy no longer accepts floats as array indices (with good reason!) makes Option 2 even more problematic, as users can accidentally compute a float index when combining two integer indices. This problem also pops up in Numba (where we attempt to mimic NumPy casting rules), making it unwise for us to use unsigned ints for things like ranges, because they will accidentally create floats when combined with other integer values and cause compilation errors. Other efforts to compile user code written against the NumPy API will also face this question.

Additionally, the current approach seems kind of ad-hoc and inconsistent, given that the far more common int64 case is allowed to overflow without any widening. Given how long this behavior has been in place, it may be impractical to change, but if it were in principle possible, would people be in favor of it?
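The two options above can be sketched side by side; this is an illustrative example, not code from the thread (arrays with explicit dtypes are used so the behavior is stable across NumPy versions):

```python
import numpy as np

# Option 1 (C-style, what NumPy already does for same-signedness types):
# keep the integer type and wrap around modulo 2**64 on overflow.
u = np.array([np.iinfo(np.uint64).max], dtype=np.uint64)
wrapped = u + np.uint64(1)
print(wrapped, wrapped.dtype)  # wraps to 0, dtype stays uint64

# Option 2 (what NumPy does for mixed uint64/int64): promote to float64,
# which silently loses precision for integers above 2**53.
mixed = np.array([1], dtype=np.uint64) + np.array([1], dtype=np.int64)
print(mixed.dtype)  # float64
```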
@seibert you may be interested in this: https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76 The problem is quite clear: the way numpy does this type of promotion is fraught with issues (the other thing is value-based promotion). But these things have been around since 1.7, so probably for around 8 years, and it is a bit hard to say whether a change is reasonably possible or not (say we do increase the major version for this). (EDIT: Part of it was probably around since the beginning of time.) So yes, given the choice, I think we would probably go with option 1, or at least something more like it...
That is an excellent proposal! I would like to see it become a NEP to see if we can get some consensus around it. |
Might be a bit of a radical suggestion, but what about dropping support for uint64 altogether? Who needs whole numbers between 2**63 and 2**64 anyway? This would mean that accumulative type inflation of integers would converge to int64 instead of float64. Information would not be lost silently (nor would there be a need for an error).
I realize I am probably a corner case, but this bug makes interfacing to C++ code via Cython a bit precarious. I use a lot of … Now, every time I want to use values pulled from C++ headers (which convert …):
The call …
@colinww isn't that the behaviour of cython and not numpy? I am a bit surprised cython uses numpy casting rules here, although maybe they wanted to stay intentionally close.
@seberg sorry for not being clear. In cython, this is done "correctly", in that the intermediate generated cpp code (from the pyx source) does not do any implicit casting. That same code, in pure Python, has the behavior we're discussing. The reason I mention it's jarring is that the numpy interface generally gives the sense that it is underpinned by C. For example, there are structured arrays, you can access pointers to the raw data structures, there are tables for type conversions like this: https://numpy.org/devdocs/user/basics.types.html, etc.
This is because NumPy contains a booby trap where if you multiply an unsigned integer with a plain Python 'int', you will get back a float: numpy/numpy#5745
Just discovered this issue today... This issue is relevant for me because I work with large integers on embedded systems, and this type coercion caused a valid …
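The tail of that comment was lost, but the failure mode it describes is easy to show: float64 has a 53-bit mantissa, so routing a full-width 64-bit value through it silently corrupts the low bits. An illustrative sketch (not the commenter's code):

```python
import numpy as np

ts = np.uint64(2**63 + 1)      # a value that needs all 64 bits
as_float = np.float64(ts)      # float64 has only 53 mantissa bits

# The conversion rounds to the nearest representable double,
# silently dropping the low bit:
print(int(as_float) == int(ts))  # False
print(int(as_float) - int(ts))   # -1
```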
Closing in favor of the above linked gh-20905. It is a common issue unfortunately; hopefully we can fix it. But having the old links of all the closed duplicates go back to the open issue should be enough to give weight to it :).
This will cost someone a massive amount of money one day. Effectively this makes [u]int64 useless in numpy. What is even the point of having types at all? Let's make everything a double-int-alike à la JavaScript and call it a day. 1 + "1" = "11".
Also convert to python int from uint64 to keep things as int: numpy/numpy#5745
This code:
prints:
which was a big surprise for me. Why would adding an integer to a uint64 result in a floating point value?
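The original code sample and its output were lost in extraction; a sketch reproducing the surprise, using the 0x123...def-style value mentioned earlier in the thread and explicit array dtypes (scalar-with-Python-int behavior changed in NumPy 2.0, so the array form is shown):

```python
import numpy as np

a = np.array([0x123456789ABCDEF], dtype=np.uint64)
s = a + np.array([1], dtype=np.int64)
print(s.dtype)  # float64: the sum of two integer arrays is floating point
```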