BUG: Binary operations between uint64 and intX results in float64 #20905

paigeweber13 opened this issue Jan 26, 2022 · 15 comments
@paigeweber13

paigeweber13 commented Jan 26, 2022

Describe the issue:

Expected Behavior:

I expect to be able to create a Python Fraction from NumPy integer types with no loss of precision.

Actual Behavior:

Types are coerced to numpy.float64 and precision is lost

Reproducible code example:

import numpy as np
from fractions import Fraction

nums = np.array([1])
denoms = np.array([18446744073709550592], dtype='uint64')
f = Fraction(nums[0], denoms[0]) # denom has been rounded, is now a float
print(f) # 1.0/1.8446744073709552e+19
print(type(f.denominator)) # <class 'numpy.float64'>

Error message:

No response

NumPy/Python version information:

NumPy: 1.21.2
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
OS: Debian GNU/Linux 10 (buster)

Other information that may be helpful:

I discovered this behavior while creating fractions from parallel arrays representing the numerators and denominators of fractions. This loss of precision is problematic because the float is rounded up, putting it out of range for uint64 even though the original value was valid. In my case I have a workaround: converting the values to Python int before passing them to the Fraction constructor avoids the issue. However, I wanted to report it because it is unexpected that an operation on two NumPy integer types produces a floating-point value.
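For reference, the workaround can be sketched as a minimal adaptation of the repro snippet above:

```python
import numpy as np
from fractions import Fraction

nums = np.array([1])
denoms = np.array([18446744073709550592], dtype='uint64')

# Converting to Python int first sidesteps NumPy's scalar promotion,
# so the Fraction keeps the exact denominator
f = Fraction(int(nums[0]), int(denoms[0]))
print(f.denominator)        # 18446744073709550592
print(type(f.denominator))  # <class 'int'>
```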

I briefly looked into the Fraction constructor and found that this behavior comes from lines 175-180 of fractions.py:

        elif (isinstance(numerator, numbers.Rational) and
            isinstance(denominator, numbers.Rational)):
            numerator, denominator = (
                numerator.numerator * denominator.denominator,
                denominator.numerator * numerator.denominator
                )

numpy.uint64 is an instance of numbers.Rational:

>>> import numpy as np
>>> import numbers
>>> denoms = np.array([18446744073709550592], dtype='uint64')
>>> isinstance(denoms[0], numbers.Rational)
True

Inspecting the reproducible snippet above with pdb reveals that the multiplication on lines 178-179 of fractions.py is the moment when the type is changed.
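The coercion at that multiplication can be reproduced on its own with two NumPy scalars:

```python
import numpy as np

d = np.uint64(18446744073709550592)
n = np.int64(1)

# No fixed-width integer dtype can hold both the uint64 and the int64
# range, so NumPy promotes the product to float64 and the value rounds
prod = d * n
print(type(prod))  # <class 'numpy.float64'>
print(int(prod))   # 18446744073709551616 (rounded up past the uint64 max)
```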

Thank you for your time (:

@soma2000-lang
Contributor

I am working on this @paigeweber13

@paigeweber13
Author

@soma2000-lang I just discovered bug report #5745, which is almost identical. The only difference is that in my case the operator is * and in that case the operator is +.

@seberg
Member

seberg commented Jan 27, 2022

Sounds like it needs a decent proposal for what to do with uint64 + int64 behaviour and how to pull off that transition. To be clear, I would love a simple and easy proposal, but until we have that there is nothing to be done, really. However, unless you want to think about that (which is great!), I am not sure there is anything straightforward to do here @soma2000-lang.

We could even consider just adding a warning when this type of promotion happens (that should be very simple)... But I personally don't have a clear gut feeling on what we should do, and I currently have more important things to lose sanity over :).

@paigeweber13
Author

I would deeply appreciate a warning of some kind, as well as some clear documentation on when type coercions happen. I spent a full workday chasing down this behavior.

I'd be willing to write up an MR for improved documentation, if you can point me to the code where type coercions happen.

Personally I'd like int64 and uint64 operations to be coerced to int64. That would be my proposal. I'd rather have overflow than a loss of precision when I specifically chose an integer type for its discrete nature. Ideally (and this is a stretch, I don't expect it to happen any time soon) I think implementing a variable-width numpy integer type would be really handy.
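A quick illustration of the precision-loss side of that trade-off:

```python
import numpy as np

# float64 carries a 53-bit significand, so integers above 2**53 are
# not all exactly representable; coercion to float silently rounds
big = 2**62 + 1
assert int(np.float64(big)) != big   # rounds down to 2**62
assert int(np.int64(big)) == big     # int64 holds it exactly
```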

@seberg
Member

seberg commented Jan 27, 2022

This is the place that it is defined:

static PyArray_DTypeMeta *
default_builtin_common_dtype(PyArray_DTypeMeta *cls, PyArray_DTypeMeta *other)
{
    assert(cls->type_num < NPY_NTYPES);
    if (!NPY_DT_is_legacy(other) || other->type_num > cls->type_num) {
        /*
         * Let the more generic (larger type number) DType handle this
         * (note that half is after all others, which works out here.)
         */
        Py_INCREF(Py_NotImplemented);
        return (PyArray_DTypeMeta *)Py_NotImplemented;
    }
    /*
     * Note: The use of the promotion table should probably be revised at
     * some point. It may be most useful to remove it entirely and then
     * consider adding a fast path/cache `PyArray_CommonDType()` itself.
     */
    int common_num = _npy_type_promotion_table[cls->type_num][other->type_num];
    if (common_num < 0) {
        Py_INCREF(Py_NotImplemented);
        return (PyArray_DTypeMeta *)Py_NotImplemented;
    }
    return PyArray_DTypeFromTypeNum(common_num);
}

It just looks the result up in a generated table elsewhere. In any case, that is the place where you would have to change it (probably by introducing a new function, and using it for int64 or uint64 – whichever has the larger type number, just because that is how it is currently implemented). Well, that is the path forward to add a warning, at least. Plus, one may need to check that nothing else reads that table directly. Nothing should, but I am not sure.
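The promotions that table encodes can also be inspected from Python, without reading the C source:

```python
import numpy as np

# uint64 combined with any signed integer type falls back to float64,
# because no fixed-width signed integer can hold the full uint64 range
assert np.result_type(np.uint64, np.int64) == np.float64
assert np.result_type(np.uint64, np.int8) == np.float64

# uint64 combined with uint64 stays integral
assert np.result_type(np.uint64, np.uint64) == np.uint64
```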

@paigeweber13
Author

If I have time this week I'll take a look. I'm going to start with documentation, so other people who hit this issue at least have a reference for what is expected.

@seberg seberg changed the title BUG: Creating a Fraction from numpy integer types coerces types to numpy.float64 BUG: Binary operations between uint64 and intX scalars results in float64 Feb 1, 2022
@jmd-dk

jmd-dk commented Feb 1, 2022

Sounds like it needs a decent proposal for what to do with uint64 + int64 behaviour and how to pull off that transition.

Option 1) from seibert in #5745 seems like the obvious thing to do: follow C/C++ and let uint64 and int64 result in int64, with the attendant danger of overflow.

However, the lower-bit scalars all promote to the next higher signed integer type, e.g. uint32 and int32 combine to int64, while uint16 and int16 combine to int32.

  • While safe from a value perspective, this feels somewhat odd when specifically working with a "fixed size" data type.
  • To continue this trend, uint64 and int64 should combine to int128, which does not exist. Even if it did, this would just move the problem up one level, not remove it. Alternatively, to end this infinite regress, the rule could be that uint64 and int64 combine to the standard Python (3) int, which has arbitrary precision. To keep things "within" NumPy, this would probably require a new dtype, something like a new np.pyint, which more or less is just an alias for the standard Python int.
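The lower-bit promotion pattern described above is easy to verify:

```python
import numpy as np

# Each smaller unsigned type promotes with its signed peer to the
# next-larger signed type; only uint64/int64 has nowhere left to go
assert np.result_type(np.uint8,  np.int8)  == np.int16
assert np.result_type(np.uint16, np.int16) == np.int32
assert np.result_type(np.uint32, np.int32) == np.int64
assert np.result_type(np.uint64, np.int64) == np.float64
```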

@charris
Member

charris commented Feb 2, 2022

The Python int looks like a reasonable fix. It would also fit in with the proposal for typed object arrays, that is, arrays that only contain one type of object.

@joni-herttuainen

joni-herttuainen commented Feb 9, 2022

Also found out about this recently. My two cents:

I would object to the idea that converting to int is inherently better than converting to float.
"Intuitively", uint64 + a positive value should never result in a negative value.
That being said, nothing about uint + int = float is intuitive either.

I would say, in the case of uint64 + intX:

  • a warning would definitely be better than quietly coercing to float
    • obviously the same goes for coercing the result to int64 or uint64
  • even outright refusing and raising an exception would be better than silent unexpected behaviour
    • I am not saying this should be the way to go, though
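Until any of that happens, the only reliable way to stay in uint64 is to keep every operand unsigned (or cast explicitly), e.g.:

```python
import numpy as np

a = np.array([2**64 - 1], dtype=np.uint64)

# Subtracting a uint64 scalar (rather than a signed or Python int)
# avoids the uint64/int64 -> float64 promotion entirely
b = a - np.uint64(1)
assert b.dtype == np.uint64
assert int(b[0]) == 2**64 - 2
```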

@seberg seberg changed the title BUG: Binary operations between uint64 and intX scalars results in float64 BUG: Binary operations between uint64 and intX results in float64 May 12, 2022
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 19, 2023
This is not required when compiled with numba
see numpy/numpy#20905
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 24, 2023
This is not required when compiled with numba
see numpy/numpy#20905
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 26, 2023
This is not required when compiled with numba
see numpy/numpy#20905
@burlen
Contributor

burlen commented Mar 31, 2023

#23476 is a variation on the theme. +1 for the proposal mentioned above to follow the C/C++ behavior.

@alexpyattaev

Until I found this issue I was promoting numpy as a strictly typed alternative to the nonsense of matlab. Please stop proving me wrong.

@rkern
Member

rkern commented Nov 15, 2023

We've never advertised ourselves as strictly-typed, whatever that means to you. We have types with promotion rules, most of which are fairly reasonable, but some dark corners where there aren't many obviously great alternatives (and it's not clear to me that raising an exception to be "strictly-typed" here is a better alternative, though one could certainly argue the point).

@alexpyattaev

Well, the point is that there are certain rules about type conversion in Python: an int should not magically turn into a float when added to another int, for example. Quietly coercing uint64 + int to float64 is not just inappropriate from the point of view of possible overflow/loss of precision; it is also the kind of loose type magic that terrible commercial tools like Matlab really like. However, in this particular case NumPy is clearly behind:

>>> a=np.array([2**64-1],dtype=np.uint64)
>>> int(a[0]-1) == int((a-1)[0])
False # I would expect a True here

For comparison, the abominable Matlab gets this one correct and preserves the type of the resulting array as expected:

>> uint64(42) +1
ans =
  uint64
   43
>> a=[uint64(2)^63 ];
>> b= a-1
b =
  uint64
   9223372036854775807
>> b(1) == (a(1)-1)
ans =
  logical
   1

PS: Matlab has other, more serious problems, of course:

>> uint64(5) -uint64(7)
ans =
  uint64
   0

@jakevdp
Contributor

jakevdp commented Nov 16, 2023

@alexpyattaev Just a note, I think this particular issue (adding 1 to a uint64 causing promotion) would be addressed by the "weak promotion" semantics of NEP 50; see https://numpy.org/neps/nep-0050-scalar-promotion.html#proposed-weak-promotion.
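A sketch of what NEP 50 does and does not change (the first print depends on the NumPy version; the final assertion holds under both promotion regimes):

```python
import numpy as np

a = np.array([2**64 - 1], dtype=np.uint64)

# Under NEP 50 "weak" promotion (the default from NumPy 2.0) a Python
# int no longer forces promotion, so this scalar operation stays
# uint64; under the legacy rules it promoted to float64
r = a[0] - 1
print(type(r))

# The uint64/int64 dtype combination itself still promotes to float64,
# even under NEP 50
assert np.result_type(np.uint64, np.int64) == np.float64
```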

I understand your frustration here: uint64/int64 promotion is an unfortunate corner case in NumPy's documented type promotion semantics, but it's something that has been baked-in for a quarter century now. Making that kind of backward-incompatible change is a rather arduous process, for good reason: stability in tools like NumPy is hugely important to the various communities that depend on them.

But the community of numpy developers is thinking about these things and how they can be improved.

@alexpyattaev

Maybe it would be nice to have this issue produce a warning? That way current users can learn about possible bugs (I doubt anyone relies on this sort of promotion in production). Making it an error is not even necessary, as long as there is a reliable way to actually run the math operations in uint64. This would keep backwards compatibility while letting this get fixed eventually.
