Skip to content

BUG: incorrect equality of np.int64 and np.uint64 #8851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
eric-wieser opened this issue Mar 27, 2017 · 13 comments
Closed

BUG: incorrect equality of np.int64 and np.uint64 #8851

eric-wieser opened this issue Mar 27, 2017 · 13 comments

Comments

@eric-wieser
Copy link
Member

eric-wieser commented Mar 27, 2017

This doesn't look at all desirable to me:

>>> lbnd = np.int64(np.iinfo(np.int64).max); lbnd
9223372036854775807
>>> ubnd = np.uint64(np.iinfo(np.int64).max + 1)
9223372036854775808
>>> lbnd == ubnd
True
>>> np.equal(lbnd, ubnd, casting='safe')
True
>>> np.equal(lbnd, ubnd, casting='samekind')
True
>>> int(lbnd) == int(ubnd)  # well, one of these behaves as expected...
False

Frankly, this is PHP-like behaviour, and not something I want to see happening quietly in python

Clearly what's happening here is that np.equal is upcasting both arguments to float, since combining a uint64 and int64 requires a int128 for storage.

I don't think relational operators should be allowed to make this conversion, and should either error, or do more work to determine the correct result.

Perhaps in general, we should warn when a (uint64,int64)->float conversion occurs, and ask that it be written explicitly? At the very least, I don't think this operation is appropriately covered by casting="safe"

@eric-wieser
Copy link
Member Author

eric-wieser commented Mar 27, 2017

Is this really desirable?

>>> np.can_cast(np.int64, np.float64, casting='safe')
True

‘safe’ means only casts which can preserve values are allowed.

Clearly though, in our case the value was not preserved

@charris
Copy link
Member

charris commented Mar 27, 2017

Note that uint64 + int64 returns float64. That is an old decision/compromise due to the fact that no numpy integer type can contain both. I suppose that other options would be to return object arrays containing Python ints or at least raise a warning. Python ints convert as signed, which adds to the problem.

@mhvk
Copy link
Contributor

mhvk commented Mar 27, 2017

The compromise does seem particularly broken for comparison operators, since for those there is no need to upcast to get the correct answer. @eric-wieser - possibly here too one might want a separate uint, int inner loops?

@eric-wieser
Copy link
Member Author

eric-wieser commented Mar 27, 2017

possibly here too one might want a separate uint, int inner loops?]

Can we just do this everywhere? Obvious candidates:

I guess it's not clear whether int64 + uint64 should be signed or unsigned.

I'd be in favor of disabling safe casting from u?int64 -> float64 everywhere, and adding explicit loops in the few cases where we decide conversion to float is desirable

@eric-wieser
Copy link
Member Author

Similar: #5746

@gfyoung
Copy link
Contributor

gfyoung commented Apr 27, 2017

I'd like to have a look at how to patch comparisons for uint64 and int64 (this would be a cleaner solution to the bug I'm patching in #8846). Where is that code located?

@rkern
Copy link
Member

rkern commented Apr 27, 2017

Okay, it's going to be somewhat trickier than I thought. The code for creating ufunc objects in numpy is generated, and that logic is quite complex. You can start here:

https://github.com/numpy/numpy/blob/master/numpy/core/code_generators/generate_umath.py#L407-L442

You will see that all of the comparisons have a simple type_resolver function that you will likely need to replace. They also have a single TD() (a convenience function that makes TypeDescription objects) that encompasses all of the dtypes. You will need to add some more TypeDescriptions that deal with the mixed signedness of the integers. I think you can just add them after the TD() that's there; the order is important, but I think that appending the new mixed-sign functions at the end will do the job. You can look at the specifications for add and multiply, etc. to see how it deals with mixed-type binary operations for the datetime dtypes.

The functions that you actually implement are here:

https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/loops.c.src#L916-L932

As you can see, they are also generated through an ad-hoc templating system. I'm not sure if you'll be able to make much use of the templating system; I expect the code will look different for each of the cases. In any case, make sure the functions you add are also declared in loops.h.src.

The TypeDescriptions that you add will have to use FuncNameSuffix(...) to specify each function for obscure reasons. You can't just specify the name of the function, because that's interpreted as specifying a function that gets called with each scalar operand rather than specifying one of these loop functions. You can't use the FullTypeDescr sentinel value like the datetime TypeDescriptions do because that expects the function names to look like LONG_lL_?_equal() (that is, the nominal type name, the input typecodes, and the output typecode) because ? is not a valid character to use in a C function name. So use something like FuncNameSuffix('mixedsign') and name your functions LONG_equal_mixedsign, ULONG_equal_mixedsign, etc.

Or you can add a new branch in this bit of code here to deal with whatever new scheme you come up with: https://github.com/numpy/numpy/blob/master/numpy/core/code_generators/generate_umath.py#L919-L964

@eric-wieser
Copy link
Member Author

@gfyoung: Take a look at #8853 for an example of adding these loops. Everything @rkern said above should apply too. It might be worth adding some special handling to make ? map the the word bool so that FullTypeDescr works as expected

Of particular note - you need to handle both LONG op ULONG and LONGLONG op ULONGLONG, because both of these are sometimes 64 bits.

@umanwizard
Copy link

Is anyone still working on this?

>>> np.can_cast(np.int64, np.float64, casting="safe")
True

@dcolascione
Copy link

@umanwizard , I filed #13733 for that separately.

@umanwizard
Copy link

@dcolascione FWIW it's discussed in this present thread, in Eric's comment

@dcolascione
Copy link

Ah, okay. Thanks!

@seberg
Copy link
Member

seberg commented May 12, 2022

Closing in favor of gh-12525, but stealing the clearer title :).

@seberg seberg closed this as completed May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants