BUG: dtype comparison coerces generously leading to hash problems #7242
Comments
Even worse, dtypes compare equal to strings!
This is a long-standing API issue in NumPy. If we were starting over, dtypes definitely should not compare equal to types/strings, but it's too late to change that without a major compatibility break.
Out of curiosity, where do people use this functionality?
All over the place, in my experience. I don't have any particular examples handy, though :)
@shoyer Okay, thanks :)
This would matter any time you want to use both numpy dtypes and numpy types as keys in a dictionary. I've been wanting to do something like this to map between numpy types and dynd types. Fortunately, in most cases where you would want to build a mapping from numpy types to some other set of types, numpy's types and dtypes will have the same output value, which ought to hide the bug. It'd be great to get this fixed though.
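A minimal sketch of one way to sidestep the dictionary problem described above: coerce every key to a dtype instance before storing or looking it up, so that equality and hashing are only ever exercised between dtype objects. The helper names `normalize_key` and `lookup`, the `mapping` dictionary, and its label values are hypothetical and only illustrate the idea.

```python
import numpy as np


def normalize_key(key):
    """Coerce a NumPy scalar type, dtype, or dtype string to a dtype instance."""
    return np.dtype(key)


# Hypothetical mapping from NumPy dtypes to labels in some other type system.
mapping = {
    normalize_key(np.float64): "float64-like",
    normalize_key("int32"): "int32-like",
}


def lookup(key):
    """Look up a label, normalizing the key the same way it was stored."""
    return mapping[normalize_key(key)]


print(lookup(np.float64), lookup("float64"), lookup(np.dtype("int32")))
```

Because every key passes through `np.dtype(...)`, the lookup no longer depends on whether the caller holds a type, a dtype, or a string.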
Unfortunately, the only way we could possibly fix this bug is to first deprecate auto-coercion-to-dtype in
@njsmith Could we start with the deprecation? At least in the docs?
Deprecations start on the mailing list, so you'd have to bring it up there. There is a ton of code that does
@njsmith I understand. Thanks for explaining. It might be a good idea to at least have easy ways to write that code that does not rely on the weird
In all of these cases the obvious way to fix the implicit coercion would be to make it explicit:
@njsmith Ah, of course. Right, so that's probably the first step. I don't understand why is
Early on,
The upshot is: don't use
@rkern Wow, yeah that's really unfortunate.
@rkern Also I don't think that Python promises that
Not in so many words, but numpy will map it to whichever dtype is appropriate (i.e. virtually always a
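As a rough illustration of what explicit coercion could look like (a sketch under assumptions, not the snippet from the comment above), both operands can be passed through `np.dtype(...)` before comparing. The helper name `same_dtype` is hypothetical and is not part of NumPy.

```python
import numpy as np


def same_dtype(a, b):
    # Hypothetical helper: coerce both sides to dtype instances explicitly,
    # then compare dtype to dtype instead of relying on implicit coercion.
    return np.dtype(a) == np.dtype(b)


arr = np.zeros(3)  # np.zeros defaults to float64

print(same_dtype(arr.dtype, np.float64))
print(same_dtype(arr.dtype, "int32"))
```

Code written this way would keep working even if the implicit coercion inside the comparison were deprecated.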
@shoyer did you reopen this intentionally?
Yes, I think so? I imagine I was contemplating this as something to consider for an eventual API cleanup.
Would it be possible to repair this comparison in And perhaps also edit the Array API standard to make this invariant guaranteed. (Added data-apis/array-api#582 to that effect.)
@asmeurer for the Array API standard change
I got a little lazy and decided not to wrap the dtype objects in numpy.array_api (except for making it so that Regarding the standard, see the cross-referenced issue right above this comment.
Right, I understand the simplicity there, but unfortunately that means that the Array API inherited the unpythonic behaviour of dtypes in this issue.
I wouldn't say it's an issue for me. However, if you see this comment data-apis/array-api#582 (comment), it appears that the Array API will forbid that behaviour. If the language in the linked comment is accepted, the original dtype objects won't be compliant. (You would need new objects that don't compare equal to strings or the numpy types they contain.) Also, your implementation of numpy's array API is so elegant. I love how (Also, numpy's array API is very exciting!)
This was also the cause of nondeterministic behavior in pandas
Summary
Make DType objects, their corresponding types, and their string names all compare unequal.
Historical issue
For some reason, dtype objects and the numpy types they are based off of compare equal. They don't hash equal of course, which goes against Python's docs that say
Can we make them compare unequal? Is there any reason for them to compare equal?
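A small sketch of the behaviour being reported, assuming a NumPy version in which dtype comparison still coerces its right-hand operand. It compares one dtype instance against its scalar type, its string name, and the matching Python builtin, then prints the three hashes for inspection. The variable names are illustrative only.

```python
import numpy as np

dt = np.dtype("float64")

# Each comparison coerces the right-hand side to a dtype before comparing.
equal_to_type = dt == np.float64
equal_to_string = dt == "float64"
equal_to_builtin = dt == float

print(equal_to_type, equal_to_string, equal_to_builtin)

# Objects that compare equal are expected to hash equal, but these three
# hashes are computed independently of one another, so inspect them directly.
print(hash(dt), hash(np.float64), hash("float64"))
```

When the hashes differ, a dictionary or set keyed by one of these objects will generally not find the others, even though `==` reports them as equal.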