-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: ensure that casting to/from structured is properly checked. #13648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: ensure that casting to/from structured is properly checked. #13648
Conversation
} | ||
else { | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could perhaps see an argument that in this direction, this should be any(PyArray_CanCastTypeTo(from, field, casting) for field in fields)
, allowing 2
to cast into (2, 2, 2,)
, which I think we have precedent for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was perhaps more driven by symmetry here than anything else. But I think you are right: casting to multiple fields is in principle not ambiguous. But it should be all(cancast(...))
, then, not any(...)
, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To take a step back here, do we want safe casting? It feels right, but it also feels right that safe casting in most cases aligns with "common_type
" and I am not sure I feel that I feel things like np.concatenate
should feel they can just "broadcast" to all fields without some manual casting. (I did not make up my mind yet, probably its OK).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it's a question of if we already set a precedent for this weird behavior. If we didn't, then the original more conservative patch was probably the right one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm moving this out of the in-line comments, so it doesn't get lost when I change things.
e41981e
to
7ce1813
Compare
@eric-wieser - I changed it to allow casting to multiple fields. Also realized I had to put it further up, since the changes hold also for unsafe casting. Since I think this is getting close, I also updated the docstrings. I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose a few tests assume that unsafe casting works in strange ways... I am not sure we want to go there and change that just now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose a few tests assume that unsafe casting works in strange ways... I am not sure we want to go there and change that just now...
@seberg, @eric-wieser, thanks for all the comments! On the casting, I'm a bit torn. I agree with @eric-wieser that we should try to make
My own sense is that symmetry suggests that we do not allow simple to structured at all for any casting but "unsafe". But for "unsafe", it does seem fine to just allow the cast to all fields, especially as this is what happens also in This seems to make sufficient sense that I'll go ahead and change it accordingly... Current behaviourGoing from structured to simple:
So, for this case, Now turning to casting from simple to structured
So, the current behaviour is that casting to a structured type is never allowed for "equiv" and always allowed for "safe" and beyond. |
7ce1813
to
7f15d42
Compare
Rewritten as noted above. Quite a bit simpler PR, which suggests it may be better.. |
7f15d42
to
cd90983
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may get more complicating, fear... I am willing to move this forward though as an improvement though if you feel that is better. There is no regression here in 1.17, right?
/* | ||
* Fast path for basic types. | ||
*/ | ||
if ((NPY_LIKELY(from->type_num < NPY_OBJECT) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose <=
would work as well, not that it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was there before, but you're right, no sense in not including object.
*/ | ||
if (!PyDataType_HASFIELDS(to) && to->type_num != NPY_OBJECT) { | ||
return (casting == NPY_UNSAFE_CASTING && | ||
PyDict_Size(from->fields) == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if Eric won't come around a corner here and spook you with a dtype with nested fields :), it seems probable that there needs to be a recursion somewhere to get it right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this should be recursive.
return 1; | ||
} | ||
} | ||
else if (PyDataType_HASFIELDS(to)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about object and subarrays? E.g np.dtype("(1,2)u8")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, of course rather something like> np.dtype(["f1", "f4", (5,)])
or np.dtype([("f1", "f,f")])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for casting to structured array, sub-arrays do not matter - certainly- astype
happily does it. This particular stanza is really mostly there to ensure we return False
when casting is not "unsafe".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For subarrays in from
, astype
seems quite happy to just take the first element. So, for now just mimicking that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for:
from_dt = np.dtype([('a', ('i4,i4', 2))])
to_dt = np.dtype([('a', 'i4')])
For instance
arr = np.array(([(1, 2), (3, 4)],), dtype=from_dt)
arr.astype(to_dt)
What's currently implemented is not "just take the first element", but "take the first element and ignore all casting rules when casting it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your particular example with astype
raises an error in both in current master and in my PR, because from_dt['a']
itself has two fields, which cannot be cast to a single one.
Sadly, though, can_cast
happily claims this works even in my PR as from['a']
has no fields (but does have a subdtype
) - though only for casting='unsafe'
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that particular case is solved. But really astype
is weird - it checks can_cast
internally but can still throw an error later. This should not be allowed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, astype casting could fail based on the values. Of course you are right and it should not fail based on information already inside the dtype.
I have to catch up a bit here, sorry, but I do not really want crunch my mind too much about it right now, we are talking about corner cases and if we have new infrastructure things might get easier to align.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - agreed that we should take this bit-by-bit. My main goal really is to ensure that structured with multiple fields to unstructured always returns False
, as this cannot possibly work (and has caused me real head-aches).
7a2d01b
to
2675ef9
Compare
OK, pushed another update. Am still trying to keep the impact minimal. Another worry I have is whether user types can have |
2675ef9
to
09639c4
Compare
Damn, |
09639c4
to
e4efd1d
Compare
Also failing doctest at |
840d924
to
a6f1713
Compare
@seberg, @eric-wieser - what do you think we should do with this PR? Overall, I feel it at least fixes one obvious glaring inconsistency. On the other, it leaves many others. But those generally need more thought at developing a better set of tests. Personally, I'd prefer to approach this incrementally (mostly as I don't think I've got the time to do the larger overhaul...). |
@mhvk sorry, meant to look through this for a bit now, but then did not when I saw we had put 1.18, and thinking I can't merge it anyway ;). |
Ideally this would still go in 1.17... |
The 1.18 just means it isn't a blocker for 1.17, it can certainly go in 1.17. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sorry for that. Can't find anything to squirm about right now. But I do think we should add release notes, since casting is changed in subtle ways. As most actual casting uses "unsafe" mode, I doubt that those changes affect a lot of code though (and are not already buggy).
Allow unsafe casting from a simple data type to a structured one with multiple fields, but only from structured data type with a single field (checked recursively) to a simple data type. For now, continue to allow any structured to structured with casting="unsafe", as the current can_cast implementation does not match that of "astype".
a6f1713
to
e12c6bd
Compare
OK, made the changes, and added a release note. |
Previously, ``can_cast`` returned `True` for almost all inputs for | ||
``casting='unsafe'``, even for cases where casting was not possible, such as | ||
from a structured dtype to a regular one. This has been fixed, making it | ||
more consistent with actual casting using, e.g., the ``.astype`` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the change that np.zeros(3, "i4").astype("i4,i4", casting="safe")
(or related can_cast
) should fail after this? But maybe it is very fringe, since astype defaults to "unsafe" in any case, and so does assignment I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I am good with merging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, let me just merge this...
Cool, thanks Marten. |
As noted in #11114, currently the following clearly wrong things happen:
The present PR fixes it, and has some elementary tests. In particular, it allows casting from simple to structured (EDIT) in unsafe mode, and from structured to simple only in unsafe mode and when the structured dtype has only a single field.
fixes #11114
p.s. Replaces #13646, which confused the general void type with structured.