BUG: Disallow type-promotion of dtypes with different field names #15509

Closed
wants to merge 3 commits into from

Member

@ahaldane ahaldane commented Feb 4, 2020

Fixes #15494

The fix seems fairly painless.

I also considered whether we should restrict things even more, but didn't do so. For instance, what if the field names and order are the same, but the byte offsets of the fields are different? Which dtype "wins"?

Eg:

>>> A = ("a", "<i8")
>>> B = ("b", "<i8")
>>> ab = np.rec.array(obj=np.array([]), dtype=[A, B])
>>> ba = np.rec.array(obj=np.array([]), dtype=[B, A])
>>> np.concatenate([ab, ba[['a', 'b']]])
array([],
      dtype=(numpy.record, {'names':['a','b'], 'formats':['<i8','<i8'], 'offsets':[8,0], 'itemsize':16}))

The behavior shown here is actually the same as the behavior from 1.13 and before (with assign-by-fieldname), where the last dtype would be used in such cases, just like here. Since that is old behavior, I opted to leave it alone.
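
To make the offset question concrete, here is a minimal sketch that builds the two layouts involved: a packed dtype, and one with the same field names and order but swapped byte offsets, like the result printed above. The helper name `field_offsets` is made up for this illustration; it only reads the public `dtype.fields` mapping.

```python
import numpy as np

# Same field names and order, default (packed) layout: a at byte 0, b at byte 8.
packed = np.dtype([("a", "<i8"), ("b", "<i8")])

# Same names and order again, but with the byte offsets swapped.
swapped = np.dtype({
    "names": ["a", "b"],
    "formats": ["<i8", "<i8"],
    "offsets": [8, 0],
    "itemsize": 16,
})


def field_offsets(dt):
    """Map each field name to its byte offset (hypothetical helper)."""
    return {name: dt.fields[name][1] for name in dt.names}


print(packed.names == swapped.names)  # the names and their order agree
print(field_offsets(packed))
print(field_offsets(swapped))
```

The two dtypes agree on names, order and itemsize, and differ only in where each field lives inside the record, which is exactly the case where it is unclear which layout a combined result should use.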

@ahaldane ahaldane force-pushed the fix_can_cast_fields branch from 493db3e to df1ae03 Compare February 4, 2020 20:58
@eric-wieser
Member

eric-wieser commented Feb 4, 2020

Why does ab[:] = ba still work after this change? It looks like this would make that casting illegal, which obviously we don't want.

@ahaldane
Member Author

ahaldane commented Feb 4, 2020

As I recall, casting (e.g., .astype) is different from type promotion (e.g., ufunc binop rules), but I need to refresh my memory a bit...

This PR doesn't affect .astype.
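
For reference, the distinction can be seen with plain scalar dtypes, which sidesteps the structured-dtype question entirely. This is only a sketch of the two public entry points, `np.can_cast` and `np.promote_types`, not of the internals this PR touches.

```python
import numpy as np

i4, i8, f4 = np.dtype("int32"), np.dtype("int64"), np.dtype("float32")

# Casting asks a directional question: may values of one dtype be converted
# to another under a given casting rule?
safe_down = np.can_cast(i8, i4, casting="safe")            # narrowing is not "safe"
same_kind_down = np.can_cast(i8, i4, casting="same_kind")  # but it is "same_kind"
safe_up = np.can_cast(i4, i8, casting="safe")

# Promotion asks a symmetric question: which single dtype can hold both inputs?
common = np.promote_types(i4, f4)

# .astype defaults to casting="unsafe", so it allows the narrowing conversion.
narrowed = np.arange(3, dtype=i8).astype(i4)

print(safe_down, same_kind_down, safe_up, common, narrowed.dtype)
```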

@ahaldane ahaldane force-pushed the fix_can_cast_fields branch 2 times, most recently from 6695f7a to 4e840ae Compare February 4, 2020 21:23
# gh-15494
A = ("a", "<i8")
B = ("b", "<i8")
ab = np.rec.array(obj=np.array([(1, 2)], dtype='<i8'), dtype=[A, B])
Member

Let's not throw recarray in here, the issue is more fundamental:

Suggested change
ab = np.rec.array(obj=np.array([(1, 2)], dtype='<i8'), dtype=[A, B])
ab = np.array([(1, 2)], dtype=[A, B])

A = ("a", "<i8")
B = ("b", "<i8")
ab = np.rec.array(obj=np.array([(1, 2)], dtype='<i8'), dtype=[A, B])
ba = np.rec.array(obj=np.array([(1, 2)], dtype='<i8'), dtype=[B, A])
Member

Suggested change
ba = np.rec.array(obj=np.array([(1, 2)], dtype='<i8'), dtype=[B, A])
ba = np.array([(1, 2)], dtype=[B, A])

@ahaldane
Member Author

ahaldane commented Feb 4, 2020

Hmm, this was also touched in #13648. I haven't gone through it yet.

More relevant: #13667

@ahaldane
Member Author

ahaldane commented Feb 4, 2020

Oh, yeah, this PR isn't quite right. It should totally ignore the field names, since can_cast should only depend on the field order. Working on it.

@eric-wieser
Member

Oh, yeah, this PR isn't quite right. It should totally ignore the field names

Ah, but if we do that we'd be declaring the concatenate behavior as intentional, which I'm not sure is right - or at least, it wouldn't make the problem go away.


val = PyObject_RichCompareBool(from->names, to->names, Py_EQ);
if (val != 1 || PyErr_Occurred()) {
PyErr_Clear();
Member

@eric-wieser eric-wieser Feb 4, 2020

I don't think you should clear the error here, else MemoryError translates to "can't cast" which is nonsense.
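
The concern is easier to see in a pure-Python analogue. In this sketch the class and function names are invented for illustration and nothing here is NumPy code; it only shows how swallowing every exception turns an unrelated failure into a plain "cannot cast" answer.

```python
class ExplodingNames:
    """Stand-in for a names object whose comparison fails for an unrelated reason."""

    def __eq__(self, other):
        raise MemoryError("simulated allocation failure")


def names_match_swallowing(from_names, to_names):
    """Analogue of 'clear the error and report no match'."""
    try:
        return bool(from_names == to_names)
    except Exception:
        # The MemoryError disappears; the caller only sees "cannot cast".
        return False


def names_match_propagating(from_names, to_names):
    """Let unexpected errors reach the caller instead of hiding them."""
    return bool(from_names == to_names)


broken = ExplodingNames()
print(names_match_swallowing(broken, ("a", "b")))  # the failure is hidden
try:
    names_match_propagating(broken, ("a", "b"))
except MemoryError as exc:
    print("propagated:", exc)
```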

while (PyDict_Next(field1, &ppos, &key, &tuple1)) {
if ((tuple2 = PyDict_GetItem(field2, key)) == NULL) {
while (PyDict_Next(from->fields, &ppos, &key, &tuple1)) {
if ((tuple2 = PyDict_GetItem(to->fields, key)) == NULL) {
Member

Suggested change
if ((tuple2 = PyDict_GetItem(to->fields, key)) == NULL) {
if ((tuple2 = PyDict_GetItemWithError(to->fields, key)) == NULL) {
if (PyErr_Occurred()) {
return -1;
}

@ahaldane
Member Author

ahaldane commented Feb 4, 2020

It seems that for two dtypes with different field names but the same field dtypes, we want that they:

  • can be cast to each other
  • cannot be promoted to a common type unambiguously

So can_cast should succeed, but promote_types should fail. Does that make sense?
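
A small probe of those two questions, using only the public `np.can_cast` and `np.promote_types`. The helper `probe` is hypothetical, and what it reports for mismatched names depends on the installed NumPy version, so the sketch records the outcome instead of assuming one.

```python
import numpy as np

ab = np.dtype([("a", "i8"), ("b", "i8")])
xy = np.dtype([("x", "i8"), ("y", "i8")])  # same field dtypes, different names


def probe(dt1, dt2):
    """Return (castable, promoted); promoted is None if promotion is rejected."""
    castable = np.can_cast(dt1, dt2)  # default casting="safe"
    try:
        promoted = np.promote_types(dt1, dt2)
    except TypeError:
        # Promotion failures are reported as TypeError (or a subclass of it).
        promoted = None
    return castable, promoted


print(probe(ab, ab))  # identical dtypes: castable, promotion gives a dtype equal to ab
print(probe(ab, xy))  # the case discussed here; the result is version dependent
```

Under the proposal above, the second call would report a castable pair with no promoted dtype.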

@charris charris added this to the 1.18.2 release milestone Feb 4, 2020
@ahaldane ahaldane force-pushed the fix_can_cast_fields branch from 4e840ae to 62a732f Compare February 5, 2020 00:06
@ahaldane
Member Author

ahaldane commented Feb 5, 2020

All right. So I fixed up can_cast_fields to ignore the field names when determining if a structured cast is possible, which is more consistent with the "*-by-position" strategy we use now.

There was a complication: when doing a == b on two structured arrays, numpy currently has some half-deprecated code that tests whether a and b are castable (rather than promotable) before deciding whether to do an elementwise comparison or to fail and return a scalar. Because more things now count as castable, we fall into the elementwise case more often, and it turns out the elementwise case had special-case code for structured arrays, in _void_compare, that was still using the old *-by-fieldname pattern. So I fixed up _void_compare to compare by field position.
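
The difference between the two comparison strategies can be sketched with a toy model in plain Python. Records are lists of (name, value) pairs here, and both functions are invented for illustration; this is not how _void_compare is written.

```python
def equal_by_name(rec1, rec2):
    """By-fieldname pattern: look each field of rec1 up in rec2 by its name."""
    lookup = dict(rec2)
    return len(rec1) == len(rec2) and all(
        name in lookup and lookup[name] == value for name, value in rec1
    )


def equal_by_position(rec1, rec2):
    """By-position pattern: compare the i-th field with the i-th field, ignoring names."""
    return len(rec1) == len(rec2) and all(
        v1 == v2 for (_, v1), (_, v2) in zip(rec1, rec2)
    )


ab = [("a", 1), ("b", 2)]
ba = [("b", 1), ("a", 2)]  # same values in the same positions, names swapped

print(equal_by_name(ab, ba))      # False: a is 1 on one side and 2 on the other
print(equal_by_position(ab, ba))  # True: position 0 holds 1, position 1 holds 2
```

The two strategies disagree precisely when field names are reordered, which is why mixing them between the castability check and the elementwise comparison gives inconsistent answers.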

I also added a paragraph to the structured array docs about all this.

@ahaldane ahaldane force-pushed the fix_can_cast_fields branch 2 times, most recently from fee0266 to 85624c5 Compare February 5, 2020 00:18
@@ -152,7 +152,8 @@ def test_recarrays(self):

self._test_equal(a, b)

c = np.empty(2, [('floupipi', float), ('floupa', float)])
c = np.empty(2, [('floupipi', float),
('floupi', float), ('floupa', float)])
c['floupipi'] = a['floupi'].copy()
c['floupa'] = a['floupa'].copy()
Member Author

This change was needed because the comparison 3 lines below now falls into the elementwise comparison case, since b and c are castable after this PR. So I added an extra field here to make sure b and c are not castable and still trip the warning.

Member

@eric-wieser eric-wieser left a comment

Generally looks good, thanks!

Some style comments, and a test I think we missed.

if (!PyErr_Occurred()) {
PyErr_SetString(PyExc_ValueError, "bug, should not happen");
}
return 0;
Member

Suggested change
return 0;
return -1;

Member

Hmm, guess the caller doesn't support this

Member

In which case, you'll need to call PyErr_Clear() here instead.

I've a local branch that tries to propagate errors out of here, but it's a bit invasive, and less important than this patch.

Comment on lines 1108 to 1150
Py_INCREF(type1);
return type1;
Member

Can you add a test for this behavior? That is, if two dtypes with different padding are specified, is the first/last one returned by identity?

Member Author

Actually, the way I wrote it using "equivalent_fields", this rejects promotion for dtypes with different padding/offsets. That's probably safer anyway, so no one gets their padding changed unexpectedly.

Added a test for that case.

@ahaldane ahaldane force-pushed the fix_can_cast_fields branch from 85624c5 to 08a6460 Compare February 5, 2020 18:31
@ahaldane ahaldane changed the title BUG: can_cast_fields should reject dtypes with diff field order BUG: Disallow type-promotion of dtypes with different field names Feb 5, 2020
@ahaldane ahaldane force-pushed the fix_can_cast_fields branch 5 times, most recently from 487825f to 320fdef Compare February 5, 2020 21:01
@eric-wieser eric-wieser self-requested a review May 12, 2020 06:40
@charris
Member

charris commented May 16, 2020

Pushing off to 1.20.x.

@ahaldane ahaldane force-pushed the fix_can_cast_fields branch from d89df8f to 1c2cf51 Compare May 18, 2020 18:53
@charris
Member

charris commented Dec 5, 2020

Ping all, and rebase needed.

@seberg
Member

seberg commented Dec 5, 2020

This needs more than a rebase now :), there is an additional code path here on master which would also need updating. Happy to review! But unless pinged explicitly, I will not prioritize this until the code that the current PR touches is deleted :).

@ahaldane
Member Author

ahaldane commented Dec 6, 2020

@seberg, just to be clear, you recommend I wait before rebasing until you've completed more of the dtype updates?

I'm looking through the new master code now, I can see you have some TODOs in the code related to this PR.

@seberg
Member

seberg commented Dec 6, 2020

@ahaldane yeah, I specifically marked them. I am happy to rebase it now; it probably isn't much harder. We just have to change both the new and the old version (which probably just means keeping the changes here and making the same changes in the new version).

Base automatically changed from master to main March 4, 2021 02:04
@mattip mattip modified the milestones: 1.21.0 release, 1.22.0 release May 5, 2021
@mattip
Member

mattip commented May 5, 2021

This should go in after #18676 (it will need some fixups)

seberg added a commit to seberg/numpy that referenced this pull request Jun 11, 2021
This PR replaces the old gh-15509 implementing proper type promotion
for structured voids.  It further fixes the casting safety to consider
casts with equivalent field number and matching order as "safe"
and if the names, titles, and offsets match as "equiv".

The change percolates into the void comparison, and since it fixes
the order, it removes the current FutureWarning there as well.

This addresses liberfa/pyerfa#77
and replaces gh-15509 (the implementation has changed too much).

Fixes gh-15494  (and probably a few more)

Co-authored-by: Allan Haldane <allan.haldane@gmail.com>
@seberg
Member

seberg commented Jun 11, 2021

Closing, gh-19226 is the equivalent change taking into account that all of this has changed quite a lot. (Unfortunately, there is currently still a small API addition missing for DTypes to finish the puzzle.)

@seberg seberg closed this Jun 11, 2021
seberg added a commit to seberg/numpy that referenced this pull request Jun 26, 2021
seberg added a commit to seberg/numpy that referenced this pull request Jun 26, 2021
seberg added a commit to seberg/numpy that referenced this pull request Feb 24, 2022
seberg added a commit to seberg/numpy that referenced this pull request Feb 24, 2022
seberg added a commit to seberg/numpy that referenced this pull request May 5, 2022
seberg added a commit to seberg/numpy that referenced this pull request May 9, 2022
Labels
00 - Bug; 62 - Python API (Changes or additions to the Python API. Mailing list should usually be notified.); component: numpy._core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: concatenate of structured dtypes does not check field orders match
5 participants