-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
ENH: Allow dictproxy objects to be used in place of dicts when creating a dtype #5920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular: 1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is: ```python >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) >>> rec[0].bar (1, 1) >>> type(rec[0].bar) <class 'numpy.record'> >>> type(rec[0]['bar']) <type 'numpy.void'> ``` demonstrated inconsistency between `.bar` and `['bar']` on the record object. The new behavior is: ```python >>> type(rec[0]['bar']) <class 'numpy.record'> ``` 2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate: ```python >>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)}) >>> dt.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) >>> dt.descr [('C', '|S5'), ('', '|V1'), ('D', '|S5')] >>> new_dt = np.dtype((np.record, dt.descr)) >>> new_dt dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')])) >>> new_dt.fields dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets: ```python >>> new_dt2 = np.dtype((np.record, dt.fields)) >>> new_dt2.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` (Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
I'm probably missing some corner case, but... Why don't we simply rename |
I considered that at first, but I don't know what the performance implications would be for using the generic methods over the dict-specific functions (which is more common), though maybe since it's just type construction it would all be in the margins anyways. I could try it and see. However, |
Commit messages need some work ;) |
d901ef3
to
4771f10
Compare
Squashed to a single commit (since the followup commits were just for a couple cases I missed before I added the tests). |
This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular: 1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is: ```python >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) >>> rec[0].bar (1, 1) >>> type(rec[0].bar) <class 'numpy.record'> >>> type(rec[0]['bar']) <type 'numpy.void'> ``` demonstrated inconsistency between `.bar` and `['bar']` on the record object. The new behavior is: ```python >>> type(rec[0]['bar']) <class 'numpy.record'> ``` 2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate: ```python >>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)}) >>> dt.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) >>> dt.descr [('C', '|S5'), ('', '|V1'), ('D', '|S5')] >>> new_dt = np.dtype((np.record, dt.descr)) >>> new_dt dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')])) >>> new_dt.fields dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets: ```python >>> new_dt2 = np.dtype((np.record, dt.fields)) >>> new_dt2.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` (Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
Even if we have to stick with the current type check, I still think using It doesn't look like there should be much of a performance hit:
And it would let you get rid of that ugly, almost an exact duplicate of the previous one, extra branch at the bottom of the function. |
@jaimefrio By getting rid of that duplicate branch, do you mean changing all the (I suppose I could also include a macro for the latter check). |
Yes, that's what I had in mind. |
@jaimefrio This has been updated. |
I don't think it has... |
I'll take a look--I still plan to update this with @jaimefrio's suggestion. |
…ne wrinkle this revealed is that PyMapping_GetItemString sets a KeyError exception on lookup failure, while PyDict_GetItemString does not--this bug could have surfaced anyways if using the PyMapping method alongside the PyDict method.
I've pushed an updated version of the patch using just This could have been a problem in the original version of the patch too, since the error state wasn't been cleared for all failed dict lookups. |
I looked in the dtype docs for anything that this could conflict with, but didn't see anything. It seems like a bit much of a rare corner-case to be worth calling out explicitly (users will rarely be creating their own dictproxy objects). I think in this case it's best that there's just less surprise when it does work. |
@@ -1165,20 +1175,23 @@ _convert_from_dict(PyObject *obj, int align) | |||
} | |||
/* Set the itemsize */ | |||
new->elsize = itemsize; | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be easier to follow if this branch came first, as you did for 'aligned'
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Almost there! ;-) |
I'll squash again before ready to merge, if desired. |
Squashing would be nice, it is ready to go now. |
a dtype arraydescr_new allows passing in a dict to represent a multi-field dtype. However, the .fields attribute on such dtypes is returned as a dictproxy object. Therefore it is convenient, for copying existing multi-field dtypes, to be able to pass in a dictproxy object as well.
03a9e8b
to
1acd14c
Compare
Rebased on master and squashed. |
ENH: Allow dictproxy objects to be used in place of dicts when creating a dtype
Thanks Erik! |
Great, thanks for reviewing and the quick merge. |
This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular: 1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is: ```python >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) >>> rec[0].bar (1, 1) >>> type(rec[0].bar) <class 'numpy.record'> >>> type(rec[0]['bar']) <type 'numpy.void'> ``` demonstrated inconsistency between `.bar` and `['bar']` on the record object. The new behavior is: ```python >>> type(rec[0]['bar']) <class 'numpy.record'> ``` 2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate: ```python >>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)}) >>> dt.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) >>> dt.descr [('C', '|S5'), ('', '|V1'), ('D', '|S5')] >>> new_dt = np.dtype((np.record, dt.descr)) >>> new_dt dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')])) >>> new_dt.fields dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets: ```python >>> new_dt2 = np.dtype((np.record, dt.fields)) >>> new_dt2.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` (Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString was replaced by PyMapping_GetItemString which returns a new ref. Fixes numpy#6636
Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString was replaced by PyMapping_GetItemString which returns a new ref. Fixes numpy#6636
Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString was replaced by PyMapping_GetItemString which returns a new ref. Fixes numpy#6636
arraydescr_new
allows passing in a dict to represent a multi-fielddtype. However, the .fields attribute on such dtypes is returned
as a dictproxy object. Therefore it is convenient, for copying
existing multi-field dtypes, to be able to pass in a dictproxy object
as well.
I have another PR I will post shortly that motivated this, and that
demonstrates an example of the (admittedly uncommon) case
where this is useful.