-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Fix recarray getattr and getindex return types #5505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
80631a3
to
a1fb996
Compare
# otherwise return the object | ||
try: | ||
dt = obj.dtype | ||
except AttributeError: | ||
return obj | ||
return obj #happens if field is Object type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, no trailing comments except, perhaps, for C struct elements. Just put it on the previous line.
ba846da
to
e34f0c9
Compare
def test_recarray_stringtypes(self): | ||
# Issue #3993 | ||
a = np.array([('abc ', 1), ('abc', 2)], | ||
dtype=[('foo', 'S4'), ('bar', int)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 indentation.
LGTM modulo pep8 nitpick. I really like your commit messages. I'm not confident that the current tests for record arrays are all that good. I don't think these changes will cause trouble, but probably we will need to wait on the next release for it to get real testing. |
Thanks, and I agree recarray tests are a bit spotty. By waiting, do you mean that this won't be merged until a while from now? I ask because I have other recarray changes I want to make on top of these ones - so should I wait until this is merged, or should I make further commits on this branch and then merge them all at once in the next release? Also, I tried running the scipy tests and I get some errors related to recarrays, but they appear to be due to #3256 rather than my changes. That is another bug that in the same region of code I am looking at, maybe I can fix it too. |
This commit makes changes to `__getitem__` and `__getattr__` of recarrays: 1. recarrays no longer convert string ndarrays to chararrays, and instead simply return ndarrays of string type. 2. attribute access and index access of fields now behaves identically 3. dtype.type is now inherited when fields of structured type are accessed Demonstration: >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) Old Behavior: >>> type(rec.foo), type(rec['foo']) (numpy.core.defchararray.chararray, numpy.recarray) >>> type(rec.bar), type(rec['bar']), rec.bar.dtype.type (numpy.recarray, numpy.recarray, numpy.void) >>> type(rec.baz), type(rec['baz']) (numpy.ndarray, numpy.ndarray) New behavior: >>> type(rec.foo), type(rec['foo']) (numpy.ndarray, numpy.ndarray) >>> type(rec.bar), type(rec['bar']), rec.bar.dtype.type (numpy.recarray, numpy.recarray, numpy.record) >>> type(rec.baz), type(rec['baz']) (numpy.ndarray, numpy.ndarray)
e34f0c9
to
3cd9e73
Compare
No, I want to merge it now, but there may be problems that turn up during the release process or shortly thereafter, so keep that in mind. There are about 350K downloads a month from pypi, that is a lot more testing than we can hope to do up front. |
I'll be around to fix it when it goes wrong :) |
BUG: Fix recarray getattr and getindex return types
Merging, thanks @ahaldane . |
This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular: 1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is: ```python >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) >>> rec[0].bar (1, 1) >>> type(rec[0].bar) <class 'numpy.record'> >>> type(rec[0]['bar']) <type 'numpy.void'> ``` demonstrated inconsistency between `.bar` and `['bar']` on the record object. The new behavior is: ```python >>> type(rec[0]['bar']) <class 'numpy.record'> ``` 2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate: ```python >>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)}) >>> dt.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) >>> dt.descr [('C', '|S5'), ('', '|V1'), ('D', '|S5')] >>> new_dt = np.dtype((np.record, dt.descr)) >>> new_dt dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')])) >>> new_dt.fields dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets: ```python >>> new_dt2 = np.dtype((np.record, dt.fields)) >>> new_dt2.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` (Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular: 1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is: ```python >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) >>> rec[0].bar (1, 1) >>> type(rec[0].bar) <class 'numpy.record'> >>> type(rec[0]['bar']) <type 'numpy.void'> ``` demonstrated inconsistency between `.bar` and `['bar']` on the record object. The new behavior is: ```python >>> type(rec[0]['bar']) <class 'numpy.record'> ``` 2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate: ```python >>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)}) >>> dt.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) >>> dt.descr [('C', '|S5'), ('', '|V1'), ('D', '|S5')] >>> new_dt = np.dtype((np.record, dt.descr)) >>> new_dt dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')])) >>> new_dt.fields dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets: ```python >>> new_dt2 = np.dtype((np.record, dt.fields)) >>> new_dt2.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` (Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular: 1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is: ```python >>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)], ... dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)]) >>> rec[0].bar (1, 1) >>> type(rec[0].bar) <class 'numpy.record'> >>> type(rec[0]['bar']) <type 'numpy.void'> ``` demonstrated inconsistency between `.bar` and `['bar']` on the record object. The new behavior is: ```python >>> type(rec[0]['bar']) <class 'numpy.record'> ``` 2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate: ```python >>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)}) >>> dt.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) >>> dt.descr [('C', '|S5'), ('', '|V1'), ('D', '|S5')] >>> new_dt = np.dtype((np.record, dt.descr)) >>> new_dt dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')])) >>> new_dt.fields dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets: ```python >>> new_dt2 = np.dtype((np.record, dt.fields)) >>> new_dt2.fields dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)}) ``` (Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
This pull request was originally requested as #5454, but I moved it here because it was on the wrong branch, plus other reasons.
This pull request makes changes to
__getitem__
and__getattr__
of recarrays. It makes three notable changes:A. recarrays no longer convert string ndarrays to chararrays, and instead simply return ndarrays of string type.
This was confusing, and led to bugs for anyone unaware of this special conversion (ie, me) since chararrays trim trailing whitespace but ndarrays fo string type do not, and because it only occured when the field was accessed by attribute (not index).
Old behavior:
New behavior: Both lines return False.
I think this is the main compatability risk in this pull request, as some people might have expected the whitespace removal. But I didn't see anything in a quick github search.
B: the return type of fields accessed by index and by attribute was inconsistent for flexible types.
Previous behavior:
New behavior:
C. dtype.type is now inherited when fields of structured type are accessed
Old Behavior:
New behavior:
This guarantees that if an array is a "record" array, structured fields will also be returned as "record arrays" (rather than merely views of np.recarray).
I am planning two more recarray-related pull requests: One to fix
recarray.__repr__
, and another to reduce dependence onnumpy.rec.format_parser
, which duplicates logic in the descriptor.c parser.