BUG: Fix recarray getattr and getindex return types #5505

ahaldane · 2015-01-26T00:11:14Z

This pull request was originally requested as #5454, but I moved it here because it was on the wrong branch, plus other reasons.

This pull request makes changes to __getitem__ and __getattr__ of recarrays. It makes three notable changes:

A. recarrays no longer convert string ndarrays to chararrays, and instead simply return ndarrays of string type.

This was confusing, and led to bugs for anyone unaware of this special conversion (i.e., me), since chararrays trim trailing whitespace but ndarrays of string type do not, and because it only occurred when the field was accessed by attribute (not by index).

Old behavior:

>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)])
>>> rec.foo[0] == rec.foo[1]
True
>>> rec['foo'][0] == rec['foo'][1]
False

New behavior: Both lines return False.

I think this is the main compatibility risk in this pull request, as some people might have expected the whitespace removal. But I didn't find any such usage in a quick GitHub search.
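To make the whitespace difference concrete, here is a minimal sketch. It does not use chararray itself; `np.char.rstrip` stands in for the trailing-whitespace trimming that chararrays apply, which is an illustrative substitution rather than the code path recarray used.

```python
import numpy as np

# Two byte strings that differ only by a trailing space.
strings = np.array([b"abc ", b"abc"], dtype="S4")

# A plain ndarray of string type compares the stored bytes as they are.
plain_equal = bool(strings[0] == strings[1])

# Stand-in for chararray's trimming: strip trailing whitespace first.
trimmed = np.char.rstrip(strings)
trimmed_equal = bool(trimmed[0] == trimmed[1])

print("plain ndarray comparison:", plain_equal)
print("after trimming whitespace:", trimmed_equal)
```

Code that relied on the old attribute-access behavior would need an explicit strip like the one above to keep comparing equal.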

B. The return type of fields accessed by index and by attribute was inconsistent for flexible types; the two now agree.

Previous behavior:

>>> type(rec.foo), type(rec['foo'])
(numpy.core.defchararray.chararray, numpy.recarray)
>>> type(rec.bar), type(rec['bar'])
(numpy.recarray, numpy.recarray)
>>> type(rec.baz), type(rec['baz'])
(numpy.ndarray, numpy.ndarray)

New behavior:

>>> type(rec.foo), type(rec['foo'])
(numpy.ndarray, numpy.ndarray)
>>> type(rec.bar), type(rec['bar'])
(numpy.recarray, numpy.recarray)
>>> type(rec.baz), type(rec['baz'])
(numpy.ndarray, numpy.ndarray)
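The consistency described in B can be checked field by field. The loop below is a sketch that assumes a NumPy version containing this change (1.10 or later); it uses byte-string literals so it also runs on Python 3.

```python
import numpy as np

rec = np.rec.array(
    [(b"abc ", (1, 1), 1), (b"abc", (2, 3), 1)],
    dtype=[("foo", "S4"), ("bar", [("A", int), ("B", int)]), ("baz", int)],
)

# For every field, attribute access and index access should give the same type.
for name in rec.dtype.names:
    by_attr = getattr(rec, name)
    by_index = rec[name]
    assert type(by_attr) is type(by_index), name
    print(name, type(by_attr).__name__)
```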

C. dtype.type is now inherited when fields of structured type are accessed

Old behavior:

>>> rec.dtype.type, rec.bar.dtype.type
(numpy.record, numpy.void)

New behavior:

>>> rec.dtype.type, rec.bar.dtype.type
(numpy.record, numpy.record)

This guarantees that if an array is a "record" array, structured fields will also be returned as "record arrays" (rather than merely views of np.recarray).
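A practical consequence of C is that elements of a nested structured field are `numpy.record` scalars, so their sub-fields can be read as attributes. The sketch below assumes a NumPy version containing this change (1.10 or later).

```python
import numpy as np

rec = np.rec.array(
    [(b"abc ", (1, 1), 1), (b"abc", (2, 3), 1)],
    dtype=[("foo", "S4"), ("bar", [("A", int), ("B", int)]), ("baz", int)],
)

# The structured field keeps the parent's scalar type (numpy.record).
nested = rec.bar
print(rec.dtype.type is np.record, nested.dtype.type is np.record)

# Because elements are records, sub-fields are reachable as attributes.
first = nested[0]
print(type(first).__name__, first.A, first.B)
```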

I am planning two more recarray-related pull requests: One to fix recarray.__repr__, and another to reduce dependence on numpy.rec.format_parser, which duplicates logic in the descriptor.c parser.

charris · 2015-01-26T21:54:23Z

numpy/core/records.py

            # otherwise return the object
            try:
                dt = obj.dtype
            except AttributeError:
-                return obj
+                return obj #happens if field is Object type


Please, no trailing comments except, perhaps, for C struct elements. Just put it on the previous line.

charris · 2015-01-26T23:14:30Z

numpy/core/tests/test_records.py

+    def test_recarray_stringtypes(self):
+        # Issue #3993
+        a = np.array([('abc ', 1), ('abc', 2)],
+                           dtype=[('foo', 'S4'), ('bar', int)])


PEP8 indentation.

charris · 2015-01-26T23:18:38Z

LGTM modulo pep8 nitpick. I really like your commit messages.

I'm not confident that the current tests for record arrays are all that good. I don't think these changes will cause trouble, but probably we will need to wait on the next release for it to get real testing.

ahaldane · 2015-01-27T01:12:11Z

Thanks, and I agree recarray tests are a bit spotty. By waiting, do you mean that this won't be merged until a while from now? I ask because I have other recarray changes I want to make on top of these ones - so should I wait until this is merged, or should I make further commits on this branch and then merge them all at once in the next release?

Also, I tried running the scipy tests and I get some errors related to recarrays, but they appear to be due to #3256 rather than my changes. That is another bug in the same region of code I am looking at, so maybe I can fix it too.

This commit makes changes to `__getitem__` and `__getattr__` of recarrays:

1. recarrays no longer convert string ndarrays to chararrays, and instead simply return ndarrays of string type.
2. attribute access and index access of fields now behaves identically
3. dtype.type is now inherited when fields of structured type are accessed

Demonstration:

>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)])

Old Behavior:

>>> type(rec.foo), type(rec['foo'])
(numpy.core.defchararray.chararray, numpy.recarray)
>>> type(rec.bar), type(rec['bar']), rec.bar.dtype.type
(numpy.recarray, numpy.recarray, numpy.void)
>>> type(rec.baz), type(rec['baz'])
(numpy.ndarray, numpy.ndarray)

New behavior:

>>> type(rec.foo), type(rec['foo'])
(numpy.ndarray, numpy.ndarray)
>>> type(rec.bar), type(rec['bar']), rec.bar.dtype.type
(numpy.recarray, numpy.recarray, numpy.record)
>>> type(rec.baz), type(rec['baz'])
(numpy.ndarray, numpy.ndarray)

charris · 2015-01-27T01:24:25Z

No, I want to merge it now, but there may be problems that turn up during the release process or shortly thereafter, so keep that in mind. There are about 350K downloads a month from pypi, that is a lot more testing than we can hope to do up front.

ahaldane · 2015-01-27T01:47:43Z

I'll be around to fix it when it goes wrong :)

BUG: Fix recarray getattr and getindex return types

charris · 2015-01-27T01:52:24Z

Merging, thanks @ahaldane .

This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular:

1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words, the old behavior is:

```python
>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)])
>>> rec[0].bar
(1, 1)
>>> type(rec[0].bar)
<class 'numpy.record'>
>>> type(rec[0]['bar'])
<type 'numpy.void'>
```

This demonstrates an inconsistency between `.bar` and `['bar']` on the record object. The new behavior is:

```python
>>> type(rec[0]['bar'])
<class 'numpy.record'>
```

2) The second issue is more subtle. The fix in numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate:

```python
>>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)})
>>> dt.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
>>> dt.descr
[('C', '|S5'), ('', '|V1'), ('D', '|S5')]
>>> new_dt = np.dtype((np.record, dt.descr))
>>> new_dt
dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')]))
>>> new_dt.fields
dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets:

```python
>>> new_dt2 = np.dtype((np.record, dt.fields))
>>> new_dt2.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

(Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)
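For readers on a current NumPy, both points of the follow-up can be sketched as below. Two things here are assumptions rather than the follow-up's own code: the dtype is built with the `names`/`formats`/`offsets` dictionary form instead of the `{'C': ('S5', 0), ...}` form, and the record dtype is built by passing the dtype object itself instead of its `fields` dict.

```python
import numpy as np

# A dtype with a one-byte gap between the two fields (offsets 0 and 6).
dt = np.dtype({"names": ["C", "D"], "formats": ["S5", "S5"], "offsets": [0, 6]})

# descr describes the gap with an unnamed padding entry.
print(dt.descr)

# Wrapping the dtype itself keeps the original fields and offsets.
rec_dt = np.dtype((np.record, dt))
print(rec_dt.type is np.record, rec_dt.fields["D"][1])

# Point 1: attribute and index access on a record scalar should agree.
rec = np.rec.array(
    [(b"abc ", (1, 1), 1), (b"abc", (2, 3), 1)],
    dtype=[("foo", "S4"), ("bar", [("A", int), ("B", int)]), ("baz", int)],
)
print(type(rec[0].bar) is type(rec[0]["bar"]))
```

The padding entry in `descr` is what turned into a spurious `f1` field in the demonstration above, which is why rebuilding the dtype from `descr` is avoided here.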

ahaldane force-pushed the recarray_returntype branch 2 times, most recently from 80631a3 to a1fb996 Compare January 26, 2015 02:53

charris reviewed Jan 26, 2015

ahaldane force-pushed the recarray_returntype branch 2 times, most recently from ba846da to e34f0c9 Compare January 26, 2015 23:08

charris reviewed Jan 26, 2015

ahaldane force-pushed the recarray_returntype branch from e34f0c9 to 3cd9e73 Compare January 27, 2015 01:20

charris added a commit that referenced this pull request Jan 27, 2015

Merge pull request #5505 from ahaldane/recarray_returntype

fbcc24f

BUG: Fix recarray getattr and getindex return types

charris merged commit fbcc24f into numpy:master Jan 27, 2015

ahaldane mentioned this pull request Jan 30, 2015

recarray attributes of type strings will create a ndarray of strings instead of a chararray #3993

Closed

embray mentioned this pull request May 27, 2015

BUG: Further fixes to record and recarray getitem/getattr #5921

Merged

ahaldane mentioned this pull request Jun 18, 2015

Numpy 1.10 issues for io.fits astropy/astropy#3854

Closed
