make recarray.attr return ndarray (not chararray) #5454

ahaldane · 2015-01-14T18:47:04Z

This fixes #3993

This bug in recarrays has bitten me because chararrays strip trailing whitespace for 'S' dtypes, while ndarrays don't.

>>> import numpy as np
>>> arr = np.array([(' abc ', 1), (' abc', 2)], dtype=[('str1', 'S5'), ('id', int)])
>>> arr = arr.view(np.recarray)
>>> arr.str1[0] == arr.str1[1]
True
>>> arr['str1'][0] == arr['str1'][1]
False

As far as I can tell all that was required was removing the explicit checks for string types in the recarray code.
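
The discrepancy comes down to two comparison rules. Below is a minimal sketch, assuming a current NumPy install. `np.char.equal` is used here only as a stand-in for the chararray `==` operator, since it is documented to strip trailing whitespace before comparing, while `==` on a plain ndarray compares the stored bytes as they are. The variable names are illustrative.

```python
import numpy as np

# Two one-element byte-string arrays that differ only in trailing whitespace.
left = np.array([b' abc '], dtype='S5')
right = np.array([b' abc'], dtype='S5')

# Plain ndarray comparison: trailing whitespace is significant.
plain_equal = bool((left == right)[0])

# chararray-style comparison (stand-in): trailing whitespace is stripped first.
stripped_equal = bool(np.char.equal(left, right)[0])

print(plain_equal, stripped_equal)
```

If the two flags differ, code that relied on the old attribute access was silently comparing stripped strings.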

charris · 2015-01-14T19:35:52Z

Agree with this change as chararray is effectively deprecated; however, this may cause compatibility problems. On that account, this change needs a mention in doc/release/1.10.0-notes.rst and you should post to the list for possible discussion.

Also, the commit message should begin BUG: and there should be a more extended explanation. I think your PR comment would be good for that.

njsmith · 2015-01-14T19:37:30Z

Should we throw in a delectation warning to chararray.__init__ while we're at it?

njsmith · 2015-01-14T19:37:52Z

*Deprecation, of course.

charris · 2015-01-14T19:50:13Z

@njsmith I was considering that, but I wanted to do a GitHub code search first and need to exclude all the numpy forks, which I've forgotten how to do. I think the people most likely to be affected would be STScI.

charris · 2015-01-14T20:09:06Z

Hmm, chararray looks to be pretty widely used.

ahaldane · 2015-01-14T21:26:38Z

OK, I've updated the commit message and added a note in doc/release/1.10.0-notes.rst.

Should I post to NumPy-discussion or SciPy-dev? I'm not sure which one is for numpy development.

charris · 2015-01-14T21:30:03Z

numpy-discussion

charris · 2015-01-17T01:40:59Z

Doesn't look like anyone noticed the post :( Could you add tests to check the return type of the two examples you used? numpy/core/tests/test_records.py looks like the place.
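
A rough sketch of what such a test might look like, assuming a NumPy build that includes the fix. The helper and test names are hypothetical and are not taken from test_records.py.

```python
import numpy as np


def make_recarray():
    # Hypothetical helper: rebuilds the array from the PR description.
    arr = np.array([(' abc ', 1), (' abc', 2)],
                   dtype=[('str1', 'S5'), ('id', int)])
    return arr.view(np.recarray)


def test_string_field_return_type():
    # Hypothetical test name: attribute and index access should agree.
    rec = make_recarray()
    assert type(rec.str1) is np.ndarray
    assert type(rec['str1']) is np.ndarray
    # With a plain ndarray, trailing whitespace is kept when comparing.
    assert rec.str1[0] != rec.str1[1]
    assert rec['str1'][0] != rec['str1'][1]


test_string_field_return_type()
```

On a NumPy version without the fix, the first and third assertions are the ones expected to fail.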

ahaldane · 2015-01-18T22:00:08Z

I've added some tests. Along the way though, I realized my fix still has a problem and so I've made another change. I was expecting that by removing the chararray lines I would end up with a setup where attribute access behaved identically to index access. However this happens instead:

>>> a = np.array([('abc ', (1,1)), ('abc', (2,3))],
...              dtype=[('foo', 'S4'), ('bar', [('A', int),('B', int)])])
>>> a = a.view(np.recarray)
>>> type(a.foo)
    numpy.ndarray
>>> type(a['foo'])
    numpy.core.records.recarray

With the new pull request I've made index access behave identically to attribute access to recarrays (as a second git commit if that's OK). That is, accessing a scalar dtype gives an ndarray but accessing a structured datatype gives a recarray, for both attributes and indexes.

There is a case to be made for having indexing behave differently from attribute access. Accessing fields using indexing is something that already exists for ndarrays. Recarrays are supposed to add attribute access, so it could also make sense for them to leave the index access alone. In this case indexed access would always return an ndarray, while attribute access would return ndarrays or recarrays depending on whether the data type is structured.

On the other hand, it could be confusing that field access by indexes and attributes behaves differently, and it would have to be clearly documented. It might also seem a little strange if indexing an array gives an array of a different type, as would happen if indexing always returns ndarrays.

In either case, the current code returns recarrays for scalar types accessed through indexing, which I don't think makes sense.

Please also read the updated note in doc/release/1.10.0-notes.rst for another explanation.

With this change, I get the following:

>>> type(a['foo'])
    numpy.ndarray
>>> type(a.foo)
    numpy.ndarray
>>> type(a['bar'])
    numpy.core.records.recarray
>>> type(a.bar)
    numpy.core.records.recarray

By the way, it occurred to me that the docs would need to change too, but they don't: recarray's use of chararrays wasn't documented anywhere, nor were the return types in general. I'm thinking of expanding the docs.

If any of this seems confusing, please wait a little because I am going to send an email to the numpy list regarding recarray documentation and #3581. Perhaps the issues I raise there should be discussed before this one.

ahaldane · 2015-01-18T22:38:39Z

Correction: I had it a little wrong in my last update. Here's the correct description:

Setup:

>>> a = np.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...              dtype=[('foo', 'S4'), ('bar', [('A', int),('B', int)]), ('baz', int)])
>>> a = a.view(np.recarray)

Previous behavior:

>>> type(a.foo), type(a['foo'])
    (numpy.ndarray, numpy.core.records.recarray)
>>> type(a.bar), type(a['bar'])
    (numpy.core.records.recarray, numpy.core.records.recarray)
>>> type(a.baz), type(a['baz'])
    (numpy.ndarray, numpy.ndarray)

New behavior:

>>> type(a.foo), type(a['foo'])
    (numpy.ndarray, numpy.ndarray)
>>> type(a.bar), type(a['bar'])
    (numpy.core.records.recarray, numpy.core.records.recarray)
>>> type(a.baz), type(a['baz'])
    (numpy.ndarray, numpy.ndarray)
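
The new behavior can be checked field by field. This is a small sketch, assuming a NumPy version that contains the fix; `same_kind` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

# Same setup as above: a string field, a nested structured field, an int field.
a = np.array([('abc ', (1, 1), 1), ('abc', (2, 3), 1)],
             dtype=[('foo', 'S4'),
                    ('bar', [('A', int), ('B', int)]),
                    ('baz', int)])
a = a.view(np.recarray)


def same_kind(field):
    # Hypothetical helper: True when attribute access and index access
    # return objects of exactly the same type for this field.
    return type(getattr(a, field)) is type(a[field])


for name in a.dtype.names:
    print(name, type(getattr(a, name)).__name__, same_kind(name))
```

Scalar fields (`foo`, `baz`) should report a plain ndarray and the structured field (`bar`) a recarray, with both access styles agreeing.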

ahaldane · 2015-01-23T18:12:49Z

Just a heads up: Don't merge this; I have more changes I want to make.

I'll also merge the latest commits from origin when the time comes.

ahaldane · 2015-01-25T23:13:16Z

This was automatically closed because of a push I made, but that's just as well.

I wanted to open a new (duplicate) pull request anyway, since I mistakenly made these commits to my master branch.

ahaldane force-pushed the master branch from d17fe0d to 634c3c1 Compare January 14, 2015 21:17

ahaldane force-pushed the master branch from 07b233e to a51bf53 Compare January 18, 2015 22:38

ahaldane mentioned this pull request Jan 22, 2015

DOC: improve record/structured array nomenclature & guide #5482

Merged

ahaldane closed this Jan 25, 2015

ahaldane force-pushed the master branch from a51bf53 to 7ce93ba Compare January 25, 2015 23:10

This was referenced Jan 26, 2015

BUG: Fix recarray getattr and getindex return types #5505

Merged

recarray attributes of type strings will create a ndarray of strings instead of a chararray #3993

Closed
