Skip to content

make recarray.attr return ndarray (not chararray) #5454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 0 commits into from

Conversation

ahaldane
Copy link
Member

This fixes #3993

This bug in recarrays has bitten me since chararrays remove whitespace for 'S' dtypes, while ndarrays don't.

arr = np.array([(' abc ', 1), (' abc', 2)], dtype=[('str1', 'S5'), ('id', int)])
arr = arr.view(np.recarray)
>>> arr.str1[0] == arr.str1[1]
True
>>> arr['str1'][0] == arr['str1'][1]
False

As far as I can tell all that was required was removing the explicit checks for string types in the recarray code.

@charris
Copy link
Member

charris commented Jan 14, 2015

Agree with this change as chararray is effectively deprecated, however this may cause compatibility problems. On that account, this change needs a mention in doc/release/1.10.0-notes.rst and you should post to the list for possible discussion.

Also, the commit message should begin BUG: and there should be a more extended explanation. I think your PR comment would be good for that.

@njsmith
Copy link
Member

njsmith commented Jan 14, 2015

Should we throw in a delectation warning to chararray.init while we're
at it?
On 14 Jan 2015 19:35, "Charles Harris" notifications@github.com wrote:

Agree with this change as chararray is effectively deprecated, however
this may cause compatibility problems. On that account, this change needs a
mention in doc/release/1.10.0-notes.rst and you should post to the list
for possible discussion.

Also, the commit message should begin BUG: and there should be a more
extended explanation. I think your PR comment would be good for that.


Reply to this email directly or view it on GitHub
#5454 (comment).

@njsmith
Copy link
Member

njsmith commented Jan 14, 2015

*Deprecation, of course.
On 14 Jan 2015 19:37, njs@pobox.com wrote:

Should we throw in a delectation warning to chararray.init while we're
at it?
On 14 Jan 2015 19:35, "Charles Harris" notifications@github.com wrote:

Agree with this change as chararray is effectively deprecated, however
this may cause compatibility problems. On that account, this change needs a
mention in doc/release/1.10.0-notes.rst and you should post to the list
for possible discussion.

Also, the commit message should begin BUG: and there should be a more
extended explanation. I think your PR comment would be good for that.


Reply to this email directly or view it on GitHub
#5454 (comment).

@charris
Copy link
Member

charris commented Jan 14, 2015

@njsmith I was considering that but wanted to do a github code search first and need to exclude all the numpy forks, which I've forgotten how to do. I think the people most likely to be affected would be StSci.

@charris
Copy link
Member

charris commented Jan 14, 2015

Hmm, chararray looks to be pretty widely used.

@ahaldane
Copy link
Member Author

OK, I've updated the commit message and added a note in doc/release/1.10.0-notes.rst.

Should I post to NumPy-discussion or SciPy-dev? I'm note sure which one is for numpy development.

@charris
Copy link
Member

charris commented Jan 14, 2015

numpy-discussion

@charris
Copy link
Member

charris commented Jan 17, 2015

Doesn't look like anyone noticed the post :( Could you add tests to check the return type of the two examples you used? numpy/core/tests/test_records.py looks like the place.

@ahaldane
Copy link
Member Author

I've added some tests. Along the way though, I realized my fix still has a problem and so I've made another change. I was expecting that by removing the chararray lines I would end up with a setup where attribute access behaved identically to index access. However this happens instead:

>>> a = np.array([('abc ', (1,1)), ('abc', (2,3))],
... dtype=[('foo', 'S4'), ('bar', [('A', int),('B', int)])])
>>> a = a.view(np.recarray)
>>> type(a.foo)
    numpy.ndarray
>>> type(a['foo'])
    numpy.core.records.recarray

With the new pull request I've made index access behave identically to attribute access to recarrays (as a second git commit if that's OK). That is, accessing a scalar dtype gives an ndarray but accessing a structured datatype gives a recarray, for both attributes and indexes.

There is a case to be made for having indexing behave differently from attribute access. Accessing fields using indexing is something that already exists for ndarrays. Recarrays are supposed to add attribute access, so it could also make sense for them to leave the index access alone. In this case indexed access would always return an ndarray, while attribute access would return ndarrays or recarrays depending on whether the data type is structured.

On the other hand, it could be confusing that field access by indexes and attributes behaves differently, and it would have to be clearly documented. It might also seems a little strange if indexing an array gives an array of a different type, as would happen if indexing always returns ndarrays.

In either case, the current code returns recarrays for scalar types accessed through indexing, which I don't think makes sense.

Please also read the updated note in doc/release/1.10.0-notes.rst for another explanation.

With this change, I get the following:

>>> type(a['foo'])
    numpy.ndarray
>>> type(a.foo)
    numpy.ndarray
>>> type(a['bar'])
    numpy.core.records.recarray
>>> type(a.bar)
    numpy.core.records.recarray

By the way, it occured to me that the docs would need to change too, but they don't: recarray's use of chararrays wasn't documented anywhere, nor were the return types in general. I'm thinking of expanding the docs.

If any of this seems confusing, please wait a little because I am going to send an email to the numpy list regarding recarray documentation and #3581. Perhaps the issues I raise there should be discussed before this one.

@ahaldane
Copy link
Member Author

Correction: I had it a little wrong in my last update. Here's the correct description:

Setup:

>>> a = np.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
... dtype=[('foo', 'S4'), ('bar', [('A', int),('B', int)]), ('baz', int)])
>>> a = a.view(np.recarray)

Previous behavior:

>>> type(a.foo), type(a['foo'])
    (numpy.ndarray, numpy.core.records.recarray)
>>> type(a.bar), type(a['bar'])
    (numpy.core.records.recarray, numpy.core.records.recarray)
>>> type(a.baz), type(a['baz'])
    (numpy.ndarray, numpy.ndarray)

New behavior:

>>> type(a.foo), type(a['foo'])
    (numpy.ndarray, numpy.ndarray)
>>> type(a.bar), type(a['bar'])
    (numpy.core.records.recarray, numpy.core.records.recarray)
>>> type(a.baz), type(a['baz'])
    (numpy.ndarray, numpy.ndarray)

@ahaldane
Copy link
Member Author

Just a heads up: Don't merge this; I have more changes I want to make.

I'll also merge the latest commits from origin when the time comes.

@ahaldane
Copy link
Member Author

This was automatically closed because of a push I made, but that's just as well.

I wanted to open a new (duplicate) pull request anyway, since I mistakenly make these commits to my master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recarray attributes of type strings will create a ndarray of strings instead of a chararray
3 participants