Skip to content

ENH: simplify field indexing of structured arrays #5636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 17, 2015

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Mar 6, 2015

This commit simplifies the code in array_subscript and array_assign_subscript related to field access. This fixes #4806, and also removes a potential segfaults, eg if the array is indexed using an sequence-like object that raises an exception in getitem.

KeyError vs ValueError

One minor comment on field subscripts: Currently the following code raises a ValueError:

>>> a = np.zeros((1,), dtype=[('f1', 'i4')]
>>> a['nofield']

However KeyError seems slightly better to me, and it's what my code "naturally" produced (but I corrected it since some tests expected ValueError). Is it worth changing? (Probably not, for back-compatability).

(Also note that multi-field indexing still returns a copy, though it had been planned to return a view in 1.10. I originally considered changing that here, but I left that for another time).

@ahaldane ahaldane force-pushed the rework_index_fields branch from 86607bb to 6c73448 Compare March 6, 2015 04:55
PyObject *obj;
/* field access */
if (PyDataType_HASFIELDS(PyArray_DESCR(self)) &&
obj_is_string_or_stringlist(ind)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just this one note for the moment. This seems wrong, we do not support assignmnet to multiple fields at once. You could do this (but think it might be a bad idea, considering how bad multi field assignments are right now), if you give the python function an option to return a view already now. Otherwise we are basically assigning into dev NULL ;).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had intended for the following two examples to behave similarly.

>>> a = np.array([(1,2,3),(4,5,6)], 
                 dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
>>> b = a[['f0', 'f2']]
>>> b[:] = (10,20)

>>> a = np.array([(1,2,3),(4,5,6)],
                 dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
>>> a[['f0', 'f2']] = (10,20)

The first example was already possible - multi-field indexing returns a copy, and shows a warning if you try to write to it. With this PR, I wanted second example to also show a warning. (Both examples effectively write to dev/null).

I forgot to show the warning though. I will add it in array_assign_subscript.

I don't think multi-field indexing should return a view (even as an option) yet. I tried doing so, but indeed, as you mention, I discovered that it would behave quite strangely. I think more extensive changes are needed before multi-field assignment is really allowed.

@ahaldane ahaldane force-pushed the rework_index_fields branch from 6c73448 to c651a7d Compare March 6, 2015 18:07
@ahaldane
Copy link
Member Author

ahaldane commented Mar 6, 2015

All right, I added the warning. It is easy to switch back to forbidding multi-field subscript assignment if that's better. The advantage with the current code is that once multi-field indexing becomes feasible all that is needed is to remove the warning. A possible downside is the warning is shown on every assignment attempt (I didn't see a way to prevent this).

/* warn if writing to a copy. copies will have no base */
if (PyArray_BASE(view) == NULL) {
/* the warning will be shown on every attempt */
PyArray_ENABLEFLAGS((PyArrayObject*)view, NPY_ARRAY_WARN_ON_WRITE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like it just warns, it needs to raise an error. I do not think I would care about the future warning, if we want to change it, we can just change it. Your example will work fine also for the second case, that is not the thing, the first case is just also broken enough in my opinion, but then I don't like recarrays.
In any case, I am not sure the changes to this part give any gain at all to be honest, unless you want the multi index thing.

@charris
Copy link
Member

charris commented Mar 6, 2015

@ahaldane I did have a branch for taking views from structures, but a test failed due to changes in filler. I thought I made a post, but can't find it now. In any case, the result of the change was not completely back compatible, but the failures would probably be few.

Edit: mentioned in #3641

@ahaldane
Copy link
Member Author

@seberg, I did have multi-field assignment in mind, but since that isn't implemented it's true the changes to array_assign_subscript don't do much.

I'm fine with changing array_assign_subscript back, but let me try one other variation first: I now raise an error on multi-field assignment attempt. The difference is the error message now says "multi-field assignment is not supported" but it used to say "only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices", and raises ValueError instead of IndexError.

@ahaldane
Copy link
Member Author

@charris Alignment issues hadn't occurred to me. Thanks, I'll keep it in mind.

@seberg
Copy link
Member

seberg commented Mar 14, 2015

Throwing a new custom IndexError seems like a great change to me. Calling the python code is fine with me, the only thing I could disagree with would be speed reasons, and to be honest I doubt they are large, and I don't care about recarray speed ;).
.

@ahaldane ahaldane force-pushed the rework_index_fields branch from 2a10597 to 9df52c6 Compare March 14, 2015 19:29
@ahaldane
Copy link
Member Author

Sounds good!

I added another small change as a second commit, to fix #5631.

@charris
Copy link
Member

charris commented Mar 14, 2015

@seberg Are you suggesting an IndexError be added to numpy/__init__.py? If so it should also be mentioned in the release notes.

@seberg
Copy link
Member

seberg commented Mar 15, 2015

Oh sorry, "custom" wasn't meant like that, no. I just wrote that because before it dropped through to the normal indexing for which fields don't make sense really. Should maybe mention a chane in error type anyway.

@charris charris added this to the 1.10.0 release milestone Jun 12, 2015
@charris
Copy link
Member

charris commented Jun 12, 2015

@ahaldane Needs rebase. @seberg comments?

@ahaldane ahaldane force-pushed the rework_index_fields branch 2 times, most recently from 8dfbe5a to c510365 Compare June 12, 2015 06:29
@ahaldane
Copy link
Member Author

rebased + tidied up.

Here's the current summary:

  1. Tweaked _index_fields, so that now it handles individual field indices, and also does more careful sanity checks (fixing Record access on non-existing fields: no error issued #4806). It is a small step closer to allowing multi-field assignment, if that will ever happen.

  2. Simplified array_subscript and array_assign_subscript: It now leaves most of the work to _index_fields, and raises a more descriptive error on attempts to do multi-field assignment. The check for field indexing is now a separate function obj_is_string_or_stringlist, which fixes a potential segfault present before. I also just updated it to use the new npy_cache_pyfunc.

  3. Bugfix in VOID_setitem, to fix Cryptic SystemError when creating array with weird structured-but-empty dtype #5631. (fixes the n=0 case in that code)

@@ -287,23 +287,45 @@ def _newnames(datatype, order):
return tuple(list(order) + nameslist)
raise ValueError("unsupported order value: %s" % (order,))

def isstring(obj):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a conditional def instead of a condition inside the function. Do we actually support unicode fields on python 2? If not, I think you can just use str, if we do, you can use six.string_types instead.

@seberg
Copy link
Member

seberg commented Jun 12, 2015

I will have a better look tomorrow morning probably. Don't have time right now, and more annoying stuff happened also ;).

@ahaldane ahaldane force-pushed the rework_index_fields branch from c510365 to 56ae928 Compare June 12, 2015 16:55
@ahaldane
Copy link
Member Author

Modified isstring as suggested. And while in python2 fields cannot be unicode, field titles can be, so I do have to take care of unicode.

I also added a provisional commit changing the ValueError to a KeyError for invalid field access. It required tweaking 4 unit tests plus a line in lib/recfunctions.py. I would definitely prefer it to be a KeyError, but a github search did turn up one case expecting ValueError , here. Any opinions?

@ahaldane ahaldane force-pushed the rework_index_fields branch from 56ae928 to b3a77c6 Compare June 12, 2015 17:12
view_dtype = {'names': names, 'formats': formats,
'offsets': offsets, 'itemsize': dt.itemsize}

#return copy for now (future plan to return ary.view(dtype=view_dtype))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after # ;), next line also misses some spaces, but it isn't your line

@seberg
Copy link
Member

seberg commented Jun 13, 2015

Looks good. Some nitpicks and possibly tests for basic unicode access would be nice. But should be ready modulo that unicode thing (which is likely just be me confused). (or modulo some veto about the KeyError thing).

@ahaldane ahaldane force-pushed the rework_index_fields branch 2 times, most recently from 4226bf4 to 3459ad1 Compare June 13, 2015 17:17
@ahaldane
Copy link
Member Author

I went ahead and moved all imports in _internal.py to the top. Otherwise fixed most little problems, and there are already lots of tests for unicode (which helped me catch unicode bugs here).

I'm not sure why a previous PR's commit is getting lumped in here.. maybe I messed up a rebase somehow?

Once the ValueError->KeyError change can go forward, I'll squash everything.

@charris
Copy link
Member

charris commented Jun 15, 2015

I think it would be better to save the change ValueError->KeyError for later. The second is preferable, but the first is OK and I suspect we are going to have enough campatibility problems with 1.10 without introducing one we are already aware of.

@seberg
Copy link
Member

seberg commented Jun 16, 2015

@charris ok, @ahaldane sorry for suggesting to go ahead with it. Could you separate the two things? Probably we can give it a shot and merge it into master when 1.10 is released/branched off. Also gives it a lot of time for complains to start before an actual release.

@ahaldane
Copy link
Member Author

all right, will do within the next day or two

@charris
Copy link
Member

charris commented Jun 16, 2015

If you rebase on master the extra PR will go away.

@ahaldane ahaldane force-pushed the rework_index_fields branch from 3459ad1 to f5c6ae1 Compare June 17, 2015 17:30
This commit simplifies the code in array_subscript and
array_assign_subscript related to field access. This fixes numpy#4806,
and also removes a potential segfaults, eg if the array is indexed using
an sequence-like object that raises an exception in getitem.

Also fixes numpy#5631, related to creation of structured dtypes
with no fields (an unusual and probably useless edge case).

Also moves all imports in _internal.py to the top.

Fixes numpy#4806.
Fixes numpy#5631.
@ahaldane ahaldane force-pushed the rework_index_fields branch from f5c6ae1 to 3c1a13d Compare June 17, 2015 17:51
@ahaldane
Copy link
Member Author

Undid KeyError changes, rebased, squashed, and added a docstring to _index_fields.

charris added a commit that referenced this pull request Jun 17, 2015
ENH: simplify field indexing of structured arrays
@charris charris merged commit f4e0bdd into numpy:master Jun 17, 2015
@charris
Copy link
Member

charris commented Jun 17, 2015

@ahaldane Merged, thanks. If the behavior has changed, it would be good to document it, My understanding is that this mostly simplifies the code and fixes some bugs.

@seberg
Copy link
Member

seberg commented Jun 18, 2015

Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Record access on non-existing fields: no error issued
3 participants