ENH: simplify field indexing of structured arrays #5636

ahaldane · 2015-03-06T04:43:30Z

This commit simplifies the code in array_subscript and array_assign_subscript related to field access. This fixes #4806, and also removes a potential segfault, e.g. if the array is indexed using a sequence-like object that raises an exception in getitem.

KeyError vs ValueError

One minor comment on field subscripts: Currently the following code raises a ValueError:

>>> a = np.zeros((1,), dtype=[('f1', 'i4')])
>>> a['nofield']

However, KeyError seems slightly better to me, and it's what my code "naturally" produced (but I corrected it since some tests expected ValueError). Is it worth changing? (Probably not, for backward compatibility.)
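A minimal sketch of the case being discussed, for anyone who wants to check what their own NumPy build does. Since the exception type is exactly what is under debate here, the snippet accepts either KeyError or ValueError and only reports which one it saw; the variable name `raised` is just for illustration.

```python
import numpy as np

# One-element structured array with a single int32 field named 'f1'.
a = np.zeros((1,), dtype=[('f1', 'i4')])

try:
    a['nofield']
except (KeyError, ValueError) as exc:
    # Which of the two is raised depends on the NumPy version.
    raised = type(exc).__name__
    print(raised, exc)
else:
    raised = None
    print("no exception raised")
```

Catching both types is also a reasonable pattern for downstream code that has to run across NumPy versions.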

(Also note that multi-field indexing still returns a copy, though it had been planned to return a view in 1.10. I originally considered changing that here, but I left that for another time).

seberg · 2015-03-06T08:50:04Z

numpy/core/src/multiarray/mapping.c

-            PyObject *obj;
+    /* field access */
+    if (PyDataType_HASFIELDS(PyArray_DESCR(self)) &&
+            obj_is_string_or_stringlist(ind)) {


Just this one note for the moment. This seems wrong: we do not support assignment to multiple fields at once. You could do this (though I think it might be a bad idea, considering how bad multi-field assignments are right now) if you give the Python function an option to return a view already now. Otherwise we are basically assigning into /dev/null ;).

Ah, I had intended for the following two examples to behave similarly.

>>> a = np.array([(1,2,3),(4,5,6)], dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
>>> b = a[['f0', 'f2']]
>>> b[:] = (10,20)

>>> a = np.array([(1,2,3),(4,5,6)], dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
>>> a[['f0', 'f2']] = (10,20)

The first example was already possible: multi-field indexing returns a copy, and shows a warning if you try to write to it. With this PR, I wanted the second example to also show a warning. (Both examples effectively write to /dev/null.)

I forgot to show the warning though. I will add it in array_assign_subscript.

I don't think multi-field indexing should return a view (even as an option) yet. I tried doing so, but indeed, as you mention, I discovered that it would behave quite strangely. I think more extensive changes are needed before multi-field assignment is really allowed.
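Whether multi-field indexing hands back a copy or a view can be checked directly. The sketch below only reads from the result, so it should run without warnings; it assumes a NumPy recent enough to provide `np.shares_memory`, and the printed answer is expected to differ between versions (a copy at the time of this PR, per the discussion above).

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)],
             dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])

# Index with a list of field names.
b = a[['f0', 'f2']]

# True means `b` is a view onto `a`; False means it is an independent copy.
is_view = np.shares_memory(a, b)
print(np.__version__, b.dtype.names, "view" if is_view else "copy")
```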

ahaldane · 2015-03-06T18:35:56Z

All right, I added the warning. It is easy to switch back to forbidding multi-field subscript assignment if that's better. The advantage with the current code is that once multi-field indexing becomes feasible all that is needed is to remove the warning. A possible downside is the warning is shown on every assignment attempt (I didn't see a way to prevent this).

seberg · 2015-03-06T22:55:19Z

numpy/core/src/multiarray/mapping.c

+        /* warn if writing to a copy. copies will have no base */
+        if (PyArray_BASE(view) == NULL) {
+            /* the warning will be shown on every attempt */
+            PyArray_ENABLEFLAGS((PyArrayObject*)view, NPY_ARRAY_WARN_ON_WRITE);


This sounds like it just warns; it needs to raise an error. I do not think I would care about the future warning: if we want to change it, we can just change it. Your example will also work fine for the second case, that is not the issue; the first case is just broken enough already in my opinion, but then I don't like recarrays.
In any case, I am not sure the changes to this part give any gain at all, to be honest, unless you want the multi-index thing.

charris · 2015-03-06T23:24:33Z

@ahaldane I did have a branch for taking views from structures, but a test failed due to changes in filler. I thought I made a post, but can't find it now. In any case, the result of the change was not completely back compatible, but the failures would probably be few.

Edit: mentioned in #3641

ahaldane · 2015-03-14T16:50:18Z

@seberg, I did have multi-field assignment in mind, but since that isn't implemented it's true the changes to array_assign_subscript don't do much.

I'm fine with changing array_assign_subscript back, but let me try one other variation first: I now raise an error on multi-field assignment attempt. The difference is the error message now says "multi-field assignment is not supported" but it used to say "only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices", and raises ValueError instead of IndexError.

ahaldane · 2015-03-14T16:51:03Z

@charris Alignment issues hadn't occurred to me. Thanks, I'll keep it in mind.

seberg · 2015-03-14T18:45:30Z

Throwing a new custom IndexError seems like a great change to me. Calling the python code is fine with me, the only thing I could disagree with would be speed reasons, and to be honest I doubt they are large, and I don't care about recarray speed ;).

ahaldane · 2015-03-14T19:31:21Z

Sounds good!

I added another small change as a second commit, to fix #5631.

charris · 2015-03-14T19:36:54Z

@seberg Are you suggesting an IndexError be added to numpy/__init__.py? If so it should also be mentioned in the release notes.

seberg · 2015-03-15T18:16:29Z

Oh sorry, "custom" wasn't meant like that, no. I just wrote that because before it dropped through to the normal indexing, for which fields don't really make sense. Should maybe mention a change in error type anyway.

charris · 2015-06-12T05:19:45Z

@ahaldane Needs rebase. @seberg comments?

ahaldane · 2015-06-12T06:37:11Z

rebased + tidied up.

Here's the current summary:

Tweaked _index_fields, so that now it handles individual field indices, and also does more careful sanity checks (fixing #4806, "Record access on non-existing fields: no error issued"). It is a small step closer to allowing multi-field assignment, if that will ever happen.
Simplified array_subscript and array_assign_subscript: It now leaves most of the work to _index_fields, and raises a more descriptive error on attempts to do multi-field assignment. The check for field indexing is now a separate function obj_is_string_or_stringlist, which fixes a potential segfault present before. I also just updated it to use the new npy_cache_pyfunc.
Bugfix in VOID_setitem, to fix #5631, "Cryptic SystemError when creating array with weird structured-but-empty dtype" (fixes the n=0 case in that code).
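To illustrate the kind of check obj_is_string_or_stringlist performs, here is a rough pure-Python sketch. It is not the C implementation from mapping.c: the name `is_string_or_stringlist`, the `Misbehaving` class, and the exact rules (which containers count, how empty sequences are treated) are assumptions made for the example. The point it shows is that an exception raised while probing the index is swallowed and treated as "not a field index" instead of being left unhandled.

```python
def is_string_or_stringlist(obj):
    """Hypothetical sketch: True for a str or a non-empty sequence of str."""
    if isinstance(obj, str):
        return True
    if not (hasattr(obj, '__len__') and hasattr(obj, '__getitem__')):
        return False
    try:
        n = len(obj)
        if n == 0:
            return False
        return all(isinstance(obj[i], str) for i in range(n))
    except Exception:
        # A sequence-like object whose __getitem__ (or __len__) raises is
        # simply not treated as a field index.
        return False


class Misbehaving:
    """Sequence-like object that raises on item access."""
    def __len__(self):
        return 2

    def __getitem__(self, i):
        raise RuntimeError("boom")


print(is_string_or_stringlist('f0'))           # True
print(is_string_or_stringlist(['f0', 'f2']))   # True
print(is_string_or_stringlist([0, 1]))         # False
print(is_string_or_stringlist(Misbehaving()))  # False
```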

seberg · 2015-06-12T07:48:57Z

numpy/core/_internal.py

@@ -287,23 +287,45 @@ def _newnames(datatype, order):
        return tuple(list(order) + nameslist)
    raise ValueError("unsupported order value: %s" % (order,))

+def isstring(obj):


I would prefer a conditional def instead of a condition inside the function. Do we actually support unicode fields on python 2? If not, I think you can just use str, if we do, you can use six.string_types instead.
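A sketch of what a "conditional def" could look like, for readers unfamiliar with the pattern. This is not the code that ended up in _internal.py; it only assumes that the Python 2 branch would rely on `basestring` to cover both `str` and `unicode`. The version check runs once at import time instead of on every call.

```python
import sys

if sys.version_info[0] >= 3:
    def isstring(obj):
        # Python 3: str is the only text type.
        return isinstance(obj, str)
else:
    def isstring(obj):
        # Python 2: basestring covers both str and unicode.
        return isinstance(obj, basestring)  # noqa: F821


print(isstring('f0'), isstring(3))
```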

seberg · 2015-06-12T09:21:22Z

I will have a better look tomorrow morning probably. Don't have time right now, and more annoying stuff happened also ;).

ahaldane · 2015-06-12T17:05:04Z

Modified isstring as suggested. And while in python2 fields cannot be unicode, field titles can be, so I do have to take care of unicode.

I also added a provisional commit changing the ValueError to a KeyError for invalid field access. It required tweaking 4 unit tests plus a line in lib/recfunctions.py. I would definitely prefer it to be a KeyError, but a GitHub search did turn up one case expecting ValueError, here. Any opinions?

seberg · 2015-06-13T10:27:33Z

numpy/core/_internal.py

+    view_dtype = {'names': names, 'formats': formats,
+                  'offsets': offsets, 'itemsize': dt.itemsize}
+
+    #return copy for now (future plan to return ary.view(dtype=view_dtype))


Space after # ;), next line also misses some spaces, but it isn't your line
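For context on the "future plan" in that code comment, the following is a hedged sketch of how a dtype built from names, formats, offsets and the original itemsize can be used to view a subset of fields without copying. The example array and the variable names are assumptions for illustration; the PR itself still returns a copy.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)],
             dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
dt = a.dtype

# Pick two fields and reuse their original formats and byte offsets.
names = ['f0', 'f2']
formats = [dt.fields[name][0] for name in names]
offsets = [dt.fields[name][1] for name in names]

# Keeping the original itemsize means each record still spans all 24 bytes.
view_dtype = np.dtype({'names': names, 'formats': formats,
                       'offsets': offsets, 'itemsize': dt.itemsize})

v = a.view(view_dtype)
v['f0'] = 10  # writes through to `a`, because `v` shares its memory

print(a)
```

Because the itemsize is unchanged, the skipped field 'f1' stays in place as padding from the point of view of `v`, and its values in `a` are left untouched.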

seberg · 2015-06-13T10:32:55Z

Looks good. Some nitpicks, and possibly tests for basic unicode access would be nice. But it should be ready modulo that unicode thing (which is likely just me being confused), or modulo some veto about the KeyError thing.

ahaldane · 2015-06-13T17:20:51Z

I went ahead and moved all imports in _internal.py to the top. Otherwise fixed most little problems, and there are already lots of tests for unicode (which helped me catch unicode bugs here).

I'm not sure why a previous PR's commit is getting lumped in here... maybe I messed up a rebase somehow?

Once the ValueError->KeyError change can go forward, I'll squash everything.

charris · 2015-06-15T20:25:23Z

I think it would be better to save the ValueError->KeyError change for later. The second is preferable, but the first is OK, and I suspect we are going to have enough compatibility problems with 1.10 without introducing one we are already aware of.

seberg · 2015-06-16T07:06:13Z

@charris ok, @ahaldane sorry for suggesting to go ahead with it. Could you separate the two things? Probably we can give it a shot and merge it into master when 1.10 is released/branched off. That also gives a lot of time for complaints to come in before an actual release.

ahaldane · 2015-06-16T15:33:21Z

all right, will do within the next day or two

charris · 2015-06-16T18:24:08Z

If you rebase on master the extra PR will go away.

This commit simplifies the code in array_subscript and array_assign_subscript related to field access. This fixes numpy#4806, and also removes a potential segfault, e.g. if the array is indexed using a sequence-like object that raises an exception in getitem. Also fixes numpy#5631, related to creation of structured dtypes with no fields (an unusual and probably useless edge case). Also moves all imports in _internal.py to the top. Fixes numpy#4806. Fixes numpy#5631.

ahaldane · 2015-06-17T19:03:04Z

Undid KeyError changes, rebased, squashed, and added a docstring to _index_fields.

ENH: simplify field indexing of structured arrays

charris · 2015-06-17T19:31:41Z

@ahaldane Merged, thanks. If the behavior has changed, it would be good to document it. My understanding is that this mostly simplifies the code and fixes some bugs.

seberg · 2015-06-18T11:12:09Z

Thanks all!

ahaldane force-pushed the rework_index_fields branch from 86607bb to 6c73448 Compare March 6, 2015 04:55

seberg reviewed Mar 6, 2015

ahaldane force-pushed the rework_index_fields branch from 6c73448 to c651a7d Compare March 6, 2015 18:07

seberg reviewed Mar 6, 2015

charris added 01 - Enhancement component: numpy._core labels Mar 8, 2015

ahaldane force-pushed the rework_index_fields branch from c651a7d to 2a10597 Compare March 14, 2015 16:48

ahaldane mentioned this pull request Mar 14, 2015

When accessing multiple fields of a structured array, existence is not checked and return dtype may be empty #5632

Closed

ahaldane force-pushed the rework_index_fields branch from 2a10597 to 9df52c6 Compare March 14, 2015 19:29

ahaldane mentioned this pull request Jun 11, 2015

BUG subclass with __numpy_ufunc__ segfaults with recarray comparison #4855

Closed

charris added this to the 1.10.0 release milestone Jun 12, 2015

ahaldane force-pushed the rework_index_fields branch 2 times, most recently from 8dfbe5a to c510365 Compare June 12, 2015 06:29

seberg reviewed Jun 12, 2015

ahaldane force-pushed the rework_index_fields branch from c510365 to 56ae928 Compare June 12, 2015 16:55

ahaldane force-pushed the rework_index_fields branch from 56ae928 to b3a77c6 Compare June 12, 2015 17:12

seberg reviewed Jun 13, 2015

ahaldane force-pushed the rework_index_fields branch 2 times, most recently from 4226bf4 to 3459ad1 Compare June 13, 2015 17:17

ahaldane force-pushed the rework_index_fields branch from 3459ad1 to f5c6ae1 Compare June 17, 2015 17:30

ahaldane force-pushed the rework_index_fields branch from f5c6ae1 to 3c1a13d Compare June 17, 2015 17:51

charris added a commit that referenced this pull request Jun 17, 2015

Merge pull request #5636 from ahaldane/rework_index_fields

f4e0bdd

ENH: simplify field indexing of structured arrays

charris merged commit f4e0bdd into numpy:master Jun 17, 2015

ahaldane mentioned this pull request Jun 17, 2015

MAINT: document change to bytestring index behavior #5977

Merged

ahaldane mentioned this pull request Jul 28, 2015

structured arrays should raise Exception if missing fields are requested in the form of a list #5262

Closed

seberg mentioned this pull request Oct 13, 2015

performance regression for record array access in numpy 1.10.1 #6467

Closed

ahaldane mentioned this pull request May 15, 2016

accessing undefined field raises incorrect Exception #7641

Closed

ahaldane mentioned this pull request Jan 22, 2017

Incorrect Exception when indexing array with field. #8519

Open

This was referenced Jan 17, 2018

ENH: Allow dtype objects to be indexed with multiple fields at once #10417

Merged

MAINT: struct assignment "by field position", multi-field indices return views #6053

Merged
