-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: simplify field indexing of structured arrays #5636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
86607bb
to
6c73448
Compare
PyObject *obj; | ||
/* field access */ | ||
if (PyDataType_HASFIELDS(PyArray_DESCR(self)) && | ||
obj_is_string_or_stringlist(ind)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just this one note for the moment. This seems wrong, we do not support assignmnet to multiple fields at once. You could do this (but think it might be a bad idea, considering how bad multi field assignments are right now), if you give the python function an option to return a view already now. Otherwise we are basically assigning into dev NULL ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I had intended for the following two examples to behave similarly.
>>> a = np.array([(1,2,3),(4,5,6)],
dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
>>> b = a[['f0', 'f2']]
>>> b[:] = (10,20)
>>> a = np.array([(1,2,3),(4,5,6)],
dtype=[('f0', 'i8'), ('f1', 'i8'), ('f2', 'i8')])
>>> a[['f0', 'f2']] = (10,20)
The first example was already possible - multi-field indexing returns a copy, and shows a warning if you try to write to it. With this PR, I wanted second example to also show a warning. (Both examples effectively write to dev/null).
I forgot to show the warning though. I will add it in array_assign_subscript
.
I don't think multi-field indexing should return a view (even as an option) yet. I tried doing so, but indeed, as you mention, I discovered that it would behave quite strangely. I think more extensive changes are needed before multi-field assignment is really allowed.
6c73448
to
c651a7d
Compare
All right, I added the warning. It is easy to switch back to forbidding multi-field subscript assignment if that's better. The advantage with the current code is that once multi-field indexing becomes feasible all that is needed is to remove the warning. A possible downside is the warning is shown on every assignment attempt (I didn't see a way to prevent this). |
/* warn if writing to a copy. copies will have no base */ | ||
if (PyArray_BASE(view) == NULL) { | ||
/* the warning will be shown on every attempt */ | ||
PyArray_ENABLEFLAGS((PyArrayObject*)view, NPY_ARRAY_WARN_ON_WRITE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like it just warns, it needs to raise an error. I do not think I would care about the future warning, if we want to change it, we can just change it. Your example will work fine also for the second case, that is not the thing, the first case is just also broken enough in my opinion, but then I don't like recarrays.
In any case, I am not sure the changes to this part give any gain at all to be honest, unless you want the multi index thing.
c651a7d
to
2a10597
Compare
@seberg, I did have multi-field assignment in mind, but since that isn't implemented it's true the changes to I'm fine with changing |
@charris Alignment issues hadn't occurred to me. Thanks, I'll keep it in mind. |
Throwing a new custom IndexError seems like a great change to me. Calling the python code is fine with me, the only thing I could disagree with would be speed reasons, and to be honest I doubt they are large, and I don't care about recarray speed ;). |
2a10597
to
9df52c6
Compare
Sounds good! I added another small change as a second commit, to fix #5631. |
@seberg Are you suggesting an |
Oh sorry, "custom" wasn't meant like that, no. I just wrote that because before it dropped through to the normal indexing for which fields don't make sense really. Should maybe mention a chane in error type anyway. |
8dfbe5a
to
c510365
Compare
rebased + tidied up. Here's the current summary:
|
@@ -287,23 +287,45 @@ def _newnames(datatype, order): | |||
return tuple(list(order) + nameslist) | |||
raise ValueError("unsupported order value: %s" % (order,)) | |||
|
|||
def isstring(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a conditional def instead of a condition inside the function. Do we actually support unicode fields on python 2? If not, I think you can just use str
, if we do, you can use six.string_types
instead.
I will have a better look tomorrow morning probably. Don't have time right now, and more annoying stuff happened also ;). |
c510365
to
56ae928
Compare
Modified I also added a provisional commit changing the |
56ae928
to
b3a77c6
Compare
view_dtype = {'names': names, 'formats': formats, | ||
'offsets': offsets, 'itemsize': dt.itemsize} | ||
|
||
#return copy for now (future plan to return ary.view(dtype=view_dtype)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after # ;), next line also misses some spaces, but it isn't your line
Looks good. Some nitpicks and possibly tests for basic unicode access would be nice. But should be ready modulo that unicode thing (which is likely just be me confused). (or modulo some veto about the KeyError thing). |
4226bf4
to
3459ad1
Compare
I went ahead and moved all imports in I'm not sure why a previous PR's commit is getting lumped in here.. maybe I messed up a rebase somehow? Once the ValueError->KeyError change can go forward, I'll squash everything. |
I think it would be better to save the change |
all right, will do within the next day or two |
If you rebase on master the extra PR will go away. |
3459ad1
to
f5c6ae1
Compare
This commit simplifies the code in array_subscript and array_assign_subscript related to field access. This fixes numpy#4806, and also removes a potential segfaults, eg if the array is indexed using an sequence-like object that raises an exception in getitem. Also fixes numpy#5631, related to creation of structured dtypes with no fields (an unusual and probably useless edge case). Also moves all imports in _internal.py to the top. Fixes numpy#4806. Fixes numpy#5631.
f5c6ae1
to
3c1a13d
Compare
Undid KeyError changes, rebased, squashed, and added a docstring to |
ENH: simplify field indexing of structured arrays
@ahaldane Merged, thanks. If the behavior has changed, it would be good to document it, My understanding is that this mostly simplifies the code and fixes some bugs. |
Thanks all! |
This commit simplifies the code in
array_subscript
andarray_assign_subscript
related to field access. This fixes #4806, and also removes a potential segfaults, eg if the array is indexed using an sequence-like object that raises an exception in getitem.KeyError vs ValueError
One minor comment on field subscripts: Currently the following code raises a
ValueError
:However
KeyError
seems slightly better to me, and it's what my code "naturally" produced (but I corrected it since some tests expectedValueError
). Is it worth changing? (Probably not, for back-compatability).(Also note that multi-field indexing still returns a copy, though it had been planned to return a view in 1.10. I originally considered changing that here, but I left that for another time).