-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Allow dtype objects to be indexed with multiple fields at once #10417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This looks good in itself, but I wonder about indexing based on a list -- it seems to be much clearer to allow |
{ | ||
int seqlen, i; | ||
PyObject *fields; | ||
PyObject *names; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize these to NULL so XDECREF doesn't fail in the first few goto fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
Note that then |
Otherwise code looks good. Now for the hard questions: Will it be confusing if ndarray multifield-indexing removes padding, but dtype-multifield indexing keeps it? I kind of think so, in which case I think both should keep padding. Is it a good idea to add more common complex behavior to dtypes, given that we eventually want to rewrite dtypes so that each dtype has its own separate code? Eg, only structured dtypes need to know about multifield indexing, it is not needed for I agree that if we add this, |
I retract my earlier comment about that elsewhere, and now agree with you. Being consistent is most useful here.
Yes, because this is already guarded away inside the
Are you ok with the exception type changes? Perhaps I should submit a PR to adjust those in |
On the change to KeyError, see #5636 (comment) and discussion there. I'm for it, but it is a real backcompat issue. |
Perhaps we can bundle the KeyError change with the copy->view change if we delay both until 1.15. That way code-breakage is only for one release. |
I'd hope that anybody who needs a single-item view would write |
74532c5
to
b596fb0
Compare
Updated with |
2884320
to
23db49c
Compare
numpy/core/tests/test_multiarray.py
Outdated
@@ -1157,9 +1157,11 @@ def test_structuredscalar_indexing(self): | |||
def test_multiindex_titles(self): | |||
a = np.zeros(4, dtype=[(('a', 'b'), 'i'), ('c', 'i'), ('d', 'i')]) | |||
assert_raises(KeyError, lambda : a[['a','c']]) | |||
assert_raises(KeyError, lambda : a[['b','b']]) | |||
assert_raises(KeyError, lambda : a[['a','a']]) | |||
assert_raises(ValueError, lambda : a[['b','b']]) # field exists, but repeated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a new error as of 1.14, so we can change it in 1.14.1 without compatibility concerns.
Edit: see #10430 - will rebase this change on top of that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the second commit, it becomes clearer why this is a nice thing to do! I'd still prefer a tuple, but maybe that is best treated as a separate issue -- for compatibility, we'd need to accept a list anyway.
numpy/core/src/multiarray/mapping.c
Outdated
int seqlen, i; | ||
PyObject *name = NULL, *tup; | ||
PyObject *fields, *names; | ||
PyObject *list_ind; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The errors are because you end up not using list_ind
23db49c
to
040af92
Compare
} | ||
} | ||
|
||
view_dtype = PyArray_DescrNewFromType(NPY_VOID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a later commit, but: This should probably just copy the original dtype.
numpy/core/src/multiarray/mapping.c
Outdated
Py_DECREF(names); | ||
return 0; | ||
/* Convert sequences into a list, which is what dtype.__getitem__ | ||
expects. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm arbitrarily deciding that descr.__getitem__
is going to be strict about the sequence type it gets passed, since it has no compatibility concerns. However, we still need to be liberal in what we accept here in array.__getitem__
.
d2e564b
to
1d7e45f
Compare
Oh heck:
That refcount is |
1d7e45f
to
b3c2116
Compare
Gave up on the forcing exact lists - it didn't actually help all that much with the refcounting |
@eric-wieser - I'm confused by your above comment. Is this PR OK, or are there still problems? |
There were problems, so I adjusted the interface of the internal function to accept any sequence. The remark about handing |
@eric-wieser would you like some help finishing this? |
Help reminding me that it exists is enough for now - I'll assign it to myself to make it easier to rediscover. |
ping. this still has an approved status but seems to need work |
I tried the "re-request" review button, but that did not remove my approval. In any case, I think there was little left... |
b3c2116
to
33849c1
Compare
Release note and test of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a test for the single "title" case, but the PR does not touch that code, so that does not matter.
decref name if an error occurs further on. */ | ||
if (PyTuple_SetItem(names, i, name) < 0) { | ||
goto fail; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyTuple_SET_ITEM
is fine (and you can remove the error check).
This returns a dtype of the same size with other fields removed, just like that used in numpygh-6053 It should be possible to fall back on this implementation in mapping.c:_get_field_view
597cc36
to
90f710b
Compare
close/reopen |
Used that "resolve" button, seems somewhat annoying with that marge commit though, ah well, may clean it up tomorrow (or so, or just merge it anyway...). |
Bitten by the problems described in #13707 again |
@seberg, happy to merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I am about to merge. But before I do: We do not think it would be plausible for this indexing to "squeeze" the fields (remove padding), right? (I mean the dtype one, not the way it is used in indexing of course).
Just want to be sure that there is either little chance of that, or we do not care about changing it later on.
If you feel there is not. Feel free to merge yourself Eric.
numpy/core/src/multiarray/mapping.c
Outdated
/* only happens for strange sequence objects */ | ||
npy_bool is_string; | ||
PyObject *item = PySequence_GetItem(ind, i); | ||
if(item == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(item == NULL) { | |
if (item == NULL) { |
Just nitpicking for no reason.
No, I think the value comes from this meaning exactly the same thing as it does in indexing arrays, and the change to make array indexing not squeeze fields was quite deliberate. |
OK, I tend to agree, but wasn't into that discussion too deeply. I will just squash and merge this then. |
This returns a dtype of the same size with other fields removed, just like that used in gh-6053
It should be possible to fall back on this implementation in mapping.c:_get_field_view - would appreciate comments on that.
Some ways in which this differs from
array.__getitem__
:list
s are allowed, which avoids having to deal with malformed sequence objectsKeyError
is raised for invalid field names instead ofValueError
(relates to Incorrect Exception when indexing array with field. #8519)