-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Adding isin function for multidimensional arrays #8423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I was going to submit this, too, but just using the existing function:
|
numpy/lib/arraysetops.py
Outdated
|
||
Returns a boolean array the same length as `ar1` that is True | ||
Returns a boolean array the same shape as `ar1` that is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Returns a boolean array of the same shape"
numpy/lib/arraysetops.py
Outdated
>>> test = np.array([0, 1, 2, 5, 0]) | ||
>>> states = [0, 2] | ||
>>> mask = np.in1d(test, states) | ||
>>> test = np.array([0, 2, 4, 6]).reshape([2, 2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not
test = np.array([[0, 2], [4, 6]])
?
Could probably do with better parameter names than |
Agreed, but not sure what the best names are. Conceptually, |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that another possible name for this function is contains
, copying the name from operator.contains
. That has the arguments backwards, however, which would entail flipping the arguments here, too.
Maybe isin(query, index)
for the argument names?
numpy/lib/arraysetops.py
Outdated
Default is False. ``np.in1d(a, b, invert=True)`` is equivalent | ||
to (but is faster than) ``np.invert(in1d(a, b))``. | ||
Default is False. ``np.isin(a, b, invert=True)`` is equivalent | ||
to (but is faster than) ``np.invert(isin(a, b))``. | ||
|
||
.. versionadded:: 1.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This versionadded is no longer relevant.
numpy/lib/arraysetops.py
Outdated
python keyword `in`, for 1-D sequences. ``in1d(a, b)`` is roughly | ||
equivalent to ``np.array([item in b for item in a])``. | ||
`isin` can be considered as an element-wise function version of the | ||
python keyword `in`. ``in1d(ar1, ar2)`` is roughly equivalent to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be isin
, not in1d
numpy/lib/arraysetops.py
Outdated
equivalent to ``np.array([item in b for item in a])``. | ||
`isin` can be considered as an element-wise function version of the | ||
python keyword `in`. ``in1d(ar1, ar2)`` is roughly equivalent to | ||
``in1d = np.vectorize(lambda x: x in ar2); in1d(ar1)``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the original example, which I think is a little clearer than using np.vectorize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original example is not valid for multidimensional arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but you can still use it to describe how this function works on 1D arrays.
numpy/lib/arraysetops.py
Outdated
@@ -380,16 +381,16 @@ def setxor1d(ar1, ar2, assume_unique=False): | |||
flag2 = flag[1:] == flag[:-1] | |||
return aux[flag2] | |||
|
|||
def in1d(ar1, ar2, assume_unique=False, invert=False): | |||
def isin(ar1, ar2, assume_unique=False, invert=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely consider more meaningful names here.
Went with |
To me, that would mean one array is a sub-array of the other, like https://docs.python.org/2/reference/expressions.html#membership-test-operations calls the arguments |
I'd argue the name should still be |
Compare:
versus
|
I'm assuming that you mean Your second case is not really any more inconsistent than:
I think that whatever we name it, differences in behaviour between |
I guess there are enough differences here that |
|
@seberg: Is the behaviour documented anywhere? The first three google results simply tell me |
Reading the code of it is the easiest. Basically, it works as long as the lhs is a single element. If it is not a single element, things likely won't make much sense. |
numpy/lib/arraysetops.py
Outdated
Returns a boolean array of the same length as `elements` that is True | ||
where an element of `elements` is in `test_elements` and False otherwise. | ||
|
||
See `numpy.isin` for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little nervous about deleting all the parameters on this method, including the notes on when they were added. Can we keep them around, even though it's slightly redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very close -- would love your help to get this across the finish line!
This should also be mentioned in the docs (doc/source/reference/routines.set.rst
) and the release notes (doc/release/1.13.0-notes.rst
).
numpy/lib/arraysetops.py
Outdated
@@ -380,106 +381,130 @@ def setxor1d(ar1, ar2, assume_unique=False): | |||
flag2 = flag[1:] == flag[:-1] | |||
return aux[flag2] | |||
|
|||
def in1d(ar1, ar2, assume_unique=False, invert=False): | |||
def isin(elements, test_elements, assume_unique=False, invert=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not worth the hassle of changing argument names on an effectively deprecated function and potentially breaking some users. Let's only use the new arguments names for isin
.
numpy/lib/arraysetops.py
Outdated
ar2 : array_like | ||
The values against which to test each value of `ar1`. | ||
test_elements : array_like | ||
The values against which to test each value of `elements`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this argument is flattened.
numpy/lib/arraysetops.py
Outdated
in1d : (M,) ndarray, bool | ||
The values `ar1[in1d]` are in `ar2`. | ||
isin : ndarray, bool | ||
The values `elements[isin]` are in `test_elements`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this has the same shape as elements
.
numpy/lib/arraysetops.py
Outdated
# Ravel both arrays, behavior for the first array could be different | ||
ar1 = np.asarray(ar1).ravel() | ||
ar2 = np.asarray(ar2).ravel() | ||
# Ravel both arrays, will reshape result later to match elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though isin
should replace most uses of in1d
, I think this code is actually easier to understand when a separate helper function is used for the 1D functionality, in the style of unique
/_unique1d
in this file. So I would keep the core logic in in1d
.
So it sounds like we want to leave |
@brsr You're certainly welcome to do any internal clean-up that is warranted, but from a public API perspective I would indeed keep |
I believe I've caught all the instances of |
☔ The latest upstream changes (presumably #8446) made this pull request unmergeable. Please resolve the merge conflicts. |
@eric-wieser, I'm not following your reasoning. Using the example from the docs, |
Right, but in numpy by default everything is plural, so that's not really helpful. For instance,
It makes the most sense to document this as "calculates So for
Again, we have a base operation, in parentheses, and then a bunch of numpy broadcasting happening. Let's be consistent here - we should always describe the function in terms of its lowest-dimensional generalization. So it makes the most sense to document this as "calculates |
Also, we already bikeshedded the name, but... The more I look at this PR, the harder I find it not to read I know we've picked it for pandas compatibility, but as it's not a method of How do people feel about |
I can see your point, but it looks very stilted to me, so I still prefer |
numpy/lib/arraysetops.py
Outdated
@@ -516,7 +516,7 @@ def isin(elements, test_elements, assume_unique=False, invert=False): | |||
Returns | |||
------- | |||
isin : ndarray, bool | |||
Has the same shape as `elements`. The values `elements[isin]` | |||
Has the same shape as `element`. The values `elements[isin]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed an s
here
numpy/lib/arraysetops.py
Outdated
|
||
If `test_elements` is a set (or other non-sequence collection) it will | ||
be converted to an object array with one element, rather than an array | ||
of the values contained in `test_elements`. Converting the set to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth pointing out that this is just the np.array
constructor doing its thing, and is not something weird this function chose to do. Mentioning "array
constructor" somewhere should do the job
numpy/lib/arraysetops.py
Outdated
elements = np.array(elements) | ||
return in1d(elements, test_elements, assume_unique=assume_unique, | ||
invert=invert).reshape(elements.shape) | ||
element = np.array(element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be asarray
, or asanyarray
if you think this will worked for masked arrays
numpy/ma/extras.py
Outdated
elements = ma.array(elements) | ||
return in1d(elements, test_elements, assume_unique=assume_unique, | ||
invert=invert).reshape(elements.shape) | ||
element = ma.array(element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, asarray
or asanyarray
numpy/ma/tests/test_extras.py
Outdated
ec[0, 0, 1] = True | ||
ec[0, 2, 3] = True | ||
c = isin(a, b) | ||
assert_array_equal(c, ec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a test here for type(c)
? Should it be MaskedArray
or array
?
Also, a test of what happens if you call np.isin
instead of np.ma.isin
might be interesting as well.
numpy/lib/tests/test_arraysetops.py
Outdated
ec[1, 2, 2] = True | ||
c = isin(a, b) | ||
assert_array_equal(c, ec) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a test for 0d a
here
you might be able to simplify your tests with something like: isin_slow = np.vectorize(lambda a, b: a in b, excluded={'b'})
# do this for a bunch of things, now that you don't need to find the result by hand
assert_equal(isin_slow(a, b), isin(a, b)) |
numpy/lib/tests/test_arraysetops.py
Outdated
#both args | ||
assert_isin_equal(5, 6) | ||
#zero-d array-like as... | ||
x = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[]
is 1d with shape (0,)
, not 0d with shape ()
.
I'm thinking testing things like np.isin(somearr, np.array(1))
, or np.isin(np.array(1), somearr)
numpy/ma/tests/test_extras.py
Outdated
a = array(a, mask=mask) | ||
values = [0, 10, 20, 30, 1, 3, 11, 22, 33] | ||
mask = [0, 1, 0, 1, 0, 1, 0, 1, 0] | ||
b = array(values, mask=mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe even just
array(data=[0, 10, 20, 30, 1, 3, 11, 22, 33],
mask=[0, 1, 0, 1, 0, 1, 0, 1, 0])
Better already with the alignment
numpy/ma/extras.py
Outdated
def in1d(ar1, ar2, assume_unique=False, invert=False): | ||
""" | ||
Test whether each element of an array is also present in a second | ||
array. | ||
|
||
The output is always a masked array. See `numpy.in1d` for more details. | ||
|
||
This function has been deprecated: use `isin` instead. | ||
|
||
Deprecated since version 1.13.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the .. deprecated:: 1.13.0
syntax. Look around at other files to see how things are worded around it. Some files use .. note
as well, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, numpy/HOWTO_DOCUMENT.rst.txt says something different about deprecation warnings.
numpy/lib/arraysetops.py
Outdated
def in1d(ar1, ar2, assume_unique=False, invert=False): | ||
""" | ||
Test whether each element of a 1-D array is also present in a second array. | ||
|
||
Returns a boolean array the same length as `ar1` that is True | ||
where an element of `ar1` is in `ar2` and False otherwise. | ||
|
||
This function has been deprecated: use `isin` instead. | ||
|
||
Deprecated since version 1.13.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
numpy/lib/tests/test_arraysetops.py
Outdated
c = [(9, 8), (7, 6)] | ||
d = (9, 7) | ||
assert_isin_equal(c, d) | ||
#zero-d scalar as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are good, but tests for 0d arrays are also important, and usually catch more bugs — so it would be good to have those too.
Related questions:
- should
isin(1, 1)
:- return
np.array(True)
- return
np.bool_(True)
- raise TypeError, because
1
is not iterable before coercion to a 0d array
- return
- should
isin(1, np.array(1))
:- return
np.array(True)
- return
np.bool_(True)
- return
- etc
Also, I think the comments labelling which arg is a scalar are a bit much. This'd be more readable with just the section headings, and some space between ssections
@@ -411,6 +417,8 @@ def in1d(ar1, ar2, assume_unique=False, invert=False): | |||
|
|||
See Also | |||
-------- | |||
isin : Version of this function that preserves the | |||
shape of ar1. | |||
numpy.lib.arraysetops : Module with a number of other functions for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"See also: The module you're looking at right now" is a little weird. @brsr, you don't need to worry about it in this PR though.
numpy/lib/arraysetops.py
Outdated
can speed up the calculation. Default is False. | ||
invert : bool, optional | ||
If True, the values in the returned array are inverted (that is, | ||
False where an element of `ar1` is in `ar2` and True otherwise). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names are out of date. I'd probably ditch this wording all together, and explain it in terms of not in
.
numpy/lib/arraysetops.py
Outdated
array([[ True, False], | ||
[ False, True]], dtype=bool) | ||
>>> element[mask] | ||
array([0, 6])""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: needs an extra newline in this docstring
numpy/lib/arraysetops.py
Outdated
it will be converted to an object array with one element, rather than an | ||
array of the values contained in `test_elements`. This is a consequence of | ||
the `array` constructor's way of handling non-sequence collections. | ||
Converting the set to a list usually gives the desired behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a good thing to show an example of, for people skipping the notes
numpy/lib/info.py
Outdated
@@ -147,6 +147,8 @@ | |||
setxor1d Set exclusive-or of 1D arrays with unique elements. | |||
in1d Test whether elements in a 1D array are also present in | |||
another array. | |||
isin Test whether elements in an array are also present in | |||
another array, preserving shape of first array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "test whether each element of one array is present anywhere within another"?
No strong feelings here
numpy/lib/info.py
Outdated
@@ -147,8 +147,8 @@ | |||
setxor1d Set exclusive-or of 1D arrays with unique elements. | |||
in1d Test whether elements in a 1D array are also present in | |||
another array. | |||
isin Test whether elements in an array are also present in | |||
another array, preserving shape of first array. | |||
isin Test whether each element of one (possibly multidimensional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why just possibly? I think you can put ND
before both of these arrays, to make it explicit compared to its surrounding 1D
functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me now - nice job!
remove 1D from section description, since they're not all 1D. match order in the list at top of arraysetops.py, where all the ND functions are first and all the 1D ones are after.
☔ The latest upstream changes (presumably #8996) made this pull request unmergeable. Please resolve the merge conflicts. |
doc/release/1.13.0-notes.rst
Outdated
@@ -22,6 +22,8 @@ Dropped Support | |||
Deprecations | |||
============ | |||
|
|||
* ``in1d`` is now deprecated in favor of ``isin``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should mark things as deprecated without also raising a DeprecationWarning
.
I think it would be best to not deprecate in1d
in this PR, so that merging this otherwise nice work can happen in time for 1.13.
You can always make a new PR that adds deprecation once this is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be my fault: I may have said something like to suggest that we should deprecate in1d, but I was being sloppy with my language. I don't think we can (or should) ever raise a DeprecationWarning. in1d
works just fine, albeit not with as nice an API as isin
, and is used all over the place, so there is absolutely no good reason to remove it.
What we can and should do is discourage using in1d
in favor of isin
for new code. Instead of saying in1d
is deprecated, its docstring should say something like: "We recommend using isin
instead of in1d
for new code".
Will need a rebase at some point, but if you're happy for all the commits to be squashed into one, then I can do that for you via the github UI |
@eric-wieser , squashing is fine with me. |
Thanks @brsr! |
This fixes numpygh-8331 Also update the docs for arraysetops to remove the outdated "1D" from the description, which was already incorrect for np.unique.
This resolves #8331. I had initially started the request there, and there was discussion on the mailing list, in which everyone seemed pretty happy with the idea. I hadn't realized that there was a separate in1d method for masked arrays, so this includes an isin function for masked arrays too.