Skip to content

np.diff is broken for boolean ndarrays in 1.13.0 #9251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
soupault opened this issue Jun 14, 2017 · 29 comments
Closed

np.diff is broken for boolean ndarrays in 1.13.0 #9251

soupault opened this issue Jun 14, 2017 · 29 comments

Comments

@soupault
Copy link
Contributor

soupault commented Jun 14, 2017

Should modify the docstring accordingly, raise a meaningful error, or (preferably) make it work by hidding some logic under the hood.

Traceback:

In [11]: x = np.array([[True, True], [False, False]])

In [12]: np.diff(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-59225436601c> in <module>()
----> 1 np.diff(x)

/usr/lib/python3.6/site-packages/numpy/lib/function_base.py in diff(a, n, axis)
   1924         return diff(a[slice1]-a[slice2], n-1, axis=axis)
   1925     else:
-> 1926         return a[slice1]-a[slice2]
   1927 
   1928 

TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

In [13]: np.__version__
Out[13]: '1.13.0'

xref scikit-image/scikit-image#2681

@jaimefrio
Copy link
Member

This seems like an easy fix. The diff function is pure Python and lives here:

https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L1835

You would probably want to branch on a.dtype == bool, and given the note regarding booleans in the docstring, use ^ (or !=) instead of - to keep backwards compatibility.

@eric-wieser
Copy link
Member

It seems to me that if - is deprecated, then diff is perfectly reasonable to disallow too.

@matthew-brett
Copy link
Contributor

Well - I can see the purity argument for disallowing, but it seems to me that it is reasonable to expect diff to work on booleans, and not surprising that scikit image was doing that. So I would have thought a branch on bool was the better route.

@soupault
Copy link
Contributor Author

I'd vote for keeping the functionality. If we all agree, I'll submit a patch later today or tomorrow.

P.S. Could you, please, point me at the discussion where the rationale for deprecating -= was given?

@jaimefrio
Copy link
Member

If diff is deprecated on booleans, then the docstring needs updating to reflect that. And we'd probably still want to branch on bool to raise a more in context error?

@eric-wieser
Copy link
Member

Seems to me that we might want an np.uint1 type that behaves as a integer and not a boolean, on which np.diff and - can be defined.

@matthew-brett
Copy link
Contributor

My vote would be to avoid the user having to stop and debug an error on np.diff(some_bools); first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.

@eric-wieser
Copy link
Member

eric-wieser commented Jun 14, 2017

first - it breaks backward compatibility, and second, the desired behavior is useful and clearly defined.

Both of these arguments apply to np.subtract too, don't they? Perhaps that discussion should be revisited

@matthew-brett
Copy link
Contributor

Yes, they do, and if you think of instinctively think of diff as implementing subtract then you'd probably want an error. My belief is that people think of diff in a slightly different way from subtract, so that they would agree that subtracting bools does not make sense, whereas taking a diff of bools does - meaning 'is this value different from the previous one'.

@eric-wieser
Copy link
Member

eric-wieser commented Jun 14, 2017

"is this value different from the previous one" sounds like an operation worthy of it's own name that generalizes to all dtypes, rather than spelling it np.diff(x).astype(bool).

Perhaps ufunc.pairwise(x, axis=...), implemented as ufunc(x[:-1], x[1:]) ignoring the axis argument? That would make the desired function np.not_equal.pairwise.

Although pairwise might imply the behaviour that outer already has...

@matthew-brett
Copy link
Contributor

Sure, that might be a good idea as well - but I would still argue that it would be kinder (practicality beats purity) to forgive this way of thinking of bools and np.diff.

@charris charris added this to the 1.13.1 release milestone Jun 14, 2017
@charris
Copy link
Member

charris commented Jun 14, 2017

The subtraction operator for booleans has been deprecated for some time, since 1.9 I believe, scikit image hasn't been paying attention. Paying attention is also a good idea ;) I think a different name would be appropriate, xdiff or some such.

@charris
Copy link
Member

charris commented Jun 14, 2017

That said, having diff handle booleans for backwards compatibility is probably a good idea.

@matthew-brett
Copy link
Contributor

It's a bit unfair to blame scikit image here - I can't see any deprecation warning for numpy 1.12.1 and:

In [4]: np.diff([True, False])
Out[4]: array([ True], dtype=bool)

@eric-wieser
Copy link
Member

can't see any deprecation warning for numpy 1.12.1 and

That's because by default DeprecationWarning is not displayed. Add a import warnings; warnings.simplefilter('always') and you'll see it

@charris
Copy link
Member

charris commented Jun 14, 2017

In [1]: import warnings

In [2]: warnings.simplefilter('always')

In [3]: np.diff([True, False])
/home/charris/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1175: DeprecationWarning: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
  return a[slice1]-a[slice2]
Out[3]: array([ True], dtype=bool)

In [4]: np.__version__
Out[4]: '1.10.4'

@jaimefrio
Copy link
Member

One other thing we should do is add a test for diff using a boolean array, be it to assert_raises or to validate the result.

Since the solution seems to be controversial (I think I'm with Matthew on this one, although I don't care much either way), I have sent an e-mail to the list so that everyone gets a say.

@mhvk
Copy link
Contributor

mhvk commented Jun 14, 2017

I'm also of the practical beats purity type here, especially since this used to work. Somehow, like @matthew-brett, there is a "diff" is "difference/differs" confusion in my mind, and I certainly have used np.diff as a quick way to find places where values changed.

p.s. @jaimefrio - I didn't see any message on the (numpy discussion) mailing list.

@rgommers
Copy link
Member

+1 for practicality.

And definitely -1 for a new function. If it's useful, then put it in diff. The namespace is way too large already.

@jaimefrio
Copy link
Member

Yes, I noticed I couldn't find my mail in the archives to link it here. But there's an email in my sent folder to numpy-discussion@scipy.org at 18:00 CET. Perhaps the server is acting up again? Or were those issues fixed for good already?

@charris
Copy link
Member

charris commented Jun 15, 2017

Wrong address, it has moved ;) numpy-discussion@python.org .

@jaimefrio
Copy link
Member

Ah, that would explain it...

Here's a link to the mail, resent to the correct address, for future reference:

https://mail.python.org/pipermail/numpy-discussion/2017-June/076877.html

@soupault
Copy link
Contributor Author

This doesn't look good at all - scipy/scipy#7493 .

@charris
Copy link
Member

charris commented Jun 16, 2017

Someone want to make a PR adding support for boolean arrays to diff?

@soupault
Copy link
Contributor Author

@charris I do. Are we all agreed on this?

@charris
Copy link
Member

charris commented Jun 16, 2017

@soupault I think that is the consensus.

@eric-wieser
Copy link
Member

eric-wieser commented Jul 13, 2017

This also applies to np.gradient. Do we fix that too? (could fold into #9411)

@soupault
Copy link
Contributor Author

Kind ping on @eric-wieser 's question. Should we fix np.gradient as well?

@mhvk
Copy link
Contributor

mhvk commented Feb 26, 2018

Personally, I don't see much point in supporting gradients of bool arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants