BUG: ptp fails on bool #15057

bashtage · 2019-12-05T13:45:03Z

Reproducing code example:

import numpy as np
rg = np.random.default_rng()
np.ptp(rg.integers(0,2,size=100).astype("bool"))

Error message:

TypeError: numpy boolean subtract, the - operator, is deprecated, use the bitwise_xor, the ^ operator, or the logical_xor function instead.

Numpy/Python version information:

import sys, numpy; print(numpy.__version__, sys.version)
1.17.4 3.7.5 (default, Oct 31 2019, 15:18:51) [MSC v.1916 64 bit (AMD64)]

The text was updated successfully, but these errors were encountered:

Workaround for bool data when using ptp Also silences pandas ptp warning in this location xref numpy/numpy#15057

eric-wieser · 2019-12-05T13:52:39Z

Would you expect the result to be a float or bool?

Workaround for bool data when using ptp Also silences pandas ptp warning in this location xref numpy/numpy#15057

bashtage · 2019-12-05T13:54:47Z

I'm happy either way as long as I can test using ptp(x, axis=0) == 0. Probably a bool, and so True/False. I assume ptp(x.astype(np.float32), axis=0) is a float32.

bashtage · 2019-12-05T13:57:52Z

Type seems to be preserved by ptp except where it spills over (int types).

eric-wieser · 2019-12-05T14:02:22Z

I'd argue that the current behavior is correct - ptp is not defined over things on which subtraction isn't defined.

Having said that, the operation all_values_equal(x, axis=...), which you implement as ptp(x, axis=....) == 0 is well-defined on booleans, it's just your implementation that isn't.

I'd argue that ptp(x) == 0 is a bad way to implement this anyway - why not max(x) == min(x), which is fewer comparisons?

WarrenWeckesser · 2019-12-05T15:18:14Z

FWIW: When I was testing various ways to check that an array is constant earlier this year, the fastest turned out to be (x == x[0]).all(). In my case, x was a 1-d aray and I knew the length was not 0. To check for constant values along the first dimension of an n-d array, you could use (x == x[0]).all(axis=0).

eric-wieser · 2019-12-05T15:26:05Z

Perhaps it would be worth adding to numpy somewhere (or perhaps a C-optimized version of the following):

def all_same(arr, *, axis):
    axis = normalize_axis_tuple(axis, arr.ndim)
    sl = [slice(None)]*arr.ndim
    for ax in axis:
         sl[axis] = slice(None, 1)
    return (x == x[tuple(sl)]).all(axis=axis)

Note by using [:1] instead of [0] it works for size 0 arrays too

bashtage · 2019-12-05T15:27:57Z

@eric-wieser The original justification for ptp was that it should be about 2x as fast as using max and min. I think the implementation in NumPy isn't and is actually doing two separate passes.

@WarrenWeckesser The potential issue there is that it requires creating an intermediate array that has the same dimension as x. If x is large, then this array will also be large. The ptp method uses a reduction first, and then does creates an intermediate with the same size as the reduction.

eric-wieser · 2019-12-05T15:30:29Z

The original justification for ptp was that it should be about 2x as fast as using max and min. I think the implementation in NumPy isn't and is actually doing two separate passes.

ptp is implemented as max() - min() inside numpy

that it requires creating an intermediate array

That's a fair point - although at least the array is a boolean so typically 4-8x smaller than x

bashtage · 2019-12-05T15:33:48Z

For me, the reduction dimension is negligible relative to the shape of the axis being reduced (1000-10000000x smaller), so it is pretty negligible.

A loopy implementation of all_same would avoid intermediates.

bashtage · 2019-12-05T15:45:31Z

In terms of thinking about the choice here, it is hard to understand by why some ops, e.g., sum, happily cast bools to integers, while other, e.g., ptp, do not.

Put another way, I don't really understand how + is well defined by - is not

bashtage · 2019-12-05T15:55:23Z

This seems to be a rabbit hole where it is hard to be fully consistent. For example

>>> x = np.bool_(True)
>>> x + x
True
>>> np.array([x, x]).sum()
2
>>> np.diff(np.array([x, x]))
array([False])

Workaround for bool data when using ptp Also silences pandas ptp warning in this location xref numpy/numpy#15057

WarrenWeckesser · 2019-12-05T17:03:23Z

I don't know the details, but there is a lot of relevant history here, including some churn back in version 1.13 when the deprecation of boolean subtraction that was added in 1.9 (I think) was converted to an error (e.g. #9251, #9255); see the links below. Digging into that history is probably necessary to explain how we ended with the current behavior.

Can someone familiar with the details and history comment here? Is there more to do, or is the current behavior the best compromise at the moment (and therefore this issue should be closed)?

Edit:

Here's a relevant mailing list thread from December 2013: https://mail.python.org/pipermail/numpy-discussion/2013-December/068375.html
@seberg's follow-up in January 2014: https://mail.python.org/pipermail/numpy-discussion/2014-January/068751.html
The pull request to deprecate boolean subtraction: DEP: Deprecate boolean math operations #4105

bashtage · 2019-12-05T17:24:03Z

After this thread, I went with something like

def all_same(arr, *, axis):
    _max = arr.max(axis=axis)
    _min = arr.min(axis=axis)
    return _max == _min

which is what was originally suggested. Seems to work correctly for nan as well.

seberg · 2019-12-06T21:16:43Z

@bashtage nice. Sounds like we can close this then. I personally think it is OK if ptp does not like booleans much (although others mileage may vary). I will close this, if anyone disagrees just comment!

bashtage · 2019-12-10T18:16:49Z

It would be nice if there was some more consistency for booleans w.r.t when they are treated as 1 and normal int math appears or when adding is treated as logical_or.

seberg · 2019-12-10T18:38:42Z

Yes, I agree, but we are in a tricky position here. Python traditionally treats booleans exactly like 0 and 1 (they are just singletons of a subclass of int). We have been stricter, because it is even stranger in NumPy world (e.g. with limited dtype precision and what happens if you index with a boolean).

But we cannot decide that np.True_ + np.True_ stops working, it is too "normal" in the python world. The subtract is a case where we push back. Python is moving slowly in that direction, but until they change math as well...

bashtage added a commit to bashtage/statsmodels that referenced this issue Dec 5, 2019

MAINT: Workaround NumPy ptp bug

375ebdf

Workaround for bool data when using ptp Also silences pandas ptp warning in this location xref numpy/numpy#15057

bashtage mentioned this issue Dec 5, 2019

MAINT: Workaround NumPy ptp issue statsmodels/statsmodels#6316

Merged

4 tasks

bashtage added a commit to bashtage/statsmodels that referenced this issue Dec 5, 2019

MAINT: Workaround NumPy ptp bug

7b3998e

Workaround for bool data when using ptp Also silences pandas ptp warning in this location xref numpy/numpy#15057

bashtage added a commit to bashtage/statsmodels that referenced this issue Dec 5, 2019

MAINT: Workaround NumPy ptp bug

4b9369f

Workaround for bool data when using ptp Also silences pandas ptp warning in this location xref numpy/numpy#15057

seberg closed this as completed Dec 6, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: ptp fails on bool #15057

BUG: ptp fails on bool #15057

bashtage commented Dec 5, 2019

eric-wieser commented Dec 5, 2019

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

eric-wieser commented Dec 5, 2019

Uh oh!

WarrenWeckesser commented Dec 5, 2019

Uh oh!

eric-wieser commented Dec 5, 2019 •

edited

Loading

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

eric-wieser commented Dec 5, 2019 •

edited

Loading

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

bashtage commented Dec 5, 2019 •

edited

Loading

Uh oh!

bashtage commented Dec 5, 2019 •

edited

Loading

Uh oh!

WarrenWeckesser commented Dec 5, 2019 •

edited

Loading

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

seberg commented Dec 6, 2019

Uh oh!

bashtage commented Dec 10, 2019

Uh oh!

seberg commented Dec 10, 2019

Uh oh!

Uh oh!

BUG: ptp fails on bool #15057

BUG: ptp fails on bool #15057

Comments

bashtage commented Dec 5, 2019

Reproducing code example:

Error message:

Numpy/Python version information:

eric-wieser commented Dec 5, 2019

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

eric-wieser commented Dec 5, 2019

Uh oh!

WarrenWeckesser commented Dec 5, 2019

Uh oh!

eric-wieser commented Dec 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

eric-wieser commented Dec 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

bashtage commented Dec 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bashtage commented Dec 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WarrenWeckesser commented Dec 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bashtage commented Dec 5, 2019

Uh oh!

seberg commented Dec 6, 2019

Uh oh!

bashtage commented Dec 10, 2019

Uh oh!

seberg commented Dec 10, 2019

Uh oh!

eric-wieser commented Dec 5, 2019 •

edited

Loading

eric-wieser commented Dec 5, 2019 •

edited

Loading

bashtage commented Dec 5, 2019 •

edited

Loading

bashtage commented Dec 5, 2019 •

edited

Loading

WarrenWeckesser commented Dec 5, 2019 •

edited

Loading