Skip to content

BUG: Make bool(str_scalar) and str_scalar.astype(bool) consistent #9848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Oct 11, 2017

Fixes #9847, by changing str_scalar.astype(bool) to no longer mean bool(int(str_scalar))

@eric-wieser eric-wieser force-pushed the fix-void-bool branch 2 times, most recently from 5cfbb51 to baefb91 Compare October 11, 2017 08:07
@eric-wieser eric-wieser changed the title BUG: Prevent void scalars decaying into python objects BUG: Make bool(scalar) and scalar.astype(bool) consistent Oct 11, 2017
``string.astype(bool)`` now uses the normal python semantics for converting
---------------------------------------------------------------------------
Previously this was interpreted as ``string.astype(int).astype(bool)``, which
interpeted ``'0'`` as ``False``.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to go through a FutureWarning here?

@mhvk
Copy link
Contributor

mhvk commented Oct 11, 2017

Failures seem real. Not sure I understand why getitem is no good, so am not much use as a reviewer...

@eric-wieser
Copy link
Member Author

I wish runtests.py would catch the warnings that cause travis to fail like that. Hopefully fixed now.

@ahaldane
Copy link
Member

Another way the old behavior was bad: np.array(['abc']).astype(bool) fails with a nonsensical error invalid literal for int() with base 10.

@ahaldane
Copy link
Member

This seems like a fairly clear "good move", and code looks fine to me.

As to whether we want a round of deprecation with FutureWarning: I lean towards yes, since it's not a very critical bug, yet the change is quite user-visible and I can even imagine someone depending on it. So I imagine a FutureWarning for 1.14, then change the behavior in 1.15.

@ahaldane
Copy link
Member

ahaldane commented Oct 11, 2017

One thing that is arguably worse this way is that this no longer works:

>>> np.array([True, False], dtype=bool).astype(str).astype(bool)

but I'm not sure it should work. After all, bool(str(bool(False))) is True in Python.

Edit: (Wait never mind, this errors out in current master)

@eric-wieser
Copy link
Member Author

I lean towards yes, since it's not a very critical bug, yet the change is quite user-visible and I can even imagine someone depending on it.

A naive implementation of adding the following to numpy\core\src\multiarray\arraytypes.c.src:1577

#if defined(IS_TO_BOOL)
            /* Numpy 1.14, 2017-10-11 */
            if (DEPRECATE_FUTUREWARNING(
                "In future, casting string-like types to bool will not cast "
                "to int first. To keep the current behaviour of having strings "
                "of '0's be considered false, use .astype(int).astype(bool)."
                ) < 0) {
                return;
            }
#endif

\leads to the warning being emitted once for every single element in the array!

>>> a = np.array(list('test'))
>>> a.astype(bool)
FutureWarning: In future, casting string-like types to bool will not cast to int first. To keep the current behaviour of having strings of '0's be considered false, use .astype(int).astype(bool).
FutureWarning: In future, casting string-like types to bool will not cast to int first. To keep the current behaviour of having strings of '0's be considered false, use .astype(int).astype(bool).
FutureWarning: In future, casting string-like types to bool will not cast to int first. To keep the current behaviour of having strings of '0's be considered false, use .astype(int).astype(bool).
FutureWarning: In future, casting string-like types to bool will not cast to int first. To keep the current behaviour of having strings of '0's be considered false, use .astype(int).astype(bool).
array([ True,  True,  True,  True], dtype=bool)

Should I only emit the warning if the behaviour will change? That's still pathological for np.array(['0'] * 10000)

@charris
Copy link
Member

charris commented Oct 15, 2017

Needs rebase, also to get the merged #9856.

Caused by string values being cast to an int first and then a bool, ie arr.astype(int).astype(bool).
This causes crashes on non-numeric strings, and only accepts strings of 0 as false.

Fixes numpy#9847
@eric-wieser
Copy link
Member Author

Rebased, but not ready to merge without a decision about if and when to FutureWarning

@eric-wieser eric-wieser changed the title BUG: Make bool(scalar) and scalar.astype(bool) consistent BUG: Make bool(str_scalar) and str_scalar.astype(bool) consistent Oct 17, 2017
@eric-wieser
Copy link
Member Author

Closing in favor of having discussion in #9875, since this is a wider issue than anticipated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants