-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MNT Adjust code after NEP 51 numpy scalar formatting changes #27042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/utils/validation.py
Outdated
@@ -2270,9 +2270,9 @@ def _check_pos_label_consistency(pos_label, y_true): | |||
or np.array_equal(classes, [1]) | |||
) | |||
): | |||
classes_repr = ", ".join(repr(c) for c in classes) | |||
classes_repr = str(classes.tolist()).replace('[', '{').replace(']', '}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the same error message as previously, but honestly I would be fine with using square brackets rather than curly brackets to simplify the code, i.e.:
y_true takes value in ['a', 'b'] ...
rather than
y_true takes value in {'a', 'b'} ...
Oh well it seems like the scipy dev wheel in https://anaconda.org/scientific-python-nightly-wheels/scipy/files is not recent enough to have scipy/scipy#19015 hence the error:
|
sklearn/preprocessing/_encoders.py
Outdated
@@ -774,8 +774,8 @@ def _map_drop_idx_to_infrequent(self, feature_idx, drop_idx): | |||
if infrequent_indices is not None and drop_idx in infrequent_indices: | |||
categories = self.categories_[feature_idx] | |||
raise ValueError( | |||
f"Unable to drop category {categories[drop_idx]!r} from feature" | |||
f" {feature_idx} because it is infrequent" | |||
f"Unable to drop category {categories[drop_idx].tolist()!r} from" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the NEP correctly then repr(np.int64(1))
will now look like np.int64(1)
but str(np.int64(1))
will (continue) to be 1
. So I think we can just replace the !r
with !s
no?
But then isn't categories[drop_idx]
an array ... so why does the formatting change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for strings, we generally want a quoting which is why we use !r
In other words you want:
Unable to drop category 'a' from feature 0
instead of
Unable to drop category a from feature 0
I tried to install the numpy and scipy nightlies to run the tests locally to see what things look like (I find it too hard to do in my head), but right now scipy and numpy are incompatible with each other (some exception got moved but scipy hasn't been adjusted yet). So I propose we wait a bit to resolve this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I am fine with the rendering.
OK then there are some doctests in rst that are broken, left to do on another PR ... |
@@ -259,7 +259,7 @@ def test_imputation_median_special_cases(): | |||
@pytest.mark.parametrize("dtype", [None, object, str]) | |||
def test_imputation_mean_median_error_invalid_type(strategy, dtype): | |||
X = np.array([["a", "b", 3], [4, "e", 6], ["g", "h", 9]], dtype=dtype) | |||
msg = "non-numeric data:\ncould not convert string to float: '" | |||
msg = "non-numeric data:\ncould not convert string to float:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember here why I removed the quote here. We reraise an error from numpy ... we could also rewrite the error message if we think this is really important.
scikit-learn/sklearn/impute/_base.py
Line 327 in 83760ab
raise new_ve from None |
In [1]: import numpy as np
In [2]: np.array(['a']).astype(np.float64)
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 np.array(['a']).astype(np.float64)
ValueError: could not convert string to float: np.str_('a')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rewriting the error message will be confusing for the user. Not sure, but it somehow feels wrong to modify an exception that we don't generate.
@@ -259,7 +259,7 @@ def test_imputation_median_special_cases(): | |||
@pytest.mark.parametrize("dtype", [None, object, str]) | |||
def test_imputation_mean_median_error_invalid_type(strategy, dtype): | |||
X = np.array([["a", "b", 3], [4, "e", 6], ["g", "h", 9]], dtype=dtype) | |||
msg = "non-numeric data:\ncould not convert string to float: '" | |||
msg = "non-numeric data:\ncould not convert string to float:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember here why I removed the quote here. We reraise an error from numpy ... we could also rewrite the error message if we think this is really important.
scikit-learn/sklearn/impute/_base.py
Line 327 in 83760ab
raise new_ve from None |
In [1]: import numpy as np
In [2]: np.array(['a']).astype(np.float64)
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 np.array(['a']).astype(np.float64)
ValueError: could not convert string to float: np.str_('a')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well.
Thanks for the fixes @lesteve. |
There are some failures in scipy-dev build due to numpy/numpy#22449 that implements NEP 51 as mentioned in #26814 (comment).
I tried to adjust the scikit-learn error code when it makes sense e.g. when in my opinion adding the full numpy type make it less easier to read for example I find:
a lot less easier to read than:
When that was not possible, I adjusted the test to be less strict.
Note this may well be the case that there are other instances of this issue, that are not caught by our tests, not sure if there is an easy way to find them ...