MNT Adjust code after NEP 51 numpy scalar formatting changes #27042

lesteve · 2023-08-09T13:42:39Z

There are some failures in scipy-dev build due to numpy/numpy#22449 that implements NEP 51 as mentioned in #26814 (comment).

I tried to adjust the scikit-learn code that builds error messages where it makes sense, e.g. when in my opinion adding the full numpy type makes the message harder to read. For example, I find:

ValueError: The classes, [np.int64(0), np.int64(1), np.int64(2), np.int64(3)], are not in class_weight

a lot harder to read than:

ValueError: The classes, [0, 1, 2, 3], are not in class_weight

When that was not possible, I adjusted the test to be less strict.

Note that there may well be other instances of this issue that are not caught by our tests. I am not sure there is an easy way to find them ...
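A minimal sketch of the difference, assuming only that NumPy is installed. The variable names are illustrative and not taken from the scikit-learn code. Converting with `.tolist()` yields built-in Python scalars, so the message reads the same before and after NEP 51:

```python
import numpy as np

classes = np.array([0, 1, 2, 3])

# repr() of each NumPy scalar depends on the NumPy version:
# "0" before NEP 51, something like "np.int64(0)" after it.
per_scalar = "[" + ", ".join(repr(c) for c in classes) + "]"

# .tolist() returns built-in Python ints, whose repr is version independent.
stable = str(classes.tolist())

message = f"The classes, {stable}, are not in class_weight"
print(per_scalar)
print(message)
```

Only the second form is safe to compare against a fixed string in a test.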

github-actions · 2023-08-09T13:44:37Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c9472a7. Link to the linter CI: here

lesteve · 2023-08-09T13:46:52Z

sklearn/utils/validation.py

@@ -2270,9 +2270,9 @@ def _check_pos_label_consistency(pos_label, y_true):
            or np.array_equal(classes, [1])
        )
    ):
-        classes_repr = ", ".join(repr(c) for c in classes)
+        classes_repr = str(classes.tolist()).replace('[', '{').replace(']', '}')


I kept the same error message as previously, but honestly I would be fine with using square brackets rather than curly brackets to simplify the code, i.e.:

y_true takes value in ['a', 'b'] ...

rather than

y_true takes value in {'a', 'b'} ...
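As a sketch of the two options, here is a hypothetical helper (`format_classes` is not a scikit-learn function) that builds either rendering from plain Python values. It assumes a 1-D array of labels:

```python
import numpy as np


def format_classes(classes, brackets="{}"):
    """Hypothetical helper: render class labels without NumPy type names."""
    opening, closing = brackets
    # .tolist() converts NumPy scalars to built-in Python objects, so repr()
    # keeps the quotes around strings but never shows "np.str_(...)".
    body = ", ".join(repr(c) for c in np.asarray(classes).tolist())
    return f"{opening}{body}{closing}"


# Curly brackets, as in the existing error message.
print(format_classes(np.array(["a", "b"])))
# Square brackets, the simpler alternative.
print(format_classes(np.array(["a", "b"]), "[]"))
```

Compared with chaining `.replace('[', '{')` on the string form of the list, joining the items leaves any bracket characters inside the labels themselves untouched.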

lesteve · 2023-08-09T14:58:32Z

Oh well it seems like the scipy dev wheel in https://anaconda.org/scientific-python-nightly-wheels/scipy/files is not recent enough to have scipy/scipy#19015 hence the error:

ImportError: cannot import name 'ComplexWarning' from 'numpy' (/usr/share/miniconda/envs/testvenv/lib/python3.11/site-packages/numpy/__init__.py)
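For reference, a common way to stay compatible with both layouts is a guarded import. This is only a sketch and I have not checked that it is what scipy/scipy#19015 does. It assumes `numpy.exceptions` is available from NumPy 1.25 onwards:

```python
try:
    # Recent NumPy versions expose warning classes in numpy.exceptions.
    from numpy.exceptions import ComplexWarning
except ImportError:
    # Older NumPy versions only provide it in the top-level namespace.
    from numpy import ComplexWarning

print(ComplexWarning)
```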

betatim · 2023-08-09T15:42:44Z

sklearn/preprocessing/_encoders.py

@@ -774,8 +774,8 @@ def _map_drop_idx_to_infrequent(self, feature_idx, drop_idx):
        if infrequent_indices is not None and drop_idx in infrequent_indices:
            categories = self.categories_[feature_idx]
            raise ValueError(
-                f"Unable to drop category {categories[drop_idx]!r} from feature"
-                f" {feature_idx} because it is infrequent"
+                f"Unable to drop category {categories[drop_idx].tolist()!r} from"


If I understand the NEP correctly, then repr(np.int64(1)) will now look like np.int64(1) but str(np.int64(1)) will continue to be 1. So I think we can just replace the !r with !s, no?

But then isn't categories[drop_idx] an array ... so why does the formatting change?

I think for strings we generally want quoting, which is why we use !r.

In other words, you want:

Unable to drop category 'a' from feature 0

instead of

Unable to drop category a from feature 0
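A small sketch of why converting to a Python scalar before applying `!r` gives both properties (quotes for strings, no NumPy type name). The arrays are made up for illustration and use NumPy's native string and integer dtypes rather than `object`:

```python
import numpy as np

str_categories = np.array(["a", "b"])  # elements are np.str_ scalars
int_categories = np.array([10, 20])    # elements are NumPy integer scalars

for categories in (str_categories, int_categories):
    scalar = categories[0]   # integer indexing yields a NumPy scalar, not an array
    value = scalar.item()    # built-in str or int
    # !r on the Python value keeps quotes for strings; !s on the scalar drops them.
    print(f"!r on .item(): {value!r}    !s on scalar: {scalar!s}")
```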

betatim · 2023-08-10T12:58:04Z

I tried to install the numpy and scipy nightlies to run the tests locally to see what things look like (I find it too hard to do in my head), but right now scipy and numpy are incompatible with each other (some exception got moved but scipy hasn't been adjusted yet). So I propose we wait a bit to resolve this.

glemaitre

LGTM. I am fine with the rendering.

lesteve · 2023-08-18T12:43:57Z

The scipy-dev test passes

OK then there are some doctests in rst that are broken, left to do on another PR ...

lesteve · 2023-08-18T17:47:06Z

sklearn/impute/tests/test_impute.py

@@ -259,7 +259,7 @@ def test_imputation_median_special_cases():
 @pytest.mark.parametrize("dtype", [None, object, str])
 def test_imputation_mean_median_error_invalid_type(strategy, dtype):
    X = np.array([["a", "b", 3], [4, "e", 6], ["g", "h", 9]], dtype=dtype)
-    msg = "non-numeric data:\ncould not convert string to float: '"
+    msg = "non-numeric data:\ncould not convert string to float:"


I remember now why I removed the quote here. We re-raise an error from numpy ... we could also rewrite the error message if we think this is really important.

scikit-learn/sklearn/impute/_base.py

Line 327 in 83760ab

raise new_ve from None

In [1]: import numpy as np

In [2]: np.array(['a']).astype(np.float64)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 np.array(['a']).astype(np.float64)

ValueError: could not convert string to float: np.str_('a')

I think rewriting the error message will be confusing for the user. Not sure, but it somehow feels wrong to modify an exception that we don't generate.
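To illustrate the approach taken in the test, here is a sketch with a hypothetical wrapper (`to_float` is not the scikit-learn code). It passes NumPy's message through unchanged and checks only the version-independent prefix:

```python
import numpy as np


def to_float(X):
    """Hypothetical wrapper that re-raises NumPy's conversion error with a prefix."""
    try:
        return np.asarray(X).astype(np.float64)
    except ValueError as ve:
        # The text after the colon comes from NumPy and differs across versions
        # ('a' before NEP 51, np.str_('a') after), so it is passed through as is.
        new_ve = ValueError(f"non-numeric data:\n{ve}")
        raise new_ve from None


expected_prefix = "non-numeric data:\ncould not convert string to float:"
caught = None
try:
    to_float(["a"])
except ValueError as exc:
    caught = str(exc)

print(caught)
```

Matching on the prefix alone keeps the test green on both old and new NumPy without touching the wording of an exception we do not generate.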

betatim

Let's do it!

ogrisel

LGTM as well.

ogrisel · 2023-08-22T09:27:24Z

Thanks for the fixes @lesteve.

lesteve added 2 commits August 9, 2023 15:35

MNT Adjust code after NEP 51 numpy scalar formatting changes

4a61a39

[scipy-dev]

50c8879

github-actions bot added module:impute module:preprocessing module:utils labels Aug 9, 2023

lesteve added the No Changelog Needed label Aug 9, 2023

lesteve commented Aug 9, 2023

lesteve mentioned this pull request Aug 9, 2023

CI Build and test Python 3.12 wheels #27027

Merged

[scipy-dev]

e439f6d

betatim reviewed Aug 9, 2023

glemaitre approved these changes Aug 18, 2023

lesteve mentioned this pull request Aug 18, 2023

MNT Remove DeprecationWarning for scipy.sparse.linalg.cg tol vs rtol argument #26814

Merged

lesteve added 3 commits August 18, 2023 13:54

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into nep-51-scalar-formatting

6789108

[scipy-dev] [azure parallel]

4d17d1f

[scipy-dev] Use .item instead of .tolist

cba58a4

[scipy-dev] clean-up

c9472a7

lesteve commented Aug 18, 2023

betatim approved these changes Aug 21, 2023

ogrisel reviewed Aug 22, 2023

ogrisel approved these changes Aug 22, 2023

ogrisel merged commit f55da62 into scikit-learn:main Aug 22, 2023

akaashpatelmns pushed a commit to akaashp2000/scikit-learn that referenced this pull request Aug 25, 2023

MNT Adjust code after NEP 51 numpy scalar formatting changes (scikit-learn#27042)

d72539e

lesteve mentioned this pull request Aug 29, 2023

CI Fix scipy-dev issues related to numpy 2.0 changes #27190

Merged

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023

MNT Adjust code after NEP 51 numpy scalar formatting changes (scikit-learn#27042)

c0ea2c5

jeremiedbb pushed a commit that referenced this pull request Sep 20, 2023

MNT Adjust code after NEP 51 numpy scalar formatting changes (#27042)

37e821b

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

MNT Adjust code after NEP 51 numpy scalar formatting changes (scikit-learn#27042)

14c0369
