MAINT fix the way to call stats.mode #23633

glemaitre · 2022-06-15T13:40:29Z

Related to #23626

stats.mode cannot be unpacked as before with the latest SciPy.

glemaitre · 2022-06-15T16:14:58Z

We should also check with upstream whether the change is intended. It breaks older scikit-learn versions with the newest SciPy version, without any deprecation warning.

Micky774 · 2022-06-15T20:57:03Z

This was a rather surprising change. There is a similar problem with utils.tests.test_extmath::test_uniform_weights.

ogrisel · 2022-06-16T08:53:50Z

For reference here is a snippet of the test failure caused by this breaking change in scipy.stats.mode:

    def _most_frequent(array, extra_value, n_repeat):
        """Compute the most frequent value in a 1d array extended with
        [extra_value] * n_repeat, where extra_value is assumed to be not part
        of the array."""
        # Compute the most frequent value in array only
        if array.size > 0:
            if array.dtype == object:
                # scipy.stats.mode is slow with object dtype array.
                # Python Counter is more efficient
                counter = Counter(array)
                most_frequent_count = counter.most_common(1)[0][1]
                # tie breaking similarly to scipy.stats.mode
                most_frequent_value = min(
                    value
                    for value, count in counter.items()
                    if count == most_frequent_count
                )
            else:
                mode = stats.mode(array)
>               most_frequent_value = mode[0][0]
E               IndexError: invalid index to scalar variable.

array      = array([1., 1., 1.])
extra_value = nan
mode       = ModeResult(mode=1.0, count=3)
n_repeat   = 0

ogrisel

The remaining failures in pylatest_pip_scipy_dev are unrelated.

ogrisel · 2022-06-16T12:50:06Z

In scipy/scipy#16418 it is argued that this change of behavior is considered a bugfix. So if scipy 1.9 gets released without a deprecation cycle, we will need to quickly release scikit-learn 1.1.2 with a backport of this fix and #23640.

lesteve · 2022-06-16T15:42:04Z

I am not sure how this fix works, since depending on the SciPy version, most_frequent_value and most_frequent_count will be a NumPy scalar (scipy >= 1.9) or a 1d NumPy array with a single value (scipy < 1.9).

Should we not add a sklearn.utils.fixes.mode so that all calls to scipy.stats.mode get fixed in a uniform manner?

ogrisel · 2022-06-16T17:28:18Z

Should we not add a sklearn.utils.fixes.mode so that all calls to scipy.stats.mode get fixed in a uniform manner?

This is a good suggestion. #23640 will need to be updated accordingly. Maybe let's do both in the same PR.

glemaitre · 2022-06-16T17:43:53Z

Indeed, it is quite funny that the current CIs do not fail. I assume that the 1-element array ends up being raveled somewhere.
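A minimal sketch of that idea, assuming only NumPy and SciPy: raveling both fields before indexing yields the same scalar whether stats.mode returns 1-element arrays or scalars. The helper name most_frequent_1d is hypothetical and is not part of scikit-learn.

```python
import numpy as np
from scipy import stats


def most_frequent_1d(array):
    """Hypothetical helper: mode and count of a 1d array on any SciPy version."""
    result = stats.mode(array)
    # np.ravel turns both a 0-d scalar and a 1-element array into a 1d array,
    # so indexing with [0] is safe in either case.
    value = np.ravel(result.mode)[0]
    count = int(np.ravel(result.count)[0])
    return value, count


value, count = most_frequent_1d(np.array([1.0, 1.0, 1.0]))
print(value, count)
```

This only covers the 1d case that the imputer test exercises; it does not handle an explicit axis argument.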

glemaitre · 2022-06-16T18:09:24Z

I need the change introduced by @Micky774 in #23640 as well to adapt the test.
One question is: do we want to modify weighted_mode to make it consistent with mode from SciPy as well? We would need to deprecate the behaviour so it could be part of another PR anyway.

ogrisel · 2022-06-16T18:13:13Z

One question is: do we want to modify weighted_mode to make it consistent with mode from SciPy as well? We would need to deprecate the behaviour so it could be part of another PR anyway.

I believe so. +1 for a separate PR, to make backporting the basic compat fix to 1.1.2 easier if needed.

glemaitre · 2022-06-16T21:58:00Z

sklearn/utils/fixes.py

+        Examples
+        --------
+        >>> import numpy as np
+        >>> a = np.array([[6, 8, 3, 0],
+        ...               [3, 2, 1, 7],
+        ...               [8, 1, 8, 4],
+        ...               [5, 3, 0, 5],
+        ...               [4, 7, 5, 9]])
+        >>> from sklearn.utils.fixes import mode
+        >>> mode(a)
+        ModeResult(mode=array([3, 1, 0, 0]), count=array([1, 1, 1, 1]))
+        To get mode of whole array, specify ``axis=None``:
+        >>> mode(a, axis=None)
+        ModeResult(mode=3, count=3)
+        """


Uhm no. The idea is that sklearn.utils.fixes.mode will always behave like scipy>=1.9. So we either import it or redefine it and reduce the dimension.
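A rough sketch of the "redefine it and reduce the dimension" option, under the assumption that it is enough to drop the reduced axis whenever SciPy kept it. The name mode_reduced is hypothetical; this is not the code that ended up in sklearn.utils.fixes.

```python
import numpy as np
from scipy import stats


def mode_reduced(a, axis=0):
    """Hypothetical wrapper: return (mode, count) with `axis` removed."""
    a = np.asarray(a)
    result = stats.mode(a, axis=axis)
    mode, count = np.asarray(result.mode), np.asarray(result.count)
    if axis is None:
        # Depending on the SciPy version the result is 0-d or has shape (1,).
        return mode.reshape(()), count.reshape(())
    if mode.ndim == a.ndim:
        # SciPy kept the reduced axis with length 1: drop it.
        mode = np.squeeze(mode, axis=axis)
        count = np.squeeze(count, axis=axis)
    return mode, count


a = np.array([[6, 8, 3, 0],
              [3, 2, 1, 7],
              [8, 1, 8, 4],
              [5, 3, 0, 5],
              [4, 7, 5, 9]])
mode, count = mode_reduced(a)
print(mode, count)
```

The array is the one from the docstring above, so the per-column result can be compared with the documented ModeResult.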


glemaitre · 2022-06-21T07:46:14Z

It should be ready for reviews.

ogrisel · 2022-06-21T13:26:30Z

Note that this PR will need to be adapted depending on the result of the discussion with upstream:

BUG: breaking change: scipy.stats.mode returned value has changed shape without deprecation scipy/scipy#16418 (comment)

glemaitre · 2022-06-21T13:28:14Z

I put it WIP until we know what happens upstream.

thomasjpfan · 2022-07-25T16:47:49Z

SciPy moved forward with a solution here: scipy/scipy#16429

glemaitre · 2022-07-26T06:47:55Z

Cool. I will remove the backport and just make the change in the code then.

thomasjpfan

Do we need to adjust the code here?

scikit-learn/sklearn/neighbors/_classification.py

Line 252 in 79c21c5

mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
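To illustrate the question, here is a small sketch with made-up neighbor labels (it is not scikit-learn's actual code path). Tuple unpacking of the result works on both old and new SciPy; only the shape of mode differs, (n_queries, 1) versus (n_queries,), so raveling gives a version-independent 1d vector.

```python
import numpy as np
from scipy import stats

# Hypothetical labels of the 3 nearest neighbors for 3 query points.
neighbor_labels = np.array([[0, 1, 1],
                            [2, 2, 0],
                            [1, 0, 0]])

# Majority vote per query point (row-wise mode).
mode, _ = stats.mode(neighbor_labels, axis=1)

# Flatten so the result is 1d regardless of whether SciPy kept the axis.
predicted = np.asarray(mode).ravel()
print(predicted)
```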

sklearn/impute/_base.py

thomasjpfan · 2022-08-02T15:05:44Z

sklearn/impute/_base.py

-            mode = stats.mode(array)
-            most_frequent_value = mode[0][0]
-            most_frequent_count = mode[1][0]
+            most_frequent_value, most_frequent_count = stats.mode(array)


To be specific, I think we need to work around SciPy's deprecation warning by placing this into utils.fixes and adjusting the code that uses mode.

    def _mode(a, axis=0):
        if sp_version >= parse_version("1.9.0"):
            return stats.mode(a, keepdims=False)
        # unpack for SciPy version < 1.9
        results = stats.mode(a)
        return results[0][0], results[1][0]

When our min supported SciPy version is 1.9 we can call stats.mode directly with keepdims=False.
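For readers who want to try the idea outside scikit-learn, here is a self-contained sketch of a version-gated call. It assumes a plain major.minor comparison of scipy.__version__ is sufficient, and it only handles 1d input; the names SCIPY_VERSION and mode_1d are hypothetical.

```python
import numpy as np
import scipy
from scipy import stats

# Assumption: comparing (major, minor) is enough for this sketch.
SCIPY_VERSION = tuple(int(part) for part in scipy.__version__.split(".")[:2])


def mode_1d(a):
    """Hypothetical shim: scalar (mode, count) of a 1d array."""
    a = np.asarray(a)
    if SCIPY_VERSION >= (1, 9):
        # keepdims exists from SciPy 1.9 on; False drops the reduced axis.
        result = stats.mode(a, keepdims=False)
        return result.mode, result.count
    # Older SciPy returns 1-element arrays, so unpack the first entry.
    result = stats.mode(a)
    return result[0][0], result[1][0]


value, count = mode_1d([1.0, 1.0, 1.0])
print(value, count)
```

Once the minimum supported SciPy is 1.9, the else branch and the version check can simply be deleted.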

We do not use mode in too many places:

    sklearn/impute/_base.py:              mode = stats.mode(array)
    sklearn/neighbors/_classification.py: mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
    sklearn/utils/tests/test_extmath.py:  mode, score = stats.mode(x, axis)

I opened a PR to your fork with this idea: glemaitre#12

thomasjpfan

LGTM

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>


MAINT fix the way to call stats.mode

ad46670

github-actions bot added the module:impute label Jun 15, 2022

Micky774 mentioned this pull request Jun 15, 2022

FIX/MNT Compensated for changes in stats.mode to keep internal behavior #23640

Closed

Trigger [scipy-dev]

8dfe99a

ogrisel mentioned this pull request Jun 16, 2022

BUG: breaking change: scipy.stats.mode returned value has changed shape without deprecation scipy/scipy#16418

Closed

ogrisel approved these changes Jun 16, 2022


ogrisel added this to the 1.1.2 milestone Jun 16, 2022

introduce sklearn.fixes.mode

b7acf87

glemaitre added the No Changelog Needed label Jun 16, 2022

glemaitre added 2 commits June 16, 2022 20:39

iter

3f1dab5

iter

59c5a26

glemaitre commented Jun 16, 2022


glemaitre added 2 commits June 16, 2022 23:06

iter

f49740c

docstring numpydoc

f8df7db

Micky774 reviewed Jun 20, 2022


sklearn/utils/fixes.py

glemaitre and others added 2 commits June 20, 2022 16:35

Update sklearn/utils/fixes.py

4b21ff6

Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>

iter

c1b5e8a

glemaitre marked this pull request as draft June 21, 2022 13:28

glemaitre added 2 commits July 26, 2022 08:57

revert backport

7a5c86d

revert missing import

594d50d

glemaitre marked this pull request as ready for review July 26, 2022 14:12

Merge branch 'main' into nightly_mode_scipy

8a9b1ed

thomasjpfan reviewed Jul 26, 2022


sklearn/impute/_base.py

iter [scipy-dev]

56ce1e2

thomasjpfan reviewed Aug 2, 2022


thomasjpfan mentioned this pull request Aug 3, 2022

FIX Place mode in utils.fixes glemaitre/scikit-learn#12

Merged

thomasjpfan and others added 4 commits August 4, 2022 10:18

FIX Place mode in utils.fixes (#12)

e0c77fd

Merge remote-tracking branch 'origin/main' into nightly_mode_scipy

d2c1ff4

iter

91f469f

CLN Simplify PR

8119ff8

thomasjpfan added the To backport label Aug 4, 2022

CLN Smaller diff

ddcd6aa

thomasjpfan mentioned this pull request Aug 4, 2022

REL scikit-learn 1.1.2 #24115

Merged


thomasjpfan approved these changes Aug 4, 2022


glemaitre merged commit 02a4b34 into scikit-learn:main Aug 5, 2022

PGijsbers added a commit to amore-labs/gama that referenced this pull request Aug 25, 2022

Show warnings, but dont error, ignore scipy warning

2025c28

The scipy warning is caused by scikit-learn internal usage of scipy. See scikit-learn/scikit-learn#23633

ogrisel mentioned this pull request Oct 18, 2022

CI Run different sklearn versions in CI skops-dev/skops#196

Merged

hayesall mentioned this pull request Nov 12, 2022

MAINT Replace stats.mode calls with fixes._mode scikit-learn-contrib/imbalanced-learn#938

Merged

lesteve mentioned this pull request Jan 13, 2023

MNT fix test following scipy.stats.mode change in scipy development version #25393

Merged

ageron mentioned this pull request Feb 18, 2023

Multilabel Classification FutureWarning with KNeighborsClassifier.predict() ageron/handson-ml3#45

Open
