Skip to content

Conversation

glemaitre
Copy link
Member

Related to #23626

stats.mode cannot be unpacked as before with the latest SciPy.

@glemaitre
Copy link
Member Author

We should as well check with upstream if the change is intended. It will break older scikit-learn versions with the newest SciPy version without a deprecation warning.

@Micky774
Copy link
Contributor

This was a rather surprising change. Similar problem w/ utils.tests.test_extmath::test_uniform_weights

@ogrisel
Copy link
Member

ogrisel commented Jun 16, 2022

For reference here is a snippet of the test failure caused by this breaking change in scipy.stats.mode:

    def _most_frequent(array, extra_value, n_repeat):
        """Compute the most frequent value in a 1d array extended with
        [extra_value] * n_repeat, where extra_value is assumed to be not part
        of the array."""
        # Compute the most frequent value in array only
        if array.size > 0:
            if array.dtype == object:
                # scipy.stats.mode is slow with object dtype array.
                # Python Counter is more efficient
                counter = Counter(array)
                most_frequent_count = counter.most_common(1)[0][1]
                # tie breaking similarly to scipy.stats.mode
                most_frequent_value = min(
                    value
                    for value, count in counter.items()
                    if count == most_frequent_count
                )
            else:
                mode = stats.mode(array)
>               most_frequent_value = mode[0][0]
E               IndexError: invalid index to scalar variable.

array      = array([1., 1., 1.])
extra_value = nan
mode       = ModeResult(mode=1.0, count=3)
n_repeat   = 0

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining failures in pylatest_pip_scipy_dev are unrelated.

@ogrisel ogrisel added this to the 1.1.2 milestone Jun 16, 2022
@ogrisel
Copy link
Member

ogrisel commented Jun 16, 2022

In scipy/scipy#16418 it is argued that this change of behavior is considered a bugfix. So if scipy 1.9 gets released without a deprecation cycle, we will need to quickly release scikit-learn 1.1.2 with a backport of this fix and #23640.

@lesteve
Copy link
Member

lesteve commented Jun 16, 2022

I am not sure how this fix works, since depending on the scipy version, most_frequent_value and most_frequent_count will be a numpy scalar (scipy >= 1.9) or a numpy 1d array with a single value (scipy < 1.9).

Should we not do a sklearn.util.fixes.mode so that all the call of scipy.stats.mode gets fixed in a uniform manner?

@ogrisel
Copy link
Member

ogrisel commented Jun 16, 2022

Should we not do a sklearn.util.fixes.mode so that all the call of scipy.stats.mode gets fixed in a uniform manner?

This is a good suggestion. #23640 will need to be updated accordingly. Maybe let's do both in the same PR.

@glemaitre
Copy link
Member Author

Indeed, this is quite funny that the current CIs do not fail. I assume that they should kind of ravel the 1-element array.

@glemaitre
Copy link
Member Author

I need the change introduced by @Micky774 in #23640 as well to adopt the test.
One question is: do we want to modify weighted_mode to make is as well consistent with mode from SciPy? We would need to deprecate the behaviour so it could be part of another PR anyway.

@ogrisel
Copy link
Member

ogrisel commented Jun 16, 2022

One question is: do we want to modify weighted_mode to make is as well consistent with mode from SciPy? We would need to deprecate the behaviour so it could be part of another PR anyway.

I believe so. +1 for separate the PR to make backporting the basic compat fix to 1.1.2 easier if needed.

Comment on lines 74 to 88
Examples
--------
>>> import numpy as np
>>> a = np.array([[6, 8, 3, 0],
... [3, 2, 1, 7],
... [8, 1, 8, 4],
... [5, 3, 0, 5],
... [4, 7, 5, 9]])
>>> from sklearn.utils.fixes import mode
>>> mode(a)
ModeResult(mode=array([3, 1, 0, 0]), count=array([1, 1, 1, 1]))
To get mode of whole array, specify ``axis=None``:
>>> mode(a, axis=None)
ModeResult(mode=3, count=3)
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm no. The idea is that sklearn.utils.fixes.mode will always behave like scipy>=1.9. So we either import it or redefine it and reduce the dimension.

glemaitre and others added 2 commits June 20, 2022 16:35
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
@glemaitre
Copy link
Member Author

It should be ready for reviews.

@ogrisel
Copy link
Member

ogrisel commented Jun 21, 2022

Note that this PR will need to be adapted depending on the result of the discussion with upstream:

@glemaitre
Copy link
Member Author

I put it WIP until we know what happens upstream.

@glemaitre glemaitre marked this pull request as draft June 21, 2022 13:28
@thomasjpfan
Copy link
Member

SciPy moved forward with a solution here: scipy/scipy#16429

@glemaitre
Copy link
Member Author

Cool. I will remove the backport and just make the change in the code then.

@glemaitre glemaitre marked this pull request as ready for review July 26, 2022 14:12
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to adjust the code here?

mode, _ = stats.mode(_y[neigh_ind, k], axis=1)

mode = stats.mode(array)
most_frequent_value = mode[0][0]
most_frequent_count = mode[1][0]
most_frequent_value, most_frequent_count = stats.mode(array)
Copy link
Member

@thomasjpfan thomasjpfan Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be specific, I think we need to work around SciPy's deprecation warning by placing this into utils.fixes and adjust the code that uses mode.

def _mode(a, axis=0):
    if sp_version >= parse_version("1.9.0"):
        return stats.mode(a, keepdims=False)
    
    # unpack for SciPy version < 1.9
    results = stats.mode(a)
    return results[0][0], results[1][0]

When our min supported SciPy version is 1.9 we can call stats.mode directly with keepdims=False.

We do not use mode in too many places:

sklearn/impute/_base.py:            mode = stats.mode(array)
sklearn/neighbors/_classification.py:                mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
sklearn/utils/tests/test_extmath.py:        mode, score = stats.mode(x, axis)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a PR to your fork with this idea: glemaitre#12

@thomasjpfan thomasjpfan added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Aug 4, 2022
@thomasjpfan thomasjpfan mentioned this pull request Aug 4, 2022
12 tasks
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre merged commit 02a4b34 into scikit-learn:main Aug 5, 2022
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Aug 5, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
glemaitre added a commit that referenced this pull request Aug 5, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
PGijsbers added a commit to amore-labs/gama that referenced this pull request Aug 25, 2022
The scipy warning is caused by scikit-learn internal usage of scipy.
See scikit-learn/scikit-learn#23633
PGijsbers added a commit to amore-labs/gama that referenced this pull request Aug 25, 2022
* Fix code export for new scikit-learn

* Show warnings, but dont error, ignore scipy warning

The scipy warning is caused by scikit-learn internal usage of scipy.
See scikit-learn/scikit-learn#23633

* Explicitly add whiten to avoid deprecation warning

* Cast array to list to avoid ambiguous comparison

The previous statement was ambiguous as  the `not in` operation could also
interpreted to be used in element-wise fashion.

* Allow to ignore terminals in search space for Individual.from_string

This allows you to reconstruct an individual if additional hyperparameters
have been added to the search space.

* Add test for code export
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Sep 12, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:impute No Changelog Needed To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants