-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT fix the way to call stats.mode #23633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We should as well check with upstream if the change is intended. It will break older scikit-learn versions with the newest SciPy version without a deprecation warning. |
This was a rather surprising change. Similar problem w/ |
For reference here is a snippet of the test failure caused by this breaking change in def _most_frequent(array, extra_value, n_repeat):
"""Compute the most frequent value in a 1d array extended with
[extra_value] * n_repeat, where extra_value is assumed to be not part
of the array."""
# Compute the most frequent value in array only
if array.size > 0:
if array.dtype == object:
# scipy.stats.mode is slow with object dtype array.
# Python Counter is more efficient
counter = Counter(array)
most_frequent_count = counter.most_common(1)[0][1]
# tie breaking similarly to scipy.stats.mode
most_frequent_value = min(
value
for value, count in counter.items()
if count == most_frequent_count
)
else:
mode = stats.mode(array)
> most_frequent_value = mode[0][0]
E IndexError: invalid index to scalar variable.
array = array([1., 1., 1.])
extra_value = nan
mode = ModeResult(mode=1.0, count=3)
n_repeat = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining failures in pylatest_pip_scipy_dev
are unrelated.
In scipy/scipy#16418 it is argued that this change of behavior is considered a bugfix. So if scipy 1.9 gets released without a deprecation cycle, we will need to quickly release scikit-learn 1.1.2 with a backport of this fix and #23640. |
I am not sure how this fix works, since depending on the scipy version, Should we not do a |
This is a good suggestion. #23640 will need to be updated accordingly. Maybe let's do both in the same PR. |
Indeed, this is quite funny that the current CIs do not fail. I assume that they should kind of ravel the 1-element array. |
I believe so. +1 for separate the PR to make backporting the basic compat fix to 1.1.2 easier if needed. |
sklearn/utils/fixes.py
Outdated
Examples | ||
-------- | ||
>>> import numpy as np | ||
>>> a = np.array([[6, 8, 3, 0], | ||
... [3, 2, 1, 7], | ||
... [8, 1, 8, 4], | ||
... [5, 3, 0, 5], | ||
... [4, 7, 5, 9]]) | ||
>>> from sklearn.utils.fixes import mode | ||
>>> mode(a) | ||
ModeResult(mode=array([3, 1, 0, 0]), count=array([1, 1, 1, 1])) | ||
To get mode of whole array, specify ``axis=None``: | ||
>>> mode(a, axis=None) | ||
ModeResult(mode=3, count=3) | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm no. The idea is that sklearn.utils.fixes.mode
will always behave like scipy>=1.9
. So we either import it or redefine it and reduce the dimension.
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
It should be ready for reviews. |
Note that this PR will need to be adapted depending on the result of the discussion with upstream: |
I put it WIP until we know what happens upstream. |
SciPy moved forward with a solution here: scipy/scipy#16429 |
Cool. I will remove the backport and just make the change in the code then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to adjust the code here?
mode, _ = stats.mode(_y[neigh_ind, k], axis=1) |
sklearn/impute/_base.py
Outdated
mode = stats.mode(array) | ||
most_frequent_value = mode[0][0] | ||
most_frequent_count = mode[1][0] | ||
most_frequent_value, most_frequent_count = stats.mode(array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be specific, I think we need to work around SciPy's deprecation warning by placing this into utils.fixes
and adjust the code that uses mode.
def _mode(a, axis=0):
if sp_version >= parse_version("1.9.0"):
return stats.mode(a, keepdims=False)
# unpack for SciPy version < 1.9
results = stats.mode(a)
return results[0][0], results[1][0]
When our min supported SciPy version is 1.9 we can call stats.mode
directly with keepdims=False
.
We do not use mode in too many places:
sklearn/impute/_base.py: mode = stats.mode(array)
sklearn/neighbors/_classification.py: mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
sklearn/utils/tests/test_extmath.py: mode, score = stats.mode(x, axis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened a PR to your fork with this idea: glemaitre#12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
The scipy warning is caused by scikit-learn internal usage of scipy. See scikit-learn/scikit-learn#23633
* Fix code export for new scikit-learn * Show warnings, but dont error, ignore scipy warning The scipy warning is caused by scikit-learn internal usage of scipy. See scikit-learn/scikit-learn#23633 * Explicitly add whiten to avoid deprecation warning * Cast array to list to avoid ambiguous comparison The previous statement was ambiguous as the `not in` operation could also interpreted to be used in element-wise fashion. * Allow to ignore terminals in search space for Individual.from_string This allows you to reconstruct an individual if additional hyperparameters have been added to the search space. * Add test for code export
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Related to #23626
stats.mode
cannot be unpacked as before with the latest SciPy.