-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Regression in SelectorMixin in 1.6.0rc1 #30324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
So I acknowledge that we have some regression because we change some code that we considered experimental and private and we actually broke some user code. You code is slightly broken because class MyEstimator(SelectorMixin, BaseEstimator):
def fit(self, X, y=None):
self.n_features_in_ = len(X)
return self So reason is that the tag resolution silently ignore So I'm currently working on #30327 that should improve the situation. So you would get something like: /Users/glemaitre/Documents/packages/src/scikit-learn/scikit-learn/sklearn/utils/_tags.py:346: FutureWarning: The MyEstimator or classes from which it inherits use `_get_tags` and `_more_tags`. Please define the `__sklearn_tags__` method, or inherit from `sklearn.base.BaseEstimator` and/or other appropriate mixins such as `sklearn.base.TransformerMixin`, `sklearn.base.ClassifierMixin`, `sklearn.base.RegressorMixin`, and `sklearn.base.OutlierMixin`. From scikit-learn 1.7, not defining `__sklearn_tags__` will raise an error.
warnings.warn(
Out[11]:
array([[ 5., 7.],
[nan, 9.]]) So the warning would request a change in the future to: from sklearn.feature_selection import SelectorMixin
from sklearn.base import BaseEstimator
from sklearn.utils.validation import validate_data
class MyEstimator(SelectorMixin, BaseEstimator):
def fit(self, X, y=None):
validate_data(self, X=X, ensure_all_finte="allow-nan")
return self
def _get_support_mask(self):
mask = np.ones(self.n_features_in_, dtype=bool)
return mask
def __sklearn_tags__(self):
tags = super().__sklearn_tags__()
tags.input_tags.allow_nan = True
return tags So to summarize, we are introducing some backward compatibility even on some private functions. Changing the code should be quite straightforward but it could bring some burden if a third-party library wants to support several version of scikit-learn. In this regard, I'm working on the following right now: The idea would be to import for instance the Happy to get some more inputs here such that we don't overlook at things while moving forward with some internal design in scikit-learn. |
Great, I can confirm that this also fixes the original issue in my (company's) package. I'll keep an eye on your Thanks! |
Describe the bug
Using the estimator tag
allow_nan
doesn't work withSelectorMixin
in the release candidate.A first skim suggests maybe
ensure_all_finite
is inconsistently expected to beFalse
and other times"allow-nan"
? In particular at https://github.com/scikit-learn/scikit-learn/blame/439ea045ad44e6a09115dc23e9bf23db00ff41de/sklearn/utils/validation.py#L1110 ?Steps/Code to Reproduce
Expected Results
No error is thrown, and the numpy array is returned unchanged.
Actual Results
Versions
The text was updated successfully, but these errors were encountered: