-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT make AdditiveChi2Sampler
stateless and check that stateless Transformers
don't raise NotFittedError
#25190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT make AdditiveChi2Sampler
stateless and check that stateless Transformers
don't raise NotFittedError
#25190
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of remarks.
@glemaitre, do we need to edit the Changelog for this one? |
Yes, we need to document the deprecation to the end-user ;) |
And shall we add the common test for stateless estimators in another PR or in this one? |
The common test is already prepared there: #25223 Awaiting for a second approval to be merged. |
Thanks for pointing this out. Interestingly, this PR doesn't leverage the stateless flag but uses |
Oh right, my bad. I was confused about the end goal of the PR. We don't have a common test yet for the stateless part. We can make the common test here and add a whitelist also for the estimator that needs to be fixed. Sorry about that. |
sklearn/tests/test_common.py
Outdated
@pytest.mark.parametrize( | ||
"estimator", STATELESS_ESTIMATORS, ids=_get_check_estimator_ids | ||
) | ||
def test_no_fitted_error_stateless_estimator(estimator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write this test differently. It should be a check
in the estimator
check and should be added in _yield_all
with a if "stateless" is tags
(or something equivalent).
sklearn/tests/test_common.py
Outdated
X, _ = make_blobs(n_samples=80, n_features=4, random_state=0) | ||
X = X[X > 0].reshape(-1, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can check in the other check
, we usually call some helper to transform X
and y
depending on the expectation of the estimator.
sklearn/tests/test_common.py
Outdated
if hasattr(estimator, "transform"): | ||
# Does not raise | ||
estimator.transform(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that we expect a transformer now. So either we add it to the _yield
for transformer or we think that estimator like DummyClassifier
or DummyRegressor
could become somehow stateless (with some parameter combination) in which case we want to try both transform
and predict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we want to check the length of the transform to check that we still have n_samples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these detailed explanations. Regarding your option number 2, wouldn't adding predict
raise code coverage concerns since this line won't be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with keeping it for transformers for the moment. If we tackle "predictors" then we will make the change at this moment.
sklearn/utils/estimator_checks.py
Outdated
X = _enforce_estimator_tags_X(transformer, X) | ||
|
||
transformer = clone(transformer) | ||
# Should not raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can safely remove this comment :)
sklearn/utils/estimator_checks.py
Outdated
@ignore_warnings(category=FutureWarning) | ||
def check_transformers_unfitted_stateless(name, transformer): | ||
"""Check that using transform or predict without prior fitting | ||
doesn't raise a NotFittedError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't raise a NotFittedError. | |
doesn't raise a NotFittedError for stateless transformers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for extending tests cases to check compliant behavior of stateless estimator and making AdditiveChi2Sampler
stateless, @Vincent-Maladiere.
Here is a first review.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
AdditiveChi2Sampler
statelessAdditiveChi2Sampler
stateless and check that stateless Transformers
don't raise NotFittedError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you, @Vincent-Maladiere.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…Transformers` don't raise `NotFittedError` (scikit-learn#25190) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…Transformers` don't raise `NotFittedError` (scikit-learn#25190) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Addresses the
AdditiveChi2Sampler
issue of the drafting meeting 12/2 and #12616What does this implement/fix? Explain your changes.
AdditiveChi2Sampler
stateless by:NotFittedError
in thetransform
methodsample_interval
fromfit
totransform
sample_interval_
anymoreAny other comments?
Todo next: create a subsequent PR to add a common test to check that stateless estimators don't raise
NotFittedError
whentransform
is called without a prior call tofit
.cc @glemaitre