MAINT make AdditiveChi2Sampler stateless and check that stateless Transformers don't raise NotFittedError #25190


Merged

Conversation

Vincent-Maladiere
Contributor

Reference Issues/PRs

Addresses the AdditiveChi2Sampler issue from the 12/2 drafting meeting and #12616.

What does this implement/fix? Explain your changes.

  • Make AdditiveChi2Sampler stateless (the sketch after this list illustrates the pattern) by:
    • removing the NotFittedError in the transform method
    • moving the computation logic of sample_interval from fit to transform
    • not storing sample_interval_ anymore
  • Edit the tests to reflect this change
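
A rough illustration of the pattern described in the first bullet (a simplified sketch, not the PR's actual diff; the toy class and its sample_interval handling are invented for illustration):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array


class ToyStatelessSampler(TransformerMixin, BaseEstimator):
    """Toy stateless transformer: fit learns nothing, transform derives
    everything it needs from the constructor parameters."""

    def __init__(self, sample_interval=None):
        self.sample_interval = sample_interval

    def fit(self, X, y=None):
        # Only validate the input; no sample_interval_ attribute is stored.
        check_array(X)
        return self

    def transform(self, X):
        X = check_array(X)
        # The interval is computed here from the parameter, so transform
        # works even if fit was never called (no NotFittedError).
        interval = 0.5 if self.sample_interval is None else self.sample_interval
        return np.sqrt(interval) * X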

Any other comments?

Todo next: create a subsequent PR to add a common test to check that stateless estimators don't raise NotFittedError when transform is called without a prior call to fit.
cc @glemaitre


@glemaitre glemaitre left a comment


A couple of remarks.

@Vincent-Maladiere
Contributor Author

@glemaitre, do we need to edit the Changelog for this one?

@glemaitre
Member

Yes, we need to document the deprecation for the end user ;)

@Vincent-Maladiere
Contributor Author

And shall we add the common test for stateless estimators in another PR or in this one?

@glemaitre
Member

The common test is already prepared here: #25223

It is awaiting a second approval to be merged.

@Vincent-Maladiere
Contributor Author

Thanks for pointing this out. Interestingly, this PR doesn't leverage the stateless flag but uses get_feature_names_out as a proxy if I understand it correctly.

@glemaitre
Member

Oh right, my bad. I was confused about the end goal of that PR. We don't have a common test yet for the stateless part.

We can write the common test here and also add a whitelist for the estimators that still need to be fixed. Sorry about that.

@pytest.mark.parametrize(
    "estimator", STATELESS_ESTIMATORS, ids=_get_check_estimator_ids
)
def test_no_fitted_error_stateless_estimator(estimator):
Member


I would write this test differently. It should be a check in the estimator checks and should be added in _yield_all with an `if "stateless" in tags` condition (or something equivalent).
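
A loose sketch of that suggestion, assuming a "stateless" estimator tag and a per-transformer check generator (the helper names _safe_tags and _yield_transformer_checks mirror scikit-learn internals of the time, but the exact wiring inside the check generators may differ):

from sklearn.utils._tags import _safe_tags


def check_transformers_unfitted_stateless(name, transformer):
    """Placeholder for the stateless check discussed in this PR."""


def _yield_transformer_checks(transformer):
    # Gate the new check on the "stateless" tag instead of maintaining a
    # hand-written STATELESS_ESTIMATORS list in the test module.
    if _safe_tags(transformer, key="stateless"):
        yield check_transformers_unfitted_stateless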

Comment on lines 662 to 663
X, _ = make_blobs(n_samples=80, n_features=4, random_state=0)
X = X[X > 0].reshape(-1, 1)
Member


You can look at the other checks: we usually call a helper to transform X and y depending on the expectations of the estimator.

Comment on lines 665 to 667
if hasattr(estimator, "transform"):
    # Does not raise
    estimator.transform(X)
Member


It is true that we expect a transformer for now. So either we add it to the _yield for transformers, or we consider that estimators like DummyClassifier or DummyRegressor could become somehow stateless (with some parameter combinations), in which case we would want to try both transform and predict.
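
A minimal sketch of that second option (purely illustrative; as the follow-up below notes, the merged check stays transformer-only):

def _exercise_unfitted_stateless(estimator, X):
    # A stateless estimator should answer straight from its constructor
    # parameters, so neither call should raise NotFittedError.
    if hasattr(estimator, "transform"):
        estimator.transform(X)
    if hasattr(estimator, "predict"):
        estimator.predict(X)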

Member


I think we want to check the length of the transform output to verify that we still have n_samples.

Contributor Author


Thanks for these detailed explanations. Regarding your option number 2, wouldn't adding predict raise code coverage concerns since this line won't be tested?

Member


I am fine with keeping it to transformers for the moment. If we tackle "predictors" later, we will make the change at that point.

X = _enforce_estimator_tags_X(transformer, X)

transformer = clone(transformer)
# Should not raise
Member


I think that you can safely remove this comment :)

@ignore_warnings(category=FutureWarning)
def check_transformers_unfitted_stateless(name, transformer):
    """Check that using transform or predict without prior fitting
    doesn't raise a NotFittedError.
Member


Suggested change:
-    doesn't raise a NotFittedError.
+    doesn't raise a NotFittedError for stateless transformers.
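
Putting the review points together, a hedged reconstruction of what the finished check could look like (not the exact merged code; it reuses the private _enforce_estimator_tags_X helper shown in the diff above and adds the n_samples length assertion suggested earlier):

from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.utils._testing import ignore_warnings
from sklearn.utils.estimator_checks import _enforce_estimator_tags_X


@ignore_warnings(category=FutureWarning)
def check_transformers_unfitted_stateless(name, transformer):
    """Check that using transform without prior fitting
    doesn't raise a NotFittedError for stateless transformers.
    """
    X, _ = make_blobs(n_samples=80, n_features=4, random_state=0)
    X = X[X > 0].reshape(-1, 1)
    X = _enforce_estimator_tags_X(transformer, X)

    transformer = clone(transformer)
    X_trans = transformer.transform(X)

    # One transformed row per input sample.
    assert X_trans.shape[0] == X.shape[0]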


@glemaitre glemaitre left a comment


Otherwise LGTM.


@jjerphan jjerphan left a comment


Thanks for extending the test cases to check the compliant behavior of stateless estimators and for making AdditiveChi2Sampler stateless, @Vincent-Maladiere.

Here is a first review.

@Vincent-Maladiere Vincent-Maladiere changed the title from "MAINT make AdditiveChi2Sampler stateless" to "MAINT make AdditiveChi2Sampler stateless and check that stateless Transformers don't raise NotFittedError" on Jan 17, 2023

@jjerphan jjerphan left a comment


LGTM. Thank you, @Vincent-Maladiere.

@jjerphan jjerphan enabled auto-merge (squash) January 18, 2023 16:24
@jjerphan jjerphan merged commit e010e4f into scikit-learn:main Jan 18, 2023
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…Transformers` don't raise `NotFittedError` (scikit-learn#25190)

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>