FIX pipeline now checks if it's fitted #29868

adrinjalali · 2024-09-17T12:26:11Z

Fixes #27014

This PR makes Pipeline check whether it is fitted in methods other than fit*, with a deprecation cycle.

cc @glemaitre @betatim @StefanieSenger

github-actions · 2024-09-17T12:27:34Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6094f5a.

glemaitre · 2024-09-17T16:15:37Z

sklearn/pipeline.py

@@ -37,6 +38,18 @@
 __all__ = ["Pipeline", "FeatureUnion", "make_pipeline", "make_union"]


+def _check_is_fitted(pipeline):
+    try:
+        check_is_fitted(pipeline)


I'm wondering whether we should also modify check_is_fitted to be lenient with stateless estimators. Right now, one would be expected to implement __sklearn_is_fitted__, but I don't think this is part of the API per se. So should check_is_fitted look at the requires_fit tag instead?

Done via: #29880

glemaitre

A couple of comments. But it looks good.

glemaitre · 2024-09-28T11:22:05Z

sklearn/pipeline.py

@@ -575,18 +603,21 @@ def predict(self, X, **params):
        y_pred : ndarray
            Result of calling `predict` on the final estimator.
        """
-        Xt = X
+        with _handle_warnings(self):


It might be worth also adding a TODO next to each context manager, so that there are more occurrences to find when the deprecation ends.

Maybe having a decorator instead of a context manager would avoid the extra indentation?

I would also like it to be a decorator, since this is concerning whole methods.

These methods are complicated and already have decorators, so adding another decorator might complicate things. I'd rather do the dirty-ish thing here and keep it as is.
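The trade-off discussed in this thread can be sketched in plain Python. This is only an illustration, not the PR's code: `_warn_if_unfitted`, `_warn_if_unfitted_decorator`, `ToyPipeline`, and the `fitted_` flag are hypothetical names chosen for the example. Both spellings wrap the whole method body; the decorator avoids the extra indentation level but has to stack with whatever decorators a method already carries.

```python
import functools
import warnings
from contextlib import contextmanager


@contextmanager
def _warn_if_unfitted(obj):
    # Hypothetical helper: warn if the object has no truthy `fitted_` flag,
    # then run the wrapped body.
    if not getattr(obj, "fitted_", False):
        warnings.warn("not fitted yet", FutureWarning)
    yield


def _warn_if_unfitted_decorator(method):
    # Decorator spelling of the same check; it reuses the context manager.
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with _warn_if_unfitted(self):
            return method(self, *args, **kwargs)

    return wrapper


class ToyPipeline:
    def predict_with_context_manager(self, X):
        with _warn_if_unfitted(self):  # costs one extra indentation level
            return [0 for _ in X]

    @_warn_if_unfitted_decorator  # would stack on top of existing decorators
    def predict_with_decorator(self, X):
        return [0 for _ in X]
```

Either form emits the same warning for an unfitted object, so the choice is mostly about readability of the decorated methods.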

StefanieSenger

Thanks, @adrinjalali, I have worked through this and added some comments, which I hope will be helpful.

StefanieSenger · 2024-10-04T10:57:41Z

sklearn/pipeline.py

@@ -575,18 +603,21 @@ def predict(self, X, **params):
        y_pred : ndarray
            Result of calling `predict` on the final estimator.
        """
-        Xt = X
+        with _handle_warnings(self):


I would also like it to be a decorator, since this is concerning whole methods.

StefanieSenger · 2024-10-04T11:19:27Z

sklearn/tests/test_pipeline.py

+    def __sklearn_is_fitted__(self):
+        return True


What is the purpose of distributing self.fitted_ = True and def __sklearn_is_fitted__(self) across some of the mocking classes in this test file and in model_selection/test/test_validation?

I believe this is not needed and I think it's blurring the boundaries between the test cases and makes them difficult to read without knowing this PR or searching for it in the future.

I find it neater if test classes are very cleanly only serving their own purpose.

This is added since w/o it the test would fail.

Strictly speaking, self.fitted_ = True is used when you want check_is_fitted to pass after calling fit, but you don't have anything else to set in fit. __sklearn_is_fitted__ is added when you don't need the user to call fit and the estimator is always considered fitted.
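The two options described above can be illustrated with a small standard-library model. This is a simplified stand-in, not scikit-learn's implementation: `check_is_fitted_sketch`, `NotFittedSketchError`, `SetsFlagInFit`, and `AlwaysFitted` are hypothetical names, and the real function is `sklearn.utils.validation.check_is_fitted`. The sketch only models the two rules mentioned in this thread: defer to `__sklearn_is_fitted__` when it exists, otherwise look for instance attributes with a trailing underscore.

```python
class NotFittedSketchError(ValueError):
    """Stand-in for scikit-learn's NotFittedError (hypothetical name)."""


def check_is_fitted_sketch(estimator):
    if hasattr(estimator, "__sklearn_is_fitted__"):
        # Rule 1: the estimator declares its own fitted status.
        fitted = estimator.__sklearn_is_fitted__()
    else:
        # Rule 2: any instance attribute ending in "_" counts as learned state.
        fitted = any(
            name.endswith("_") and not name.startswith("__")
            for name in vars(estimator)
        )
    if not fitted:
        raise NotFittedSketchError(
            f"This {type(estimator).__name__} instance is not fitted yet."
        )


class SetsFlagInFit:
    """Has nothing to learn, but must be fitted before use."""

    def fit(self, X, y=None):
        self.fitted_ = True  # only here so the fitted check passes after fit
        return self


class AlwaysFitted:
    """Stateless: considered fitted even if fit was never called."""

    def fit(self, X, y=None):
        return self

    def __sklearn_is_fitted__(self):
        return True
```

Under this model, `AlwaysFitted()` passes the check immediately, while `SetsFlagInFit()` raises until `fit` has been called.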

This PR is merged now, but still: I had run all the affected test files without the additions before making this comment, and they all passed.

No, they fail:

FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[decision_function] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.
FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[inverse_transform] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.
FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[predict] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.
FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[predict_log_proba] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.
FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[predict_proba] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.
FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[score] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.
FAILED sklearn/tests/test_pipeline.py::test_metadata_routing_for_pipeline[transform] - FutureWarning: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using other methods such as transform, predict, etc. This will raise an error in 1.8 instead of the current warning.

Running the tests with the -Werror::FutureWarning flag or the SKLEARN_WARNINGS_AS_ERRORS=1 environment variable shows the errors.
Thanks for the hint, @adrinjalali.
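The reason the tests pass in one setup and fail in the other is that a FutureWarning is harmless by default and only becomes a failure once warnings are turned into errors. A minimal standard-library sketch of that difference follows; `call_unfitted` is a hypothetical stand-in for calling a method on an unfitted Pipeline, and the in-process `"error"` filter is assumed to behave roughly like the command-line flag mentioned above.

```python
import warnings


def call_unfitted():
    # Hypothetical stand-in for e.g. predict on an unfitted Pipeline.
    warnings.warn("This Pipeline instance is not fitted yet.", FutureWarning)
    return "result"


# Default-like behaviour: the warning is recorded and the call still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = call_unfitted()

# With an "error" filter, the same warning is raised as an exception instead.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        call_unfitted()
        raised = False
    except FutureWarning:
        raised = True
```

So a test that never fits its mock estimator passes silently unless the run is configured to treat warnings as errors.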

StefanieSenger · 2024-10-04T12:38:30Z

sklearn/tests/test_pipeline.py

+# TODO(1.8): remove this test
+def test_pipeline_warns_not_fitted():
+    class StatelessEstimator(BaseEstimator):
+        def fit(self, X, y):


Here mentioning explicitly what is lacking:

Suggested change

-        def fit(self, X, y):
+        def fit(self, X, y):
+            """Doesn't create learned attributes."""

I think the docstring now explains enough.

StefanieSenger · 2024-10-04T13:27:26Z

sklearn/pipeline.py

+        for _, estimator in reversed(self.steps):
+            if estimator != "passthrough":
+                last_step = estimator
+                break


I thought using break is considered bad practice, isn't it? Not sure about its actual downsides though. Alternatively a while loop with "last step that is not 'passthrough'" as a stopping criterion, but that would look very complicated compared to the break.

there's nothing wrong with using break.

One alternative might be something like

last_step = next(
    (
        estimator
        for _, estimator in reversed(self.steps)
        if estimator != "passthrough"
    ),
    None,
)

but not everyone thinks this is cleaner.

yeah that's hard to read 😁 but nice!
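For comparison, here are the two spellings from this thread side by side as standalone functions. The function names and the toy `steps` list are made up for the example, and initialising `last_step` to `None` before the loop is an assumption so that both versions behave the same when every step is "passthrough".

```python
def last_step_with_break(steps):
    last_step = None  # assumption: default when nothing matches
    for _, estimator in reversed(steps):
        if estimator != "passthrough":
            last_step = estimator
            break  # stop at the first match, scanning from the end
    return last_step


def last_step_with_next(steps):
    # next() pulls the first match from the generator, or returns None.
    return next(
        (est for _, est in reversed(steps) if est != "passthrough"),
        None,
    )


# Toy data: strings stand in for estimator objects.
steps = [("scale", "scaler"), ("clf", "classifier"), ("noop", "passthrough")]
found_with_break = last_step_with_break(steps)
found_with_next = last_step_with_next(steps)
```

Both stop at the first non-passthrough step counted from the end, so neither scans the whole list.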

StefanieSenger · 2024-10-04T14:06:30Z

sklearn/pipeline.py

+    """A context manager to make sure a NotFittedError is raised, if a subestimator
+    raises the error.
+
+    Otherwise, we raise a warning if the pipeline is not fitted, with the deprecation.


This context manager raises a warning instead of a NotFittedError, which differs from what is written here.

Maybe like this:

Suggested change

-    """A context manager to make sure a NotFittedError is raised, if a subestimator
-    raises the error.
-
-    Otherwise, we raise a warning if the pipeline is not fitted, with the deprecation.
+    """A context manager to raise a FutureWarning during the deprecation period,
+    if the last step of a pipeline raises a NotFittedError when it is not fitted.

Now I see I got this wrong.
Would still be good to explain better what is supposed to happen.

I think for a helper method which is only here for two versions during the deprecation cycle, it doesn't really matter.
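To make the intended behaviour concrete, here is a rough standard-library sketch of what the original docstring describes: an error from a sub-estimator is re-raised as an error, and otherwise an unfitted pipeline only triggers a warning. It is an assumption-laden illustration and not the code from the PR: `raise_or_warn_if_not_fitted`, `NotFittedSketchError`, `ToyPipeline`, and the `fitted_` flag (standing in for a real fitted check) are all hypothetical.

```python
import warnings
from contextlib import contextmanager


class NotFittedSketchError(ValueError):
    """Stand-in for scikit-learn's NotFittedError (hypothetical name)."""


@contextmanager
def raise_or_warn_if_not_fitted(pipeline):
    try:
        yield  # run the body of the wrapped method
    except NotFittedSketchError as exc:
        # A sub-estimator complained: surface that as an error, chained.
        raise NotFittedSketchError("Pipeline is not fitted yet.") from exc
    # The body succeeded; only warn if the pipeline itself looks unfitted.
    if not getattr(pipeline, "fitted_", False):
        warnings.warn("This Pipeline instance is not fitted yet.", FutureWarning)


class ToyPipeline:
    def __init__(self, step_raises):
        self.step_raises = step_raises

    def predict(self, X):
        with raise_or_warn_if_not_fitted(self):
            if self.step_raises:
                raise NotFittedSketchError("step is not fitted")
            return list(X)
```

In this sketch, `ToyPipeline(step_raises=False).predict(...)` returns normally with a FutureWarning, while `ToyPipeline(step_raises=True).predict(...)` raises.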

StefanieSenger · 2024-10-04T14:24:21Z

sklearn/model_selection/tests/test_validation.py

@@ -253,6 +253,7 @@ def fit(
                P.shape[0],
                P.shape[1],
            )
+        self.fitted_ = True


Why do we need it here?

Because check_is_fitted looks for attributes with a trailing underscore. Setting this one makes check_is_fitted(self) pass.

glemaitre

LGTM

adrinjalali · 2024-10-17T08:00:41Z

@Charlie-XIAO or @adam2392 might wanna have a look?

Charlie-XIAO

This overall LGTM! Just a small suggestion:

Charlie-XIAO · 2024-10-17T12:08:23Z

sklearn/pipeline.py

@@ -37,6 +39,33 @@
 __all__ = ["Pipeline", "FeatureUnion", "make_pipeline", "make_union"]


+@contextmanager
+def _handle_warnings(estimator):


I was confused by this name before reading its docstring. Would something like _raise_or_warn_if_not_fitted, or _ensure_fitted_or_warn, or _handle_fit_status be better?

FIX pipeline now checks if it's fitted

78bac46

github-actions bot added the module:pipeline label Sep 17, 2024

add changelog

e860df5

adrinjalali added this to the 1.6 milestone Sep 17, 2024

glemaitre reviewed Sep 17, 2024

adrinjalali added 4 commits September 19, 2024 14:51

make Pipeline raise if the subestimator raises

ff5e389

Merge remote-tracking branch 'upstream/main' into pipeline/fit

d37182e

fix sphinx warning

b15cbe9

Merge remote-tracking branch 'upstream/main' into pipeline/fit

2adc9b1

glemaitre self-requested a review September 28, 2024 11:17

glemaitre reviewed Sep 28, 2024

StefanieSenger reviewed Oct 4, 2024

adrinjalali added 3 commits October 15, 2024 13:34

address reviews

497252d

Merge remote-tracking branch 'upstream/main' into pipeline/fit

08fb7dd

raise from exc

9ae68e7

glemaitre approved these changes Oct 17, 2024

adrinjalali added the Waiting for Second Reviewer label Oct 17, 2024

Charlie-XIAO approved these changes Oct 17, 2024

adrinjalali added 2 commits October 17, 2024 14:34

rename func

10c5c7b

Merge remote-tracking branch 'upstream/main' into pipeline/fit

6094f5a

adrinjalali added the No Changelog Needed label and removed the Waiting for Second Reviewer label Oct 17, 2024

adrinjalali enabled auto-merge (squash) October 17, 2024 12:40

adrinjalali merged commit 4dfbfb9 into scikit-learn:main Oct 17, 2024
32 of 33 checks passed

adrinjalali deleted the pipeline/fit branch October 18, 2024 09:47
