Skip to content

FIX Avoid fitting a pipeline without steps #31723

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 14, 2025

Conversation

DeaMariaLeon
Copy link
Contributor

@DeaMariaLeon DeaMariaLeon commented Jul 8, 2025

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Trying to fix #31700

Any other comments?

I wanted add the check to __sklearn_is_fitted__ as well. That way we avoid an empty Pipeline to be displayed as fitted (in an HTML display), and we have a clear message that it needs more info. But this may be incorrect.

If the empty Pipeline display needs to be kept (but shown as "not fitted"), then estimator.py needs love.

Friendly ping @betatim @glemaitre

Copy link

github-actions bot commented Jul 8, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 92bd2a6. Link to the linter CI: here

@@ -1289,7 +1291,6 @@ def __sklearn_is_fitted__(self):

An empty pipeline is considered fitted.
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where I wanted to to check if the pipeline had steps. But this means the display will fail as well.
And I just saw this: "An empty pipeline is considered fitted." This means that the HTML display is actually correct.

@jeremiedbb
Copy link
Member

I think we need to be careful before requiring non-empty steps to a Pipeline. We never know how users use the estimators. For instance you can create an empty pipeline and set the steps afterwards using set_params.

That being said, checking for non-empty steps in _validate_steps seems legit since

  • it currently fails with a non-informative error message
  • it's a fit time only check
    But maybe we'd like it to work instead of failing, a bit like if it was an implicit "passthrough" ? Because I find the following a bit odd:
Pipeline(steps=[]).fit(X,y)  # fails
Pipeline(steps=[("a", "passthrough")]).fit(X,y)  # works

I'd like to have the opinion of @glemaitre, @adrinjalali, or @ogrisel before making a decision here.

@ogrisel
Copy link
Member

ogrisel commented Jul 9, 2025

I think the "empty is passthrough' semantics feel quite natural.

@adrinjalali
Copy link
Member

Since right now fitting an empty pipeline fails anyway, I'm happy with this PR since it makes the error message much more user friendly and doesn't really change any behavior.

@betatim
Copy link
Member

betatim commented Jul 9, 2025

Right now you can't use a Pipeline without steps, so there isn't anyone out there that relies on this. This means we can change the behaviour of __sklearn_is_fitted__ to indicate that it is not fitted and raise a nicer error.

In a new PR we can, if we decide to do it, start allowing empty pipelines.

@DeaMariaLeon
Copy link
Contributor Author

Right know, this works in jupyter:

pipe = Pipeline([])
      
pipe # to visualise it

If we change the behaviour of __sklearn_is_fitted__ (by raising), the visualisation won't work.
Is that OK @betatim? It seems to me that it contradicts what Jérémy said above:

We never know how users use the estimators. For instance you can create an empty pipeline and set the steps afterwards using set_params.

@DeaMariaLeon DeaMariaLeon marked this pull request as ready for review July 10, 2025 08:09
@betatim
Copy link
Member

betatim commented Jul 10, 2025

I am not sure I understand why we should raise an exception from __sklearn_is_fitted__? Can't we return False instead?

@DeaMariaLeon
Copy link
Contributor Author

I am not sure I understand why we should raise an exception from sklearn_is_fitted?

I thought that it was what you meant (in the issue). 🫣

@jeremiedbb
Copy link
Member

I don't think __sklearn_is_fitted__ has to be modified. We currently consider a pipeline containing only "passthrough" steps to be fitted. If later we decide to make pipelines with no steps equivalent to an implicit passthrough, then it should be considered fitted for consistency.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Not sure it requires a changelog entry since it's just a more informative error message. I let you decide

@DeaMariaLeon
Copy link
Contributor Author

I let you decide

I don't need one. But maybe you were asking the others?

@adrinjalali adrinjalali enabled auto-merge (squash) July 14, 2025 13:02
@adrinjalali adrinjalali merged commit e4b0849 into scikit-learn:main Jul 14, 2025
39 of 40 checks passed
@DeaMariaLeon DeaMariaLeon deleted the pipeline branch July 15, 2025 07:37
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 15, 2025
@jeremiedbb jeremiedbb mentioned this pull request Jul 15, 2025
13 tasks
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipelines are permitted to have no steps and are displayed as fitted
7 participants