Skip to content

RFC: Breaking Changes for Version 2 #25776

@adrinjalali

Description

@adrinjalali

A while ago we talked about the possibility of a version 2 with breaking changes. Specifically, this came up in the context of SLEP006: metadata routing, but there are other things we have wanted to break which we can do all in a single release. In this issue we can talk about the possibility of such a release, and maybe a few things we'd like to include in it.

I'll try to summarize the challenges of keeping backward compatibility (BC) in the context of SLEP006:

  • MetaEstimator(Estimator()).fit(X, y, sample_weight) will raise according to SLEP006 since sample_weight is not requested by any object. For backward compatibility, during the transition period, it'll only warn and assume sample_weight is requested by Estimator().
  • What should the behavior of GridSearchCV(Estimator(), scorer=scorer).fit(X, y, sample_weight) be during the transition period? If we keep backward compatibility, it should warn (whatever the warning be), route sample_weight to Estimator, but not to the scorer. And that's only the case because we're keeping BC. From the user's perspective, it's very weird.
  • Pipeline ATM in certain methods like fit, fit_transform, predict, routes *_params to the fit/etc of the last estimator only. And decision_function and transform, score_samples have no routing at all. Keeping this BC, proves to be challenging, and a bit nasty. ref: ENH SLEP006: add routing to Pipeline #24270
  • In quite a few meta-estimators, we check if the sub-estimator has sample_weight in the signature of fit, and we pass the given sample weight to it if that's the case, and not if that's not the case. With routing, we would have a much better idea of when to pass sample weight and when not, but if we're keeping BC, it's really challenging to see if we should forward sample weights or not. Both AdaBoost (FEAT add routing to AdaBoost's fit #24026) and Bagging (SLEP006: Metadata routing for bagging #24250) have been very challenging, to the point that we might end up having to check if a sub-estimator is a meta-estimator itself or not, and check signature of sub-estimator's fit, and check the given routing info to keep BC.

I'm pretty sure there are other issues we haven't foreseen, and they'll come up as we work on this project. We thought it'd be easy once we get to this point, but BC has proven to make it very tricky and has stalled the progress.

All of the above is adding substantial complexity which is probably error prone, and buggy. We have had to spend quite a bit of time every time we go back to review those bits to understand why we've done what we've done. Even as a temporary solution they're not very maintainable.

My proposal here is to have a version 2 which is not BC with previous releases in certain ways, and target a few breaking changes for it. We can aim to have the release in October 2023 or April/May 2024 as the version 2.

cc @scikit-learn/core-devs

Metadata

Metadata

Assignees

No one assigned

    Labels

    Breaking ChangeIssue resolution would not be easily handled by the usual deprecation cycle.RFC

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions