RFC: Breaking Changes for Version 2

A while ago we talked about the possibility of a version 2 with breaking changes. Specifically, this came up in the context of SLEP006: metadata routing, but there are other things we have wanted to break which we can do all in a single release. In this issue we can talk about the possibility of such a release, and maybe a few things we'd like to include in it.

I'll try to summarize the challenges of keeping backward compatibility (BC) in the context of SLEP006:

- `MetaEstimator(Estimator()).fit(X, y, sample_weight)` will raise according to SLEP006 since `sample_weight` is not requested by any object. For backward compatibility, during the transition period, it'll only warn and assume `sample_weight` is requested by `Estimator()`.
- What should the behavior of `GridSearchCV(Estimator(), scorer=scorer).fit(X, y, sample_weight)` be during the transition period? If we keep backward compatibility, it should warn (whatever the warning be), route `sample_weight` to `Estimator`, but not to the `scorer`. And that's only the case because we're keeping BC. From the user's perspective, it's very weird.
- `Pipeline` ATM in certain methods like `fit`, `fit_transform`, `predict`, routes `*_params` to the `fit/etc` of the last estimator only. And `decision_function` and `transform`, `score_samples` have no routing at all. Keeping this BC, proves to be challenging, and a bit nasty. ref: #24270
- In quite a few meta-estimators, we check if the sub-estimator has `sample_weight` in the signature of `fit`, and we pass the given sample weight to it if that's the case, and not if that's not the case. With routing, we would have a much better idea of when to pass sample weight and when not, but if we're keeping BC, it's really challenging to see if we should forward sample weights or not. Both AdaBoost (#24026) and Bagging (#24250) have been very challenging, to the point that we might end up having to check if a sub-estimator is a meta-estimator itself or not, and check signature of sub-estimator's `fit`, and check the given routing info to keep BC.

I'm pretty sure there are other issues we haven't foreseen, and they'll come up as we work on this project. We thought it'd be easy once we get to this point, but BC has proven to make it very tricky and has stalled the progress.

All of the above is adding substantial complexity which is probably error prone, and buggy. We have had to spend quite a bit of time every time we go back to review those bits to understand why we've done what we've done. Even as a temporary solution they're not very maintainable.

My proposal here is to have a version 2 which is not BC with previous releases in certain ways, and target a few breaking changes for it. We can aim to have the release in October 2023 or April/May 2024 as the version 2.

cc @scikit-learn/core-devs 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

RFC: Breaking Changes for Version 2 #25776

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

RFC: Breaking Changes for Version 2 #25776

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions