Skip to content

FIX MultiOutput* when sub-estimator does not accept metadata #28240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 6, 2024

Conversation

adrinjalali
Copy link
Member

Fixes #28239

And adds a common test to check the same issue with all other meta-estimators in the common test file.

cc @glemaitre @OmarManzoor

@adrinjalali adrinjalali added this to the 1.4.1 milestone Jan 24, 2024
Copy link

github-actions bot commented Jan 24, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 035441f. Link to the linter CI: here

@glemaitre glemaitre self-requested a review January 24, 2024 16:12
@glemaitre
Copy link
Member

In process_routing we say:

    Assuming this signature: ``fit(self, X, y, sample_weight=None, **fit_params)``,
    a call to this function would be:
    ``process_routing(self, sample_weight=sample_weight, **fit_params)``.

It looks incorrect then because we should not pass sample_weight when they are None.

@adrinjalali
Copy link
Member Author

Would we want to enforce this contract?

Only non-None metadata are routed. All metadata whose value is None, will be ignored.

I think it would be very odd for None to actually be a non-default value which has a side effect.

@glemaitre glemaitre self-requested a review February 1, 2024 17:26
@adrinjalali
Copy link
Member Author

Oh, I see why CalibratedClassifierCV doesn't raise the same error. Here:

extra_keys = set(params.keys()) - param_names - self_params

we only raise if anything is passed but is not either requested, or is a self accepted metadata. Therefore it doesn't matter if we pass sample_weight there or no, there will be no errors anyway. And passing it as None is of no consequence since it's not gonna be included in any routed param set, since it's not requested by the children. So the code is good as it is.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @adrinjalali

@StefanieSenger
Copy link
Contributor

I think there should also be tests for any meta-estimator, like Pipeline, that takes several sub-estimators or sub-transformers, that might or might not support that metadata.

@adrinjalali
Copy link
Member Author

I think there should also be tests for any meta-estimator, like Pipeline, that takes several sub-estimators or sub-transformers, that might or might not support that metadata.

I think those are tested in the common metadata routing tests @StefanieSenger

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Feb 5, 2024

@adrinjalali
I meant to add tests if Pipeline and other meta-estimators, that accept lists of several sub-estimators, will also accept a mix of consumers and non-consumers, for instance in test_pipeline.py.
Those newly added tests here in test_metaestimators_metadata_routing.py don't do that.

@adrinjalali
Copy link
Member Author

Yes, but I think those are already tested in the test_metadata_routing.py. We haven't had an issue yet which comes from having a mix of subestimators that are consumers and non-consumers.

@adrinjalali
Copy link
Member Author

cc @OmarManzoor @thomasjpfan for a quick review maybe? This is for the coming patch release.

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @adrinjalali

@OmarManzoor OmarManzoor merged commit 7d8a9de into scikit-learn:main Feb 6, 2024
@adrinjalali adrinjalali deleted the fix/multioutput branch February 6, 2024 14:18
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metadata routing breaks MultioutputClassifier with estimator that doesn't support sample_weight in fit.
4 participants