Skip to content

[WIP] Add a supports_sample_weight tag #13565

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

NicolasHug
Copy link
Member

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

@adrinjalali
Copy link
Member

I would have though there are more estimators which support sample_weights than not (and hence set the default value of the parameter to True).

@NicolasHug
Copy link
Member Author

From this snippet I tried I observed the reverse:

from sklearn.utils.testing import all_estimators
from sklearn.utils.validation import has_fit_parameter

has_sw = 0
tot = 0
for _, Est in all_estimators():
    if hasattr(Est, 'fit'):
        if has_fit_parameter(Est, 'sample_weight'):
            has_sw += 1
        tot += 1
print(tot)
print(has_sw)
192
56

Also it makes sense to me to play it safe and let the default be false

@NicolasHug
Copy link
Member Author

NicolasHug commented Apr 5, 2019

I'm holding off on this for a bit.

There are many complications to using estimator tags inside the code of the estimators (for now all is fine since they're only used in the estimator checks, but not anymore in this PR).

A few issues:

  • if A inherits from B and some_tag_of_A != some_tag_of_B we raise an error in update_if_consistent. That means, e.g., that if I inherit from LinearRegression which supports sample_weight, I need to support sample_weight too.
  • more generally, if any estimator in the MRO of A has inconsistent tags, an error is raised. It's not clear why this is really useful in the case of supports_sample_weight
  • if tags are to be used in the code, we need deprecation cycle / warnings to indicate developers that they need to set the tags on their own estimators. For example, changing if has_fit_parameter('sample_weight') to if self._get_tags()['supports_sample_weight'] breaks people's code as of now.

Of course it should still be possible to introduce the tag and only use it in the estimator checks.

EDIT: point 1 and 2 have been resolved now and child classes can override parents tags

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants