-
-
Notifications
You must be signed in to change notification settings - Fork 26k
ENH Uses __sklearn_tags__ for tags instead of mro walking #22606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
looks good. do we want to do this still? And does it need a deprecation cycle? |
I think we should do it, looks much better as a "developer API" kinda thing. |
@glemaitre @thomasjpfan while working on having tags as dataclasses, I realized having the MRO walk is painful since we update dictionaries and we should instead update instances of dataclasses. That means having this in before the dataclasses makes a lot more sense. So I'd suggest we merge this, and then merge the other PR (which I'll open basing on this PR) cleaning up our tags. I've updated this PR for 1.6 release. |
One thing I'd change here is the deprecation. If we're going to change tags, and the way they're represented, we can simply introduce the new method, and avoid a deprecation cycle. For third party estimators, if they want to support multiple sklearn versions, they can simply implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, with:
- would remove the deprecation in the other PR when revamping tags
- the diff of the other PR wouldn't be much mess with this PR merged, so we could also just work on that (PR coming).
more_tags = base_class._more_tags(self) | ||
collected_tags.update(more_tags) | ||
return collected_tags | ||
def __sklearn_tags__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrinjalali Removing _get_tags
is a breaking change if developers are using it. Looking through Github for self._get_tags(, there are some scikit-learn related use cases for _get_tags
but it's not too much.
@glemaitre Imbalanced-learn uses _get_tags
in tests: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/2b6269f9aaea5f058606bf318b8bc36150137dd6/imblearn/utils/estimator_checks.py#L101
Given this is a private API, I am okay with breaking it.
Reference Issues/PRs
Fixes #20804
What does this implement/fix? Explain your changes.
This PR implements
__sklearn_tags__
, while also keeping backward compatibility. The idea is to not walk the MRO anymore and use Python inheritance to get the tags. This means third party estimators needs to callsuper().__sklearn_tags__
, create a new dictionary and return it.Any other comments?
I suspect the current design was to allow third party developers to define
_more_tags
without the complete set of tags._safe_tags
will infer the missing tags with the default ones. If we want to support this use case, then__sklearn_tags__
can also return a subset of the tags, and we have_safe_tags
infer the missing ones with the defaults.