ENH Uses __sklearn_tags__ for tags instead of mro walking #22606

Closed
wants to merge 16 commits into from

Conversation

thomasjpfan
Member

Reference Issues/PRs

Fixes #20804

What does this implement/fix? Explain your changes.

This PR implements __sklearn_tags__ while keeping backward compatibility. The idea is to stop walking the MRO and rely on Python inheritance to get the tags. This means third-party estimators need to call super().__sklearn_tags__(), create a new dictionary, and return it.
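
As a rough illustration of the inheritance-based pattern described above, here is a minimal, self-contained sketch. The class names are hypothetical stand-ins (not scikit-learn classes), and the two tag keys are used only as examples; the actual base implementation in the PR may differ.

```python
# Hypothetical stand-ins; these are NOT scikit-learn's real classes.
class BaseSketch:
    def __sklearn_tags__(self):
        # The root of the hierarchy returns a full set of default tags.
        return {"requires_y": False, "allow_nan": False}


class ClassifierSketch(BaseSketch):
    def __sklearn_tags__(self):
        # Copy the parent's tags into a new dict, then override one entry.
        tags = dict(super().__sklearn_tags__())
        tags["requires_y"] = True
        return tags


class NanTolerantClassifier(ClassifierSketch):
    def __sklearn_tags__(self):
        # Same idea, written as a dict merge that builds a new dictionary.
        return {**super().__sklearn_tags__(), "allow_nan": True}


tags = NanTolerantClassifier().__sklearn_tags__()
```

Each level only states what it changes, and ordinary method resolution through super() replaces the explicit walk over the MRO.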

Any other comments?

I suspect the current design was to allow third party developers to define _more_tags without the complete set of tags. _safe_tags will infer the missing tags with the default ones. If we want to support this use case, then __sklearn_tags__ can also return a subset of the tags, and we have _safe_tags infer the missing ones with the defaults.
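
To make the "subset plus defaults" option concrete, here is a small sketch under stated assumptions: `_DEFAULT_TAGS_SKETCH` and `safe_tags_sketch` are hypothetical names that only mimic the behaviour described for _safe_tags; they are not the library's implementation.

```python
# Hypothetical defaults; the real default tag set is larger.
_DEFAULT_TAGS_SKETCH = {
    "requires_y": False,
    "allow_nan": False,
    "non_deterministic": False,
}


def safe_tags_sketch(estimator, key=None):
    """Return the estimator's tags, filling missing ones with defaults."""
    tags = dict(_DEFAULT_TAGS_SKETCH)
    if hasattr(estimator, "__sklearn_tags__"):
        # The estimator may return only a subset of the tags.
        tags.update(estimator.__sklearn_tags__())
    return tags if key is None else tags[key]


class PartialTagsEstimator:
    def __sklearn_tags__(self):
        # Only the tag that differs from the default is declared.
        return {"allow_nan": True}


full_tags = safe_tags_sketch(PartialTagsEstimator())
```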

@amueller
Member

looks good. do we want to do this still? And does it need a deprecation cycle?

@adrinjalali
Member

I think we should do it, looks much better as a "developer API" kinda thing.

@thomasjpfan thomasjpfan marked this pull request as draft October 26, 2022 15:55

github-actions bot commented Apr 29, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 813a8c4. Link to the linter CI: here

@adrinjalali
Member

@glemaitre @thomasjpfan while working on having tags as dataclasses, I realized that the MRO walk is painful there: it updates dictionaries, whereas we should instead be updating instances of dataclasses. That means getting this in before the dataclasses makes a lot more sense. So I'd suggest we merge this, and then merge the other PR (which I'll open based on this PR) cleaning up our tags.
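
A sketch of why inheritance fits dataclass-based tags more naturally than a dictionary-updating MRO walk. `TagsSketch` and the classes below are hypothetical and only illustrate the idea; they are not the design of the follow-up PR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TagsSketch:
    # Hypothetical tag container with two example fields.
    requires_y: bool = False
    allow_nan: bool = False


class BaseSketch:
    def __sklearn_tags__(self):
        return TagsSketch()


class ChildSketch(BaseSketch):
    def __sklearn_tags__(self):
        # Derive a new instance from the parent's tags instead of
        # merging partial dictionaries collected along the MRO.
        return replace(super().__sklearn_tags__(), allow_nan=True)


child_tags = ChildSketch().__sklearn_tags__()
```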

I've updated this PR for the 1.6 release.

@glemaitre glemaitre self-requested a review August 13, 2024 08:26
@adrinjalali
Member

One thing I'd change here is the deprecation. If we're going to change tags, and the way they're represented, we can simply introduce the new method and avoid a deprecation cycle. Third-party estimators that want to support multiple sklearn versions can simply implement _more_tags as well as __sklearn_tags__.
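
A minimal sketch of what implementing both methods might look like for a third-party estimator. `DualApiEstimator` and `_tag_overrides` are hypothetical names, and the class deliberately has no scikit-learn base so the snippet runs on its own; with a real base class, super().__sklearn_tags__() would supply the inherited tags.

```python
class DualApiEstimator:
    # Hypothetical: the tags this estimator changes, stated once.
    _tag_overrides = {"allow_nan": True}

    def _more_tags(self):
        # Older releases: collected by the MRO walk.
        return dict(self._tag_overrides)

    def __sklearn_tags__(self):
        # Newer releases: start from the parent's tags when a parent
        # defines them, otherwise from an empty dict.
        parent = getattr(super(), "__sklearn_tags__", None)
        tags = parent() if parent is not None else {}
        return {**tags, **self._tag_overrides}


est = DualApiEstimator()
old_style = est._more_tags()
new_style = est.__sklearn_tags__()
```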

Member

@adrinjalali adrinjalali left a comment

This LGTM, with:

  • would remove the deprecation in the other PR when revamping tags
  • the diff of the other PR wouldn't be much mess with this PR merged, so we could also just work on that (PR coming).

@adrinjalali adrinjalali removed the Needs Decision Requires decision label Aug 14, 2024
more_tags = base_class._more_tags(self)
collected_tags.update(more_tags)
return collected_tags
def __sklearn_tags__(self):
Member Author

@adrinjalali Removing _get_tags is a breaking change if developers are using it. Looking through GitHub for self._get_tags(, there are some scikit-learn-related uses of _get_tags, but not many.

@glemaitre Imbalanced-learn uses _get_tags in tests: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/2b6269f9aaea5f058606bf318b8bc36150137dd6/imblearn/utils/estimator_checks.py#L101

Given this is a private API, I am okay with breaking it.
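
For downstream test code that currently calls _get_tags, one possible way to tolerate the removal is a small compatibility helper. This is only a sketch: `get_tags_compat` and the two toy classes are hypothetical, and real projects may prefer a version check instead.

```python
def get_tags_compat(estimator):
    """Prefer __sklearn_tags__, fall back to the private _get_tags."""
    if hasattr(estimator, "__sklearn_tags__"):
        return estimator.__sklearn_tags__()
    if hasattr(estimator, "_get_tags"):
        return estimator._get_tags()
    return {}


class NewStyleEstimator:
    def __sklearn_tags__(self):
        return {"allow_nan": True}


class OldStyleEstimator:
    def _get_tags(self):
        return {"allow_nan": False}


new_tags = get_tags_compat(NewStyleEstimator())
old_tags = get_tags_compat(OldStyleEstimator())
```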

Labels
Development

Successfully merging this pull request may close these issues.

Revisting the tags interface
4 participants