-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Binary only estimator checks for classification #13875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jnothman got any thoughts on this one? |
This is worth supporting, but I'm a bit concerned about the implementation. Everytime a check is added, we need to remember the existence of this tag and generate the data accordingly. Not sure what would be a better alternative though :/ |
The test I added would fail if the data is not generated to support the tag @NicolasHug so any new estimator checks would fail if they don't cover the binary case |
I could add some comments explaining what it is for to the test case or the binary DTC estimator if you think that would help? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I missed the check, sorry. It's pretty clear what it does ;)
I commented a few nits, but since this doesn't seem to introduce much changes I'm OK with it.
Thanks for the review @NicolasHug , I'll make suggested changes tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I'm wondering whether we want to add a check that ensures an exception is raised when this tag is used on an estimator that supports milti-class classification (to make sure this tag is not misused). Though I think we don't do this for other tags, so it is probably not necessary.
@rth I think it would look kinda strange if someone tagged an estimator as binary only when it isn't ... But I can add a check if you want. So, check if the tag is there, and then assert an exception is raised when fitting multi-class if it is present? Would a specific type of exception be expected? |
@NicolasHug I have addressed your comments. The Windows failure doesn't appear to be related to these changes. |
@rth @NicolasHug .. bump :-) Looks like that test failure was solved by another PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small suggestion but LGTM.
Anything else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @trevorstephens !
Cheers @rth 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh. I wrote these comments but apparently forgot to submit them. This also lacks a what's new entry
@@ -1550,6 +1550,10 @@ poor_score | |||
multioutput_only | |||
whether estimator supports only multi-output classification or regression. | |||
|
|||
binary_only | |||
whether estimator supports binary classification but lacks multi-class | |||
classification support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it may still support multilabel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess just following the tag logic this is true since an estimator has to "opt in" to multilabel. I don't think there are any multilabel tests though 😕
I was wondering about that but estimator tags is an experimental and technically private feature, that may see some evolution in the near future (e.g. #14069) do we really want to write what's new each time we change something there? Also it updates the docs, so I thought it might be sufficient. @jnothman For your other comments, I agree. I can push a fix to master (including a what's new if you still think it's useful). |
I presume Trevor wants to use this feature, so I'd think it's worth
reporting in what's new.
|
Yes, I'll add this feature it to my package after the next release @jnothman ... I didn't think the change was major enough to warrant a what's new, but happy to add. I'm fine to do another PR, or @rth can use super-powers to do it faster I'm sure :-) I'm not sure about your tags comment though. That pattern is all over this file, see my comments above. |
OK so the patch with the what's new is, diff --git a/doc/whats_new/v0.22.rst b/doc/whats_new/v0.22.rst
index e998294e6..1be125e3c 100644
--- a/doc/whats_new/v0.22.rst
+++ b/doc/whats_new/v0.22.rst
@@ -113,3 +113,7 @@ These changes mostly affect library developers.
``transform`` is called before ``fit``; previously an ``AttributeError`` or
``ValueError`` was acceptable.
:pr:`13013` by by :user:`Agamemnon Krasoulis <agamemnonc>`.
+
+- |Enhancement| Binary only classifiers are now supported in estimator checks.
+ Such classifiers need to have the `binary_only=True` estimator tag.
+ :pr:`13875` by `Trevor Stephens`_. @trevorstephens does that work for you? Or if you want to change anything, probably better to make a PR indeed :) +1 for keeping the doc as it, since that's indeed the general logic in that file. |
Sure @rth 👍 |
awesome! Thanks! |
Also adds the binary_only estimator tag
Reference Issues/PRs
See also #6715 (comment)
What does this implement/fix? Explain your changes.
I don't think this closes the reference issue entirely, but might help it out in some cases such as mine. I maintain
gplearn
, a niche package that implements genetic programming with a scikit-learn API. I try to stick to scikit-learn standards and be "compatible" as much as possible, but the estimator checks will not pass due to my classifier currently only supporting binary classification as a first-pass MVP. Due to these requirements I currently have thousands of lines of re-written test code in my project that I'd love to lose. I don't think that multiclass should be a requirement to be a scikit-learn compatible estimator as an external package, though open to being challenged on that front.Changes to the test suite re-frame tests that have multiple classes to be binary if the "binary_only" flag exists within the
more_tags
attribute of the classifier.Any other comments?
Need to write a test to ensure future tests respect this flag, will remove WIP from title when ready. Happy to chat before then :-)