TST check multilabel common check for supported estimators #19859
Conversation
I should check if we should add
After investigation, these classifiers would require activating the common test for them, which would increase the size of this PR.
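For context, the target format these common tests are concerned with can be illustrated with `sklearn.utils.multiclass.type_of_target` (a small illustrative snippet, not code from this PR):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# A multilabel-indicator target is a 2D binary matrix with one column per
# label; each sample may belong to several labels at once.
Y = np.array([
    [1, 0, 1],  # sample 0 has labels 0 and 2
    [0, 1, 0],  # sample 1 has label 1
    [1, 1, 0],  # sample 2 has labels 0 and 1
])
print(type_of_target(Y))  # → multilabel-indicator
```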
I also wonder if we want to have a test making sure that the estimators you're changing (setting the estimator tag) are actually tested. (kinda making sure the tags themselves are correctly tested I guess?)
sklearn/utils/estimator_checks.py
```diff
@@ -2120,6 +2123,114 @@ def check_classifiers_multilabel_representation_invariance(
     assert type(y_pred) == type(y_pred_list_of_lists)


 @ignore_warnings(category=FutureWarning)
 def check_classifiers_multilabel_format_output(name, classifier_orig):
```
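As a rough illustration of what such an output-format check verifies (a simplified sketch, not the actual common test from this PR), using `KNeighborsClassifier`, which supports multilabel targets natively:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.neighbors import KNeighborsClassifier

X, Y = make_multilabel_classification(n_samples=50, n_classes=4, random_state=0)
clf = KNeighborsClassifier().fit(X, Y)

# predict should return a 2D binary indicator array matching Y's shape
Y_pred = clf.predict(X)
assert Y_pred.shape == Y.shape
assert set(np.unique(Y_pred)) <= {0, 1}

# for multilabel data, predict_proba returns one array per label here
proba = clf.predict_proba(X)
assert len(proba) == Y.shape[1]
```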
do we have a test to check that `predict` and `argmax(predict_proba)` are consistent?
Not in this PR, but it should be another additional check. I thought I had programmed it in the past (and it failed :)) but I cannot find any PR.
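Such a consistency check might look like the following sketch (the helper name is hypothetical; this is not a check that exists in scikit-learn's common tests as of this PR):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def check_predict_matches_argmax_proba(classifier):
    # Hypothetical check: predict should return the class with the
    # highest predicted probability.
    X, y = make_classification(
        n_samples=100, n_classes=3, n_informative=6, random_state=0
    )
    classifier.fit(X, y)
    proba = classifier.predict_proba(X)
    expected = classifier.classes_[np.argmax(proba, axis=1)]
    assert np.array_equal(classifier.predict(X), expected)

check_predict_matches_argmax_proba(LogisticRegression())
```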
I left some comments as a newcomer to this aspect of the code-base.
Thanks for your reviews @jjerphan. Just a note that if you "Start a review", then leave all your comments, and then "Submit your review" at the end, we'll get a single email notification with all of them instead of one email per comment. Generally I'd recommend avoiding "leave a single comment" as much as you can :)
Thanks for highlighting it, @adrinjalali. I initially thought I was going to submit only one comment. I agree, I can relate to this inconvenience, and I will definitely submit comments in batches in the future.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Uhm, it seems that I broke all CIs?
@jjerphan Actually I split the test into three small tests that are maybe easier to read.
@jjerphan @adrinjalali I think this is good to be reviewed again.
A few last comments and suggestions.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Thanks @glemaitre, this LGTM!
I haven't checked if all the ones which support multilabel are actually included in this PR, but otherwise LGTM.
```python
from sklearn.utils.estimator_checks import (
    _NotAnArray,
    _set_checking_parameters,
    check_class_weight_balanced_linear_classifier,
    check_classifier_data_not_an_array,
    check_classifiers_multilabel_output_format_decision_function,
    check_classifiers_multilabel_output_format_predict,
    check_classifiers_multilabel_output_format_predict_proba,
    check_estimator,
    check_estimator_get_tags_default_keys,
    check_estimators_unfitted,
    check_fit_score_takes_y,
    check_no_attributes_set_in_init,
    check_regressor_data_not_an_array,
    check_outlier_corruption,
    set_random_state,
)
```
am I the only one who prefers a single line per import kinda style instead of this? 😁
The main pro of this convention is that it groups the symbols of a module or submodule in one place, though there are no (automatic) hard checks on whether there are other imports from the same module or submodule somewhere else.
I prefer this style, but I agree that it adds yet another style to the codebase, which is something we might want to avoid. I don't have a strong opinion. 🙂
We could also just try to have the imports from the same module one after the other, which is what we try to do in other places anyway :D
I am not preferring anything anymore with `black` :)
@rth do you want to merge this one? It only adds tests :P
Overall the code LGTM, but I'm not sure about the creation of a `MultiLabelMixin`. The whole point of tags was not to rely on class inheritance for feature detection, and adding this mixin introduces somewhat redundant information between the mixin and the tag. Can't we just update the tags for the estimators in question?
This is indeed a philosophical question 😎 I think I agree with you, since we try to rely more and more on tags. The mixin avoids having to add the tag manually each time, at the cost of adding the mixin itself to the MRO. Your proposal is thus just as costly in terms of maintenance, and certainly more explicit when reading the class declaration.
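The trade-off discussed here can be sketched in plain Python (illustrative names only; this is not scikit-learn's actual private tag machinery):

```python
# Tag-based approach: a default tag on the base class, overridden explicitly
# by each estimator that supports multilabel targets.
class Base:
    def _more_tags(self):
        return {}

    def _get_tags(self):
        tags = {"multilabel": False}  # default
        tags.update(self._more_tags())
        return tags

class MultilabelCapableClassifier(Base):
    def _more_tags(self):
        return {"multilabel": True}  # explicit per-estimator override

# Mixin-based approach: support is signalled by inheritance instead,
# adding one more class to the MRO.
class MultiLabelMixin:
    def _more_tags(self):
        return {"multilabel": True}

class AnotherClassifier(MultiLabelMixin, Base):
    pass

assert MultilabelCapableClassifier()._get_tags()["multilabel"] is True
assert AnotherClassifier()._get_tags()["multilabel"] is True
assert Base()._get_tags()["multilabel"] is False
```

Both approaches yield the same tag value; the tag-based one keeps the signal visible in each estimator's own class body.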
@rth here you go.
Thanks, LGTM. Linting failed, however.
…arn#19859) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Toward fixing #2451

- Create a common test to check the output format of `predict`, `predict_proba`, and `decision_function` for classifiers supporting multilabel-indicator.
- Rely on the estimator tags rather than a `MultiOutputMixin`.
- Add checks for `predict`, `predict_proba`, and `decision_function`.