
TST check multilabel common check for supported estimators #19859


Merged
merged 22 commits on Aug 6, 2021

Conversation

@glemaitre (Member) commented Apr 10, 2021

Toward fixing #2451

Create a common test to check the output format of predict, predict_proba, and decision_function for classifiers supporting multilabel-indicator.

  • add the "multilabel" tag to classifiers that are supposed to support this format, using a mixin similar to MultiOutputMixin;
  • create tests that will check the consistency of predict, predict_proba, and decision_function (a rough sketch of the kind of properties such a check verifies is given below).
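For illustration, here is a minimal sketch (not the exact common check added in this PR) of the kind of output-format properties such a test can verify, using KNeighborsClassifier as an example of a multilabel-capable classifier:

import numpy as np

from sklearn.datasets import make_multilabel_classification
from sklearn.neighbors import KNeighborsClassifier

X, Y = make_multilabel_classification(n_samples=100, n_classes=4, random_state=0)
clf = KNeighborsClassifier().fit(X, Y)

# predict should return a 2D label-indicator array of the same shape as Y
Y_pred = clf.predict(X)
assert Y_pred.shape == Y.shape

# For this estimator, predict_proba returns a list with one (n_samples, 2)
# array per output (assuming both classes appear in each column of Y),
# where each row sums to one
proba = clf.predict_proba(X)
assert len(proba) == Y.shape[1]
for p in proba:
    assert p.shape == (X.shape[0], 2)
    np.testing.assert_allclose(p.sum(axis=1), 1.0)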

@glemaitre added the No Changelog Needed and module:test-suite (everything related to our tests) labels on Apr 12, 2021
@glemaitre (Member Author)

I should check if we should add ClassifierChain and MultiOutputClassifier.

@glemaitre (Member Author)

> I should check if we should add ClassifierChain and MultiOutputClassifier.

After investigation, adding these classifiers would require activating the common tests for them, which would increase the size of this PR.
We can create a subsequent PR to address these two classifiers.

@adrinjalali (Member) left a comment


I also wonder if we want to have a test making sure that the estimators you're changing (setting the estimator tag) are actually tested. (kinda making sure the tags themselves are correctly tested I guess?)

@@ -2120,6 +2123,114 @@ def check_classifiers_multilabel_representation_invariance(
assert type(y_pred) == type(y_pred_list_of_lists)


@ignore_warnings(category=FutureWarning)
def check_classifiers_multilabel_format_output(name, classifier_orig):
Member


do we have a test to check that predict and argmax(predict_proba) are consistent?

Member Author


Not in this PR, but it should be another additional check. I thought I had programmed it in the past (and it failed :)) but I cannot find any PR.
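For reference, such a consistency check (for the plain multiclass case, not the multilabel case covered by this PR) could look roughly like the following sketch; as noted above, it is expected to fail for some estimators:

import numpy as np

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# predict should agree with taking the argmax of predict_proba
# and mapping the column index back through classes_
proba = clf.predict_proba(X)
expected = clf.classes_[np.argmax(proba, axis=1)]
np.testing.assert_array_equal(clf.predict(X), expected)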

@jjerphan (Member) left a comment


I left some comments as a newcomer to this aspect of the code-base.

@adrinjalali (Member)

Thanks for your reviews @jjerphan. Just a note that if you "Start a review", then leave all your comments, and then "Submit your review" at the end, we'll get a single email notification with all of them instead of one email for each comment you leave here. Generally I'd recommend avoiding "leave a single comment" as much as you can :)

@jjerphan (Member)

Thanks for highlighting it, @adrinjalali. I initially felt like I was going to submit only one comment.

I do agree, can relate to this inconvenience, and will definitely submit comments in batches in the future.

@glemaitre (Member Author)

Uhm it seems that I broke all CIs?

@glemaitre (Member Author)

@jjerphan Actually I split the test into three small tests that are maybe easier to read.

@glemaitre (Member Author)

@jjerphan @adrinjalali I think this is good to be reviewed again.

@jjerphan (Member) left a comment


A few last comments and suggestions.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan (Member) left a comment


Thanks @glemaitre, this LGTM!

@adrinjalali (Member) left a comment


I haven't checked if all the estimators which support multilabel are actually included in this PR, but otherwise LGTM.

Comment on lines +38 to +54
from sklearn.utils.estimator_checks import (
_NotAnArray,
_set_checking_parameters,
check_class_weight_balanced_linear_classifier,
check_classifier_data_not_an_array,
check_classifiers_multilabel_output_format_decision_function,
check_classifiers_multilabel_output_format_predict,
check_classifiers_multilabel_output_format_predict_proba,
check_estimator,
check_estimator_get_tags_default_keys,
check_estimators_unfitted,
check_fit_score_takes_y,
check_no_attributes_set_in_init,
check_regressor_data_not_an_array,
check_outlier_corruption,
set_random_state,
)
Member


am I the only one who prefers a single line per import kinda style instead of this? 😁

Member


The main pro of this convention is that it groups the symbols of a module or a submodule in one place, though there are no (automatic) hard checks on whether there are other imports from the same module or submodule somewhere else.

I prefer this style, but I agree that this adds yet another style in the codebase, which is something we might want to avoid. I don't have a strong opinion. 🙂

Member


We could also just try to have the imports from the same module one after the other, which is what we try to do in other places anyway :D

Member Author


With black, I don't have a preference anymore :)

@glemaitre (Member Author)

@rth do you want to merge this one? It only adds tests :P

@rth (Member) left a comment


Overall the code LGTM, but I'm not sure about the creation of a MultiLabelMixin. The whole point of tags was to not rely on class inheritance for feature detection, and adding this mixin introduces somewhat redundant information between the mixin and the tag. Can't we just update the tags for the estimators in question?

@glemaitre (Member Author) commented Aug 5, 2021

This is indeed a philosophical question 😎

I think that I agree with you, since we try to rely more and more on tags. The mixin avoids having to manually add the tag each time, at the cost of adding the mixin itself to the MRO.

Your proposal is thus just as costly in terms of maintenance and certainly more explicit when reading the class declaration.
Let me make the change and check that everything works as expected.
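As a rough illustration of the two options discussed above (the classifier names are hypothetical, and the snippet assumes the _more_tags mechanism scikit-learn used at the time):

from sklearn.base import BaseEstimator, ClassifierMixin


# Option 1 (what the PR initially did): a dedicated mixin that injects the tag
# through the MRO, similar in spirit to MultiOutputMixin.
class MultiLabelMixin:
    def _more_tags(self):
        return {"multilabel": True}


class SomeMultilabelClassifierA(MultiLabelMixin, ClassifierMixin, BaseEstimator):
    ...


# Option 2 (the suggestion here): set the tag directly on each estimator,
# which is more explicit when reading the class declaration.
class SomeMultilabelClassifierB(ClassifierMixin, BaseEstimator):
    def _more_tags(self):
        return {"multilabel": True}

Both approaches end up setting the same tag; the trade-off is inheritance versus repeating the override in each class declaration.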

@glemaitre glemaitre self-assigned this Aug 5, 2021
@glemaitre (Member Author)

@rth here you go.

@rth (Member) left a comment


Thanks, LGTM. Linting failed, however.

@rth rth merged commit a44e9a8 into scikit-learn:main Aug 6, 2021
rth pushed a commit that referenced this pull request Aug 6, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Labels
module:test-suite (everything related to our tests), No Changelog Needed
Projects
None yet

4 participants