[NoMRG] evaluate minimal implementation for sklearn estimator #18811


Closed
wants to merge 30 commits

Conversation

@glemaitre (Member) commented Nov 11, 2020

Built on top of:

This PR evaluates the minimal implementation required for a compatible estimator (regressor/classifier/transformer) that does not inherit from BaseEstimator.

The relevant changes are in test_estimator_checks.py, in the function test_check_estimator_minimal.
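As a rough illustration, here is a minimal sketch of such an estimator, not inheriting from BaseEstimator. The attribute names (param, _most_frequent) are illustrative; the actual MinimalClassifier used by the test lives in test_estimator_checks.py:

```python
import numpy as np


class MinimalClassifier:
    """Sketch of a classifier honouring the API contract without BaseEstimator."""

    _estimator_type = "classifier"

    def __init__(self, param=None):
        # Constructor arguments must be stored as-is so that
        # get_params/set_params (and therefore cloning) round-trip exactly.
        self.param = param

    def get_params(self, deep=True):
        return {"param": self.param}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        # A trivial majority-class model: enough to exercise the API checks.
        X, y = np.asarray(X), np.asarray(y)
        self.n_features_in_ = X.shape[1]
        self.classes_, counts = np.unique(y, return_counts=True)
        self._most_frequent = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(np.asarray(X).shape[0], self._most_frequent)

    def score(self, X, y):
        return float(np.mean(self.predict(X) == np.asarray(y)))
```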

@glemaitre glemaitre marked this pull request as draft November 11, 2020 12:57
@glemaitre glemaitre changed the title Safe tags api only [NoMRG] evaluate minimal implementation for sklearn estimator Nov 11, 2020
@ogrisel (Member) commented Nov 13, 2020

Have you ever seen this kind of pytest error before?

==================================== ERRORS ====================================
_____________________________ ERROR collecting gw1 _____________________________
Different tests were collected between gw0 and gw1. The difference is:
--- gw0

+++ gw1

@@ -18216,102 +18216,102 @@

 utils/tests/test_estimator_checks.py::test_check_estimators_unfitted
 utils/tests/test_estimator_checks.py::test_check_no_attributes_set_in_init
 utils/tests/test_estimator_checks.py::test_check_estimator_pairwise
-utils/tests/test_estimator_checks.py::test_check_estimator_minimal[<sklearn.utils.tests.test_estimator_checks.MinimalClassifier object at 0x7f42e607cb80>-check_no_attributes_set_in_init(api_only=True)]
...
+utils/tests/test_estimator_checks.py::test_check_estimator_minimal[<sklearn.utils.tests.test_estimator_checks.MinimalTransformer object at 0x7f9612b0b2b0>-check_n_features_in(api_only=True)]
+utils/tests/test_estimator_checks.py::test_check_estimator_minimal[<sklearn.utils.tests.test_estimator_checks.MinimalTransformer object at 0x7f9612b0b2b0>-check_fit1d(api_only=True)]
+utils/tests/test_estimator_checks.py::test_check_estimator_minimal[<sklearn.utils.tests.test_estimator_checks.MinimalTransformer object at 0x7f9612b0b2b0>-check_fit2d_predict1d(api_only=True)]
 utils/tests/test_estimator_checks.py::test_check_classifier_data_not_an_array
 utils/tests/test_estimator_checks.py::test_check_regressor_data_not_an_array
 utils/tests/test_estimator_checks.py::test_check_class_weight_balanced_linear_classifier
--------- generated xml file: /home/vsts/work/tmp_folder/test-data.xml ---------

@glemaitre (Member Author)

Have you ever seen this kind of pytest error before?

Nope. I don't know whether this is linked to running a test that calls run_tests_without_pytest while I am using parametrize_with_checks, which requires pytest. Depending on the order in which the tests run, things can go sideways.
It passes locally.

@thomasjpfan (Member)

Have you ever seen this kind of pytest error before?

That error appears when the ids of the tests are not the same across processes when using pytest-xdist. For this specific case, we would need to give MinimalClassifier an id that does not include the memory location, such as 0x7f42e607cb80.
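A sketch of that suggestion: derive the test id from a deterministic __repr__ (or pass explicit ids to the parametrization) so that every xdist worker collects identical ids. The test body here is a placeholder:

```python
import pytest


class MinimalClassifier:
    # The default object repr embeds id(self) (e.g. 0x7f42e607cb80), which
    # differs between processes; a deterministic __repr__ keeps the
    # parametrized test id identical on every pytest-xdist worker.
    def __repr__(self):
        return self.__class__.__name__


@pytest.mark.parametrize(
    "estimator",
    [MinimalClassifier()],
    ids=repr,  # build ids from the stable repr, not the memory address
)
def test_check_estimator_minimal(estimator):
    assert repr(estimator) == "MinimalClassifier"  # placeholder body
```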

@ogrisel (Member) left a comment

Thanks for the new test for the minimal estimators. I think we should add an estimator check that verifies that if a model has a predict method, then exactly one of is_classifier or is_regressor returns True: not both at the same time, and not neither of them.
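A sketch of what such a check could look like; the check name is hypothetical, and this is not an existing scikit-learn check:

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression


def check_predictor_is_classifier_xor_regressor(name, estimator):
    # An estimator exposing predict must be exactly one of the two:
    # is_classifier XOR is_regressor.
    if hasattr(estimator, "predict"):
        assert is_classifier(estimator) != is_regressor(estimator), (
            f"{name} has a predict method, so exactly one of is_classifier "
            "or is_regressor must return True"
        )


check_predictor_is_classifier_xor_regressor("LogisticRegression", LogisticRegression())
check_predictor_is_classifier_xor_regressor("LinearRegression", LinearRegression())
```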

Comment on lines +637 to +641
def __getstate__(self):
return self.__dict__.copy()

def __setstate__(self, state):
self.__dict__.update(state)
Member

I think you can just remove those methods, no?
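They can indeed be removed: when __getstate__ and __setstate__ are not defined, pickle's default behaviour already copies and restores self.__dict__, which is exactly what these hand-written methods do. A quick demonstration (the MinimalEstimator class here is illustrative):

```python
import pickle


class MinimalEstimator:
    # No __getstate__/__setstate__: pickle's default protocol already
    # serializes and restores self.__dict__.
    def __init__(self, param=1):
        self.param = param


est = pickle.loads(pickle.dumps(MinimalEstimator(param=3)))
assert est.param == 3
```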

@@ -2251,8 +2331,7 @@ def check_classifiers_predictions(X, y, name, classifier_orig,
(classifier, ", ".join(map(str, y_exp)),
", ".join(map(str, y_pred))))

# training set performance
-if name != "ComplementNB":
+if not api_only and name != "ComplementNB":
Member Author

@NicolasHug we are missing this check in the api_only PR. However, there is something fishy here, because even when I pass api_only=True, this if statement is run. I will investigate.

Member

It's because api_only isn't passed in the calls to check_classifiers_predictions. I'll update the other PR, thanks for noticing.

Member Author

We did not pass api_only in the other PR when calling this function in check_classifiers_classes.

Member Author

I was 16 seconds too slow :)
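For reference, a sketch of the fix being discussed, with signatures assumed from the api_only PR; the point is only that the flag has to be forwarded at the call site in check_classifiers_classes:

```python
def check_classifiers_predictions(X, y, name, classifier_orig, api_only=False):
    # ... fitting and label-consistency assertions elided in this sketch ...
    if not api_only and name != "ComplementNB":
        pass  # training-set performance assertion: a quality check, not an API one


def check_classifiers_classes(name, classifier_orig, api_only=False):
    X, y = [[0.0], [1.0]], [0, 1]  # placeholder data for the sketch
    # Forwarding api_only here was the missing piece discussed above.
    check_classifiers_predictions(X, y, name, classifier_orig, api_only=api_only)
```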

@glemaitre (Member Author)

The only issue here is that my test should be placed in test_estimator_checks.py, because it requires pytest.

@ogrisel (Member) commented Nov 23, 2020

Yes, we could move test_check_estimator_minimal into test_common, although this is weird because the test is written to check the checks rather than the minimal estimators themselves.

@amueller (Member)

For the record, I thought we already had this, but maybe I was hallucinating. I think we certainly need it. Maybe having at least one constructor argument would be good?

@glemaitre (Member Author)

I am closing this PR, but I will introduce the test with the minimal estimator implementations.
We will have to skip it for this release because it will fail the minimal performance checks.

for the record I thought we already had this, but maybe I was hallucinating. I think we certainly need it. Maybe having at least one constructor argument would be good?

I think this is a good point. It would be even more meaningful to test the set_params and get_params checks, and we could even check compatibility with SearchCV and Pipeline.
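A sketch of that idea, assuming a minimal regressor with one constructor argument (shrinkage is illustrative): clone relies only on get_params plus the constructor, and set_params is what SearchCV uses to try candidate parameters, so this round-trip is the core of the compatibility.

```python
import numpy as np
from sklearn.base import clone


class MinimalRegressor:
    _estimator_type = "regressor"

    def __init__(self, shrinkage=1.0):
        # Stored unmodified so get_params/set_params round-trip exactly.
        self.shrinkage = shrinkage

    def get_params(self, deep=True):
        return {"shrinkage": self.shrinkage}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        self.mean_ = self.shrinkage * float(np.mean(y))
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def predict(self, X):
        return np.full(np.asarray(X).shape[0], self.mean_)


# clone builds a fresh instance from get_params + __init__, which is what
# SearchCV and Pipeline do internally before fitting.
est = clone(MinimalRegressor(shrinkage=0.5))
assert est.set_params(shrinkage=2.0).get_params() == {"shrinkage": 2.0}
```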

@glemaitre glemaitre closed this Nov 25, 2020
avm19 pushed a commit to avm19/scikit-learn that referenced this pull request Jan 7, 2023
Tests in the CI pipeline return the error:
Different tests were collected between gw0 and gw1
For details and a similar situation, see: scikit-learn#18811 (comment)
5 participants