[MRG] Prototype 3 for strict check_estimator mode #17252

Closed

Conversation

@NicolasHug (Member) commented May 16, 2020

Closes #14929
Fixes #13969
Fixes #16241

Alternative to #16882 and #16890

This version treats strict checks as if they were in the _xfail_checks tag of the estimator.

If we want to, I think this can be extended reasonably easily so that checks have a strict part and a non-strict part.
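
In code, the idea is roughly the following (a simplified sketch, not the exact diff; the helper name _get_xfail_checks is made up for illustration, while _STRICT_CHECKS, _set_check_estimator_ids and the _xfail_checks tag are the names used in the PR):

def _get_xfail_checks(estimator, strict_mode):
    # Sketch: when strict mode is off, the strict checks are treated
    # exactly like entries of the estimator's _xfail_checks tag.
    xfail_checks = estimator._get_tags()['_xfail_checks'] or {}
    if not strict_mode:
        xfail_checks.update({
            _set_check_estimator_ids(check):
                'The check is strict and strict mode is off'
            for check in _STRICT_CHECKS
        })
    return xfail_checks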

@NicolasHug (Member, Author)

Third time's the charm, right?

This is my preferred version so far.

CC @jnothman @rth @thomasjpfan @adrinjalali

@adrinjalali (Member) left a comment

This looks very neat to me! Thanks @NicolasHug


@parametrize_with_checks([LogisticRegression(), NuSVC()], strict_mode=False)
def test_strict_mode_parametrize_with_checks(estimator, check):
    # Not sure how to test parametrize_with_checks correctly??
Member

Would it be possible to have a dummy estimator to check strict_mode={False, True}? We probably shouldn't rely on the existing estimators here; we may end up fixing them :P

Member Author

These dummy estimators that we'd create would still need to depend on some of our builtin estimators, otherwise the checks won't pass. So wouldn't that just shift the issue?

@rth (Member) left a comment

This version treats strict checks as if they were in the _xfail_checks tag of the estimator.

I think it's a good idea! Just one comment, otherwise it looks good.

strict_checks = {
    _set_check_estimator_ids(check):
        'The check is strict and strict mode is off'  # the reason
    for check in _STRICT_CHECKS
}
@rth (Member), May 20, 2020

I think it might be easier to maintain if it were a decorator on the checks rather than a global list of functions. Didn't we try that in one of the N earlier versions, or were there arguments against it? :)

Edit: although maybe it would make the traceback harder to read?

Member Author

Didn't we try that in one of the N earlier versions

One version required a global variable, and another required passing a strict param to every single check, so they're both less ideal.

I guess we could have a @is_strict decorator that would set an attribute on the check, but it seems a bit overkill?
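
For the record, such a (hypothetical) decorator could be as small as the sketch below; is_strict and check_something_strict are made-up names, not part of the PR:

def is_strict(check):
    # Hypothetical: mark the check function as strict via an attribute,
    # instead of listing it in a global _STRICT_CHECKS container.
    check._is_strict = True
    return check

@is_strict
def check_something_strict(name, estimator_orig):
    ...  # body of the check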

Member

OK, let's start with the current approach; we can always add decorators later.

@jnothman (Member) left a comment

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

@rth (Member) left a comment

LGTM, thanks @NicolasHug!

+1 for https://github.com/scikit-learn/scikit-learn/pull/17252/files#r427283093: maybe create dummy estimators for the tests instead of relying on the behavior of existing estimators.


if not strict_mode:
    strict_checks = {
        _set_check_estimator_ids(check):
Member

This one is a bit unfortunate. I mean, we already have the raw function objects, but I guess not re-using this mechanism means we'd have to handle partial functions. So it's probably OK for now.

Member Author

Alternatively, we could directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict? That would be a dict of strings instead of a dict of functions.

I'm only using the same logic as for the _xfail_checks tag (which I find... a bit sloppy, I'll give you that ;) )

Member

Alternatively, we could directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict?

No strong feelings about it. It's more of a side comment; I'm OK with the proposed solution as well.

@rth (Member) commented May 20, 2020

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

It's true that we could also set _strict_common_checks = False (or something similar) in the estimator tags. The issue is that, as far as I understand, contrib projects should always run with strict=False, meaning that strict=True is for our internal use only. In that case it would be a shame to require defining this tag in each contrib estimator.
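
For illustration, the tag-based variant would look roughly like this in a contrib estimator (the tag name is hypothetical; this is not what the PR implements):

from sklearn.base import BaseEstimator

class MyEstimator(BaseEstimator):
    def _more_tags(self):
        # Hypothetical tag: opt out of the strict common checks through
        # estimator tags instead of a check_estimator parameter.
        return {'_strict_common_checks': False}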

@NicolasHug changed the title from "[WIP] Prototype 3 for strict check_estimator mode" to "[MRG] strict moded for check suite, using _xfail_tag" on May 20, 2020
@NicolasHug changed the title from "[MRG] strict moded for check suite, using _xfail_tag" to "[MRG] strict mode for check suite, using _xfail_tag" on May 20, 2020
@NicolasHug (Member, Author)

Thanks for the reviews and comments, I've cleaned it up and this is now MRG!

@adrinjalali (Member)

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

From the perspective of a third-party developer, I find it more intuitive to develop an estimator which follows the API but doesn't necessarily pass all the tests, and to only run the checks that I think are relevant, like the API checks.

This doesn't mean estimator tags are irrelevant: they're still pretty nice to inspect and to base decisions on in a meta-estimator, for instance, but that may not apply to all the tags we have right now.

I think a nice path forward would be to group tests into "api", "sanity", "input validation", "sample weight", etc., and have a check_estimator with a signature like:

def check_estimator(..., check_api=True, check_sanity=True, ...):
   ...

Then, in our common tests, we can have something like:

CHECK_ESTIMATOR_PARAMS = {
    'SVC': {'check_sample_weight': False, ...},
    ...
}

And use it when calling check_estimator for our estimators.
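
For example, something along these lines (a sketch against the proposed, not yet existing, signature; estimator instantiation details are glossed over):

from sklearn.utils import all_estimators

for name, Estimator in all_estimators():
    # Per-estimator overrides, e.g. check_sample_weight=False for SVC.
    params = CHECK_ESTIMATOR_PARAMS.get(name, {})
    check_estimator(Estimator(), **params)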

@amueller (Member)

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

I agree with @adrinjalali that it would be good to have it as a parameter (and apparently that's what we just agreed on in the meeting).

I think it's quite important to allow strict and non-strict parts in a single test, say for the error messages.
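
For instance, a check could look roughly like this (an illustrative sketch only, not an existing common check; the error message string is just an example):

import numpy as np
import pytest

def check_raises_on_1d_input(name, estimator, strict_mode=True):
    X_1d = np.array([1.0, 2.0, 3.0])
    y = np.array([0, 1, 0])
    # Non-strict part: fitting on 1D input must raise a ValueError.
    with pytest.raises(ValueError) as exc_info:
        estimator.fit(X_1d, y)
    if strict_mode:
        # Strict part: additionally enforce the wording of the message.
        assert "Expected 2D array" in str(exc_info.value)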

@NicolasHug changed the title from "[MRG] strict mode for check suite, using _xfail_tag" to "[MRG] Prototype 3 for strict check_estimator mode" on May 26, 2020
@rth closed this in #17361 on Aug 6, 2020