-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Prototype 3 for strict check_estimator mode #17252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Prototype 3 for strict check_estimator mode #17252
Conversation
Third time's the charm, right? This is my preferred version so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very neat to me! Thanks @NicolasHug
sklearn/tests/test_common.py
Outdated
|
||
@parametrize_with_checks([LogisticRegression(), NuSVC()], strict_mode=False) | ||
def test_strict_mode_parametrize_with_checks(estimator, check): | ||
# Not sure how to test parametrize_with_checks correctly?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to have a dummy estimator to check the strict_mode={False, True}
? We probably shouldn't rely on the existing estimators here, we may end up fixing them :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These dummy estimators that we'd create would still need to depend on some of our builtin estimators, otherwise the checks won't pass. So wouldn't that just shift the issue?
…rict_mode_using_xfails_tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version treats strict checks as if they were in the _xfail_checks tag of the estimator.
I think it's a good idea! Just one comment otherwise looks good,
strict_checks = { | ||
_set_check_estimator_ids(check): | ||
'The check is strict and strict mode is off' # the reason | ||
for check in _STRICT_CHECKS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be earlier to maintain if it was a decorator on the checks rather than a global list of functions. Didn't we try that in one of the N earlier versions or were there arguments against it ? :)
Edit: although maybe it would make the traceback harder to read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we try that in one of the N earlier versions
One version required a global var, and another one required to pass a strict
param to every single check, so they're less ideal.
I guess we could have a @is_strict
decorator that would set an attribute on the check, but it seems a bit overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's start with the current approach, we can always add decorators later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator
, when estimator check conditions are generally defined by estimator tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @NicolasHug !
+1 for https://github.com/scikit-learn/scikit-learn/pull/17252/files#r427283093 to maybe creating dummy estimators for tests instead of relying on the behavior of existing estimators.
|
||
if not strict_mode: | ||
strict_checks = { | ||
_set_check_estimator_ids(check): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a bit unfortunate. I mean we already have the raw function objects, but I guess not re-using this mechanism means we have to handle partials functions. So it's probably OK for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we can also directly use _set_check_estimator_ids(check)
in the _STRICT_CHECKS
dict? that would be a dict of strings instead of a dict of functions
I'm only using the same logic as for the _xfail_checks
tag (which I find... a bit sloppy, I'll give you that ;) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we can also directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict?
No strong feeling about it. It's more a side comment, I'm OK with the proposed solution as well.
It's true that we could also set |
…rict_mode_using_xfails_tag
Thanks for the reviews and comments, I've cleaned it up and this is now MRG! |
From the perspective of a third party developer, I find it more intuitive to develop an estimator which follows the API, but doesn't necessarily pass all the tests, and only test for the ones that I think are relevant, like API checks. This doesn't mean estimator tags are irrelevant, they're still pretty nice to inspect them and make decisions based on it in a meta-estimator for instance, but that may not apply to all the tags we have right now. I think a nice path forward would be to group tests into "api", "sanity", "input validation", "sample weight", etc., and have a def check_esimator(..., check_api=True, check_sanity=True, ...):
... Then, in our common tests, we can have something like: CHECK_ESTIMATOR_PARAMS = {
'SVC': {check_sample_weight=False, ...},
...
} And use it when calling check_estimator for our estimators. |
I agree with @adrinjalali that it would be good to have as parameter (and apparently that's what we just agreed on in the meeting). I think it's quite important to allow strict and non-strict parts in a single test, say for the error messages. |
Closes #14929
Fixes #13969
Fixes #16241
Alternative to #16882 and #16890
This version treats strict checks as if they were in the
_xfail_checks
tag of the estimator.If we want to, I think this can be extended reasonably easily so that checks have a strict part and a non-strict part.