[MRG] Prototype 3 for strict check_estimator mode #17252

Closed

Conversation

@NicolasHug (Member) commented May 16, 2020

Closes #14929
Fixes #13969
Fixes #16241

Alternative to #16882 and #16890

This version treats strict checks as if they were in the _xfail_checks tag of the estimator.

If we want to, I think this can be extended reasonably easily so that checks have a strict part and a non-strict part.
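
In code, the idea is roughly the following (a simplified sketch, not the exact diff; the helper name _get_xfail_checks is made up for illustration, while _STRICT_CHECKS, _set_check_estimator_ids and the _xfail_checks tag are the names used in the PR):

def _get_xfail_checks(estimator, strict_mode):
    # Sketch: when strict mode is off, the strict checks are treated
    # exactly like entries of the estimator's _xfail_checks tag.
    xfail_checks = estimator._get_tags()['_xfail_checks'] or {}
    if not strict_mode:
        xfail_checks.update({
            _set_check_estimator_ids(check):
                'The check is strict and strict mode is off'
            for check in _STRICT_CHECKS
        })
    return xfail_checks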

@NicolasHug (Member, Author)

Third time's the charm, right?

This is my preferred version so far.

CC @jnothman @rth @thomasjpfan @adrinjalali

@adrinjalali (Member) left a comment

This looks very neat to me! Thanks @NicolasHug


@parametrize_with_checks([LogisticRegression(), NuSVC()], strict_mode=False)
def test_strict_mode_parametrize_with_checks(estimator, check):
    # Not sure how to test parametrize_with_checks correctly??
Member

Would it be possible to have a dummy estimator to check strict_mode={False, True}? We probably shouldn't rely on the existing estimators here; we may end up fixing them :P

Member Author

These dummy estimators that we'd create would still need to depend on some of our builtin estimators, otherwise the checks won't pass. So wouldn't that just shift the issue?

@rth (Member) left a comment

This version treats strict checks as if they were in the _xfail_checks tag of the estimator.

I think it's a good idea! Just one comment, otherwise it looks good.

strict_checks = {
    _set_check_estimator_ids(check):
        'The check is strict and strict mode is off'  # the reason
    for check in _STRICT_CHECKS
}
@rth (Member), May 20, 2020

I think it might be easier to maintain if it were a decorator on the checks rather than a global list of functions. Didn't we try that in one of the N earlier versions, or were there arguments against it? :)

Edit: although maybe it would make the traceback harder to read?

Member Author

Didn't we try that in one of the N earlier versions

One version required a global variable, and another required passing a strict param to every single check, so they're both less ideal.

I guess we could have a @is_strict decorator that would set an attribute on the check, but it seems a bit overkill?
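
For the record, such a (hypothetical) decorator could be as small as the sketch below; is_strict and check_something_strict are made-up names, not part of the PR:

def is_strict(check):
    # Hypothetical: mark the check function as strict via an attribute,
    # instead of listing it in a global _STRICT_CHECKS container.
    check._is_strict = True
    return check

@is_strict
def check_something_strict(name, estimator_orig):
    ...  # body of the check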

Member

OK, let's start with the current approach; we can always add decorators later.

@jnothman (Member) left a comment

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

@rth (Member) left a comment

LGTM, thanks @NicolasHug!

+1 for https://github.com/scikit-learn/scikit-learn/pull/17252/files#r427283093: maybe create dummy estimators for the tests instead of relying on the behavior of existing estimators.


if not strict_mode:
    strict_checks = {
        _set_check_estimator_ids(check):
Member

This one is a bit unfortunate. I mean, we already have the raw function objects, but I guess not re-using this mechanism means we'd have to handle partial functions. So it's probably OK for now.

Member Author

Alternatively, we could directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict? That would be a dict of strings instead of a dict of functions.

I'm only using the same logic as for the _xfail_checks tag (which I find... a bit sloppy, I'll give you that ;) )

Member

Alternatively, we could directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict?

No strong feelings about it. It's more of a side comment; I'm OK with the proposed solution as well.

@rth (Member) commented May 20, 2020

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

It's true that we could also set _strict_common_checks = False (or something similar) in the estimator tags. The issue is that, as far as I understand, contrib projects should always run with strict=False, meaning that strict=True is for our internal use only. In that case it would be a shame to require defining this tag in each contrib estimator.
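
For illustration, the tag-based variant would look roughly like this in a contrib estimator (the tag name is hypothetical; this is not what the PR implements):

from sklearn.base import BaseEstimator

class MyEstimator(BaseEstimator):
    def _more_tags(self):
        # Hypothetical tag: opt out of the strict common checks through
        # estimator tags instead of a check_estimator parameter.
        return {'_strict_common_checks': False}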

@NicolasHug changed the title from "[WIP] Prototype 3 for strict check_estimator mode" to "[MRG] strict moded for check suite, using _xfail_tag" on May 20, 2020
@NicolasHug changed the title from "[MRG] strict moded for check suite, using _xfail_tag" to "[MRG] strict mode for check suite, using _xfail_tag" on May 20, 2020
@NicolasHug (Member, Author)

Thanks for the reviews and comments, I've cleaned it up and this is now MRG!

@adrinjalali (Member)

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

From the perspective of a third-party developer, I find it more intuitive to develop an estimator which follows the API but doesn't necessarily pass all the tests, and to only run the checks that I think are relevant, like the API checks.

This doesn't mean estimator tags are irrelevant: they're still pretty nice to inspect and to base decisions on in a meta-estimator, for instance, but that may not apply to all the tags we have right now.

I think a nice path forward would be to group tests into "api", "sanity", "input validation", "sample weight", etc., and have a check_estimator with a signature like:

def check_estimator(..., check_api=True, check_sanity=True, ...):
   ...

Then, in our common tests, we can have something like:

CHECK_ESTIMATOR_PARAMS = {
    'SVC': {'check_sample_weight': False, ...},
    ...
}

And use it when calling check_estimator for our estimators.
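
For example, something along these lines (a sketch against the proposed, not yet existing, signature; estimator instantiation details are glossed over):

from sklearn.utils import all_estimators

for name, Estimator in all_estimators():
    # Per-estimator overrides, e.g. check_sample_weight=False for SVC.
    params = CHECK_ESTIMATOR_PARAMS.get(name, {})
    check_estimator(Estimator(), **params)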

@amueller (Member)

Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.

I agree with @adrinjalali that it would be good to have it as a parameter (and apparently that's what we just agreed on in the meeting).

I think it's quite important to allow strict and non-strict parts in a single test, say for the error messages.
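
For instance, a check could look roughly like this (an illustrative sketch only, not an existing common check; the error message string is just an example):

import numpy as np
import pytest

def check_raises_on_1d_input(name, estimator, strict_mode=True):
    X_1d = np.array([1.0, 2.0, 3.0])
    y = np.array([0, 1, 0])
    # Non-strict part: fitting on 1D input must raise a ValueError.
    with pytest.raises(ValueError) as exc_info:
        estimator.fit(X_1d, y)
    if strict_mode:
        # Strict part: additionally enforce the wording of the message.
        assert "Expected 2D array" in str(exc_info.value)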

@NicolasHug changed the title from "[MRG] strict mode for check suite, using _xfail_tag" to "[MRG] Prototype 3 for strict check_estimator mode" on May 26, 2020
@rth closed this in #17361 on Aug 6, 2020