-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Prototype 4 for strict check_estimator mode #17361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Prototype 4 for strict check_estimator mode #17361
Conversation
…rict_mode_using_xfails_tag
…rict_mode_using_xfails_tag
…rict_mode_using_xfails_tag
…rict_mode_xfails_partially_strict_checks
…rict_mode_xfails_partially_strict_checks
@@ -2858,17 +2934,20 @@ def check_outliers_fit_predict(name, estimator_orig): | |||
assert_raises(ValueError, estimator.fit_predict, X) | |||
|
|||
|
|||
def check_fit_non_negative(name, estimator_orig): | |||
def check_fit_non_negative(name, estimator_orig, strict_mode=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only "partially strict check" that I implemented for now. I want to make the PR about the design rather than about the specific details of each checks.
@@ -2991,3 +3070,9 @@ def check_requires_y_none(name, estimator_orig): | |||
except ValueError as ve: | |||
if not any(msg in str(ve) for msg in expected_err_msgs): | |||
warnings.warn(warning_msg, FutureWarning) | |||
|
|||
|
|||
# set of checks that are completely strict, i.e. they have no non-strict part |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this is the only fully strict check.
Ready for reviews @jnothman @rth @thomasjpfan @adrinjalali @amueller Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to submit this review.
Having strict_mode
everywhere seems to be the only way to have fine grain control.
) | ||
|
||
return wrapped | ||
|
||
|
||
def parametrize_with_checks(estimators): | ||
def _should_be_skipped_or_marked(estimator, check, strict_mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very go-lang like.
I am +0.5 on returning an empty string for the False case and using if not reason
as the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this is a tiny bit redundant, but I can't say I like the use of if not reason:
. This is how it was before and this made it hard for me to grasp how everything was tying in together at the time.
Let's see if others have comments on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also go "full python" and raise an exception.
sklearn/utils/estimator_checks.py
Outdated
return True, xfail_checks[check_name] | ||
|
||
if check_name in _FULLY_STRICT_CHECKS and not strict_mode: | ||
return True, 'The check is fully strict and strict mode is off' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the check name here?
return True, 'The check is fully strict and strict mode is off' | |
return True, f'The {check_name} is fully strict and strict mode is off' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we decide to go with strict_mode
instead of strict
?
Also instead of _FULLY_STRICT_CHECKS
dict how about strict_mode='full'
by default and just parsing that with inspect.signature(check_..)['strict_mode'].default
?
check_name = (check.func.__name__ if isinstance(check, partial) | ||
else check.__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_name = (check.func.__name__ if isinstance(check, partial) | |
else check.__name__) | |
check_name = _get_check_estimator_ids(check) |
should work, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes though I think this is simpler and more explicit. I'd be more comfortable with using _get_check_estimator_ids
if it were called _get_repr
or something like that, but I feel like the id
term is more confusing that anything in this context.
After all, all we want is the function name and this is (almost) a one-liner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about _get_check_estimator_repr
? Simpler currently I don't know. This reminds me of the case where we could have classes or instances of estimator, and always had to keep that in mind.
That function hides that check could be a callable or a partial object, which I think is not relevant for to understand the _maybe_skip
or the _should_be_skipped_or_marked
object and therefore simplify the reading a bit.
check_name = (check.func.__name__ if isinstance(check, partial) | ||
else check.__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_name = (check.func.__name__ if isinstance(check, partial) | |
else check.__name__) | |
check_name = _get_check_estimator_ids(check) |
Nope, happy to change if needed
Sorry I'm not sure I understand: are you suggesting to have |
I mean currently I find a bit confusing that technically a check could have In if check_name in _FULLY_STRICT_CHECKS and not strict_mode: by if inspect.signature(check)['strict_mode'].default == 'full' and not strict_mode: and get rid of |
oh OK I see. I think that'd be fine by me. I agree that it's a bit weird that all checks have the parameter even though some of them don't use it. One perk of the dict is that we can check out all the fully strict checks in one single place. |
When we merge this, it might be also nice to add the the author from #14929 as a co-author, as he proposed very similar changes in that PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, except that some documentation should be updated.
What docs @jnothman ? Regarding the what'snew entry, I think it will be easier to write once we actually settle on which checks we want to make strict / partially strict. |
Let's merge now. I agree that it would be good to add a note on backward compatibility of Thanks for your work on this! |
Thanks everyone for the reviews and feedback |
I meant developer docs where we talk about check_estimator and/or tags
|
This is also missing from what's new. I'm not really sure what the hurry was for merge without developer docs etc. It really needs to be in https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator. |
Don't worry, I'm not forgetting the docs. As suggested in my message above I think they will be easier to write once we have a clearer idea of what checks are actually strict. |
Right now this PR doesn't do much as far as users are concerned. This is mostly an internal refactoring in the common test mechanism. Yes, we can document
It's been 3 month since prototype 1 of this PR and we have been discussing this ever since. If the technical implementation is sound, I see no downsides of merging it so we can actually start using this in various common checks. We can't do that until this PR is merged. |
Okay. I just thought it unusual to merge a new feature workout docs. Thanks
for explaining that you're waiting on more checks to be be designated as
strict
|
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
Closes #14929
Fixes #13969
Fixes #16241
Alternative to #17252, #16882, and #16890
Same as #17252 but supports partially strict checks:
xfail_checks
tag when strict mode is offAlso note that:
strict_mode
parameter now, even if they don't use it at all. I'm not sure we can get around that?xfail_checks
tag does not support skipping a check based on one of its arguement: i.e. it is not possible to specify "this check is expected to fail when strict mode is True but it should pass when strict mode is off". This is already the case in master, not a limitation of this PR. If we want to support this use-case, we will need to make thexfail_checks
API more complex: I'd suggest leaving this for another set of PRs. This is pretty much orthogonal to the proposed changes here.Edit: closes #17252, closes #16882, closes #16890