Skip to content

[MRG] Prototype 4 for strict check_estimator mode #17361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented May 26, 2020

Closes #14929
Fixes #13969
Fixes #16241

Alternative to #17252, #16882, and #16890

Same as #17252 but supports partially strict checks:

  • fully strict checks are treated as if they were in the xfail_checks tag when strict mode is off
  • partially strict checks are always run.

Also note that:

  • all checks need to have a strict_mode parameter now, even if they don't use it at all. I'm not sure we can get around that?
  • the xfail_checks tag does not support skipping a check based on one of its arguement: i.e. it is not possible to specify "this check is expected to fail when strict mode is True but it should pass when strict mode is off". This is already the case in master, not a limitation of this PR. If we want to support this use-case, we will need to make the xfail_checks API more complex: I'd suggest leaving this for another set of PRs. This is pretty much orthogonal to the proposed changes here.

Edit: closes #17252, closes #16882, closes #16890

@@ -2858,17 +2934,20 @@ def check_outliers_fit_predict(name, estimator_orig):
assert_raises(ValueError, estimator.fit_predict, X)


def check_fit_non_negative(name, estimator_orig):
def check_fit_non_negative(name, estimator_orig, strict_mode=True):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only "partially strict check" that I implemented for now. I want to make the PR about the design rather than about the specific details of each checks.

@@ -2991,3 +3070,9 @@ def check_requires_y_none(name, estimator_orig):
except ValueError as ve:
if not any(msg in str(ve) for msg in expected_err_msgs):
warnings.warn(warning_msg, FutureWarning)


# set of checks that are completely strict, i.e. they have no non-strict part
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this is the only fully strict check.

@NicolasHug NicolasHug changed the title [WIP] Prototype 4 for strict check_estimator mode [MRG] Prototype 4 for strict check_estimator mode May 27, 2020
@NicolasHug
Copy link
Member Author

Ready for reviews @jnothman @rth @thomasjpfan @adrinjalali @amueller

Thanks!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to submit this review.

Having strict_mode everywhere seems to be the only way to have fine grain control.

)

return wrapped


def parametrize_with_checks(estimators):
def _should_be_skipped_or_marked(estimator, check, strict_mode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very go-lang like.

I am +0.5 on returning an empty string for the False case and using if not reason as the check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is a tiny bit redundant, but I can't say I like the use of if not reason:. This is how it was before and this made it hard for me to grasp how everything was tying in together at the time.

Let's see if others have comments on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also go "full python" and raise an exception.

return True, xfail_checks[check_name]

if check_name in _FULLY_STRICT_CHECKS and not strict_mode:
return True, 'The check is fully strict and strict mode is off'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include the check name here?

Suggested change
return True, 'The check is fully strict and strict mode is off'
return True, f'The {check_name} is fully strict and strict mode is off'

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we decide to go with strict_mode instead of strict?

Also instead of _FULLY_STRICT_CHECKS dict how about strict_mode='full' by default and just parsing that with inspect.signature(check_..)['strict_mode'].default?

Comment on lines +367 to +368
check_name = (check.func.__name__ if isinstance(check, partial)
else check.__name__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
check_name = (check.func.__name__ if isinstance(check, partial)
else check.__name__)
check_name = _get_check_estimator_ids(check)

should work, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes though I think this is simpler and more explicit. I'd be more comfortable with using _get_check_estimator_ids if it were called _get_repr or something like that, but I feel like the id term is more confusing that anything in this context.

After all, all we want is the function name and this is (almost) a one-liner

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about _get_check_estimator_repr? Simpler currently I don't know. This reminds me of the case where we could have classes or instances of estimator, and always had to keep that in mind.

That function hides that check could be a callable or a partial object, which I think is not relevant for to understand the _maybe_skip or the _should_be_skipped_or_marked object and therefore simplify the reading a bit.

Comment on lines +390 to +391
check_name = (check.func.__name__ if isinstance(check, partial)
else check.__name__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
check_name = (check.func.__name__ if isinstance(check, partial)
else check.__name__)
check_name = _get_check_estimator_ids(check)

@NicolasHug
Copy link
Member Author

Did we decide to go with strict_mode instead of strict?

Nope, happy to change if needed

Also instead of FULLY_STRICT_CHECKS dict how about strict_mode='full' by default and just parsing that with inspect.signature(check..)['strict_mode'].default?

Sorry I'm not sure I understand: are you suggesting to have strict_mode='full' as the default of check_estimator or the default of all checks? Where would we be using inspect.signature(check_..)['strict_mode'].default?

@rth
Copy link
Member

rth commented Jun 4, 2020

Sorry I'm not sure I understand: are you suggesting to have strict_mode='full' as the default of check_estimator or the default of all checks?

I mean currently I find a bit confusing that technically a check could have strict_mode=False as the default, and yet be in the FULLY_STRICT_CHECKS dict. So we could set all to True and the few fully strict ones to "full". The default doesn't matter as far as I understand, but at least looking at a check one would directly know what it is.

In _should_be_skipped_or_marked replace,

    if check_name in _FULLY_STRICT_CHECKS and not strict_mode:

by

    if inspect.signature(check)['strict_mode'].default == 'full' and not strict_mode:

and get rid of _FULLY_STRICT_CHECKS altogether. Not sure if it would be easier to understand, just wondering.

@NicolasHug
Copy link
Member Author

oh OK I see. I think that'd be fine by me. I agree that it's a bit weird that all checks have the parameter even though some of them don't use it. One perk of the dict is that we can check out all the fully strict checks in one single place.

@rth
Copy link
Member

rth commented Jul 11, 2020

When we merge this, it might be also nice to add the the author from #14929 as a co-author, as he proposed very similar changes in that PR.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, except that some documentation should be updated.

@NicolasHug
Copy link
Member Author

What docs @jnothman ?

Regarding the what'snew entry, I think it will be easier to write once we actually settle on which checks we want to make strict / partially strict.

@rth
Copy link
Member

rth commented Aug 6, 2020

Let's merge now. I agree that it would be good to add a note on backward compatibility of check_estimator but that applies even without this PR. We can do that in a follow up PR, and we would need to make one anyway to use this feature more in common tests. I'm just looking forward to closing those 5 linked PRs :)

Thanks for your work on this!

@rth rth merged commit 1dedc7e into scikit-learn:master Aug 6, 2020
@NicolasHug
Copy link
Member Author

Thanks everyone for the reviews and feedback

@jnothman
Copy link
Member

jnothman commented Aug 6, 2020 via email

@jnothman
Copy link
Member

jnothman commented Aug 6, 2020

This is also missing from what's new.

I'm not really sure what the hurry was for merge without developer docs etc. It really needs to be in https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.

@NicolasHug
Copy link
Member Author

Don't worry, I'm not forgetting the docs. As suggested in my message above I think they will be easier to write once we have a clearer idea of what checks are actually strict.

@rth
Copy link
Member

rth commented Aug 6, 2020

Right now this PR doesn't do much as far as users are concerned. This is mostly an internal refactoring in the common test mechanism. Yes, we can document strict=False in "rolling-your-own-estimator" but currently it has mostly no impact and this description will have to be adjusted in any case once we actually start implementing it in common checks.

I'm not really sure what the hurry was for merge without developer docs etc

It's been 3 month since prototype 1 of this PR and we have been discussing this ever since. If the technical implementation is sound, I see no downsides of merging it so we can actually start using this in various common checks. We can't do that until this PR is merged.

@jnothman
Copy link
Member

jnothman commented Aug 6, 2020 via email

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Oct 15, 2020
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 28, 2020
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Dec 12, 2020
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Feb 25, 2021
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Feb 25, 2021
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Mar 27, 2021
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 15, 2021
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jul 23, 2021
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 10, 2021
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request May 15, 2022
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 14, 2022
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Aug 29, 2022
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Sep 5, 2022
see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants