[MRG] Prototype 4 for strict check_estimator mode #17361

NicolasHug · 2020-05-26T23:24:08Z

Closes #14929
Fixes #13969
Fixes #16241

Alternative to #17252, #16882, and #16890

Same as #17252 but supports partially strict checks:

fully strict checks are treated as if they were in the xfail_checks tag when strict mode is off
partially strict checks are always run.

Also note that:

all checks need to have a strict_mode parameter now, even if they don't use it at all. I'm not sure we can get around that?
the xfail_checks tag does not support skipping a check based on one of its arguments, i.e. it is not possible to specify "this check is expected to fail when strict mode is True but it should pass when strict mode is off". This is already the case in master and is not a limitation of this PR. If we want to support this use case, we will need to make the xfail_checks API more complex; I'd suggest leaving this for another set of PRs, since it is pretty much orthogonal to the changes proposed here.
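To make the fully strict / partially strict distinction concrete, here is a minimal sketch of what a partially strict check could look like. Only the `strict_mode` parameter comes from this PR; the `ToyNonNegative` estimator and the body of the check are hypothetical and written purely for illustration, without importing scikit-learn.

```python
class ToyNonNegative:
    """Hypothetical estimator that rejects negative inputs."""

    def fit(self, X, y=None):
        if any(v < 0 for row in X for v in row):
            raise ValueError("Negative values in data passed to fit")
        return self


def check_fit_non_negative(name, estimator, strict_mode=True):
    """Partially strict check (illustrative sketch, not the real one).

    The non-strict part (a ValueError must be raised) always runs.
    The strict part (the wording of the message) runs only in strict mode.
    """
    X = [[1.0, -2.0], [3.0, 4.0]]
    try:
        estimator.fit(X)
    except ValueError as exc:
        if strict_mode:
            assert "Negative values in data" in str(exc), name
    else:
        raise AssertionError(f"{name} did not raise ValueError")


# Both calls pass for the toy estimator above.
check_fit_non_negative("ToyNonNegative", ToyNonNegative(), strict_mode=True)
check_fit_non_negative("ToyNonNegative", ToyNonNegative(), strict_mode=False)
```

An estimator that raises a `ValueError` with a different message would pass this sketch with `strict_mode=False` and fail it with `strict_mode=True`.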

Edit: closes #17252, closes #16882, closes #16890

NicolasHug · 2020-05-27T12:46:22Z

sklearn/utils/estimator_checks.py

@@ -2858,17 +2934,20 @@ def check_outliers_fit_predict(name, estimator_orig):
            assert_raises(ValueError, estimator.fit_predict, X)


-def check_fit_non_negative(name, estimator_orig):
+def check_fit_non_negative(name, estimator_orig, strict_mode=True):


This is the only "partially strict check" that I implemented for now. I want to make the PR about the design rather than about the specific details of each check.

NicolasHug · 2020-05-27T12:46:43Z

sklearn/utils/estimator_checks.py

@@ -2991,3 +3070,9 @@ def check_requires_y_none(name, estimator_orig):
    except ValueError as ve:
        if not any(msg in str(ve) for msg in expected_err_msgs):
            warnings.warn(warning_msg, FutureWarning)
+
+
+# set of checks that are completely strict, i.e. they have no non-strict part


Same here, this is the only fully strict check.

NicolasHug · 2020-05-27T13:01:04Z

Ready for reviews @jnothman @rth @thomasjpfan @adrinjalali @amueller

Thanks!

thomasjpfan

Forgot to submit this review.

Having strict_mode everywhere seems to be the only way to have fine-grained control.

thomasjpfan · 2020-05-27T19:44:39Z

sklearn/utils/estimator_checks.py

        )

    return wrapped


-def parametrize_with_checks(estimators):
+def _should_be_skipped_or_marked(estimator, check, strict_mode):


This is very go-lang like.

I am +0.5 on returning an empty string for the False case and using if not reason as the check.

I understand this is a tiny bit redundant, but I can't say I like the use of if not reason:. This is how it was before and this made it hard for me to grasp how everything was tying in together at the time.

Let's see if others have comments on this?

We can also go "full python" and raise an exception.
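For reference, the three conventions discussed in this thread can be compared side by side. This is only a sketch: the function names `decide_tuple`, `decide_reason`, and `decide_raise`, and the `SkipCheck` exception, are hypothetical and are not part of scikit-learn.

```python
class SkipCheck(Exception):
    """Hypothetical exception for the "full python" variant."""


def decide_tuple(is_fully_strict, strict_mode):
    # Explicit (bool, reason) pair: slightly redundant, but easy to read.
    if is_fully_strict and not strict_mode:
        return True, "fully strict check and strict mode is off"
    return False, ""


def decide_reason(is_fully_strict, strict_mode):
    # Reason only: an empty string means "do not skip".
    if is_fully_strict and not strict_mode:
        return "fully strict check and strict mode is off"
    return ""


def decide_raise(is_fully_strict, strict_mode):
    # Exception-based: the caller is expected to catch SkipCheck.
    if is_fully_strict and not strict_mode:
        raise SkipCheck("fully strict check and strict mode is off")


skip, reason = decide_tuple(True, strict_mode=False)
if skip:
    print("skipping:", reason)

reason = decide_reason(True, strict_mode=False)
if reason:  # relies on the truthiness of the reason string
    print("skipping:", reason)
```

The tuple form keeps the decision and its explanation separate at the call site, which is the readability argument made in the reply above.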

thomasjpfan · 2020-06-01T22:35:54Z

sklearn/utils/estimator_checks.py

+        return True, xfail_checks[check_name]
+
+    if check_name in _FULLY_STRICT_CHECKS and not strict_mode:
+        return True, 'The check is fully strict and strict mode is off'


Include the check name here?

Suggested change

return True, 'The check is fully strict and strict mode is off'

return True, f'The {check_name} is fully strict and strict mode is off'
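Putting the two `return` statements from the diff in context, the decision logic might look roughly like the sketch below. The toy checks, the `ToyEstimator` class, its `xfail_checks` attribute, and the placeholder reason returned for the non-skipped case are all assumptions made for illustration; the real helper reads the xfail information from the estimator's tags.

```python
from functools import partial


def check_toy_fully_strict(name, estimator, strict_mode=True):
    """Hypothetical fully strict check (body irrelevant here)."""


def check_toy_regular(name, estimator, strict_mode=True):
    """Hypothetical check with a non-strict part."""


# Names of checks that have no non-strict part.
_FULLY_STRICT_CHECKS = {"check_toy_fully_strict"}


class ToyEstimator:
    # Assumption: maps a check name to the reason it is expected to fail.
    xfail_checks = {"check_toy_regular": "known limitation"}


def should_be_skipped_or_marked(estimator, check, strict_mode):
    """Return (should_skip, reason) for one check on one estimator."""
    check_name = (check.func.__name__ if isinstance(check, partial)
                  else check.__name__)
    xfail_checks = getattr(estimator, "xfail_checks", None) or {}
    if check_name in xfail_checks:
        return True, xfail_checks[check_name]
    if check_name in _FULLY_STRICT_CHECKS and not strict_mode:
        return True, f"{check_name} is fully strict and strict mode is off"
    # Assumption: placeholder reason for the non-skipped case.
    return False, "the check should run"


est = ToyEstimator()
print(should_be_skipped_or_marked(est, check_toy_fully_strict, strict_mode=False))
print(should_be_skipped_or_marked(est, check_toy_fully_strict, strict_mode=True))
```

Including the check name in the reason, as suggested above, makes the skip message self-explanatory when many checks are reported at once.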

rth

Did we decide to go with strict_mode instead of strict?

Also instead of _FULLY_STRICT_CHECKS dict how about strict_mode='full' by default and just parsing that with inspect.signature(check_..)['strict_mode'].default?

rth · 2020-06-04T18:52:22Z

sklearn/utils/estimator_checks.py

+    check_name = (check.func.__name__ if isinstance(check, partial)
+                  else check.__name__)


Suggested change

check_name = (check.func.__name__ if isinstance(check, partial)

else check.__name__)

check_name = _get_check_estimator_ids(check)

should work, no?

Yes, though I think this is simpler and more explicit. I'd be more comfortable with using _get_check_estimator_ids if it were called _get_repr or something like that, but I feel like the id term is more confusing than anything in this context.

After all, all we want is the function name and this is (almost) a one-liner

How about _get_check_estimator_repr? Whether the current version is simpler, I don't know. This reminds me of the case where we could have classes or instances of estimator, and always had to keep that in mind.

That function hides the fact that check could be a callable or a partial object, which I think is not relevant to understanding _maybe_skip or _should_be_skipped_or_marked, and it would therefore simplify the reading a bit.
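The point under discussion is that a `functools.partial` object has no `__name__` of its own, so the name has to be read from the wrapped function. A minimal sketch, with a hypothetical `check_example` and a hypothetical helper name `get_check_name`:

```python
from functools import partial


def check_example(name, estimator, strict_mode=True):
    """Hypothetical check used only to demonstrate name lookup."""


def get_check_name(check):
    # A partial has no __name__; the wrapped function lives in .func.
    return (check.func.__name__ if isinstance(check, partial)
            else check.__name__)


print(get_check_name(check_example))
print(get_check_name(partial(check_example, "MyEstimator")))
```

Whether this lives inline or behind a helper is the readability trade-off debated in the replies above.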

rth · 2020-06-04T18:53:04Z

sklearn/utils/estimator_checks.py

+    check_name = (check.func.__name__ if isinstance(check, partial)
+                  else check.__name__)


Suggested change

check_name = (check.func.__name__ if isinstance(check, partial)

else check.__name__)

check_name = _get_check_estimator_ids(check)

NicolasHug · 2020-06-04T19:13:11Z

> Did we decide to go with strict_mode instead of strict?

Nope, happy to change if needed

> Also instead of _FULLY_STRICT_CHECKS dict how about strict_mode='full' by default and just parsing that with inspect.signature(check_..)['strict_mode'].default?

Sorry I'm not sure I understand: are you suggesting to have strict_mode='full' as the default of check_estimator or the default of all checks? Where would we be using inspect.signature(check_..)['strict_mode'].default?

rth · 2020-06-04T19:25:55Z

> Sorry I'm not sure I understand: are you suggesting to have strict_mode='full' as the default of check_estimator or the default of all checks?

I mean that currently I find it a bit confusing that technically a check could have strict_mode=False as the default, and yet be in the _FULLY_STRICT_CHECKS dict. So we could set all to True and the few fully strict ones to "full". The default doesn't matter as far as I understand, but at least looking at a check one would directly know what it is.

In _should_be_skipped_or_marked replace,

    if check_name in _FULLY_STRICT_CHECKS and not strict_mode:

by

    if inspect.signature(check).parameters['strict_mode'].default == 'full' and not strict_mode:

and get rid of _FULLY_STRICT_CHECKS altogether. Not sure if it would be easier to understand, just wondering.
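A minimal sketch of this signature-based alternative follows. Note that `inspect.signature` returns a `Signature` object whose parameters are read through `.parameters` rather than by indexing the signature directly. The two toy checks and the helper name `is_fully_strict` are hypothetical.

```python
import inspect
from functools import partial


def check_toy_fully_strict(name, estimator, strict_mode="full"):
    """Hypothetical check with no non-strict part."""


def check_toy_partially_strict(name, estimator, strict_mode=True):
    """Hypothetical check with both a strict and a non-strict part."""


def is_fully_strict(check):
    # Works for plain functions and for partials that bind positional
    # arguments, since inspect.signature understands functools.partial.
    params = inspect.signature(check).parameters
    return ("strict_mode" in params
            and params["strict_mode"].default == "full")


print(is_fully_strict(check_toy_fully_strict))
print(is_fully_strict(check_toy_partially_strict))
print(is_fully_strict(partial(check_toy_fully_strict, "Toy")))
```

With this approach the information lives on each check, at the cost of losing the single place where all fully strict checks are listed.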

NicolasHug · 2020-06-04T19:53:21Z

oh OK I see. I think that'd be fine by me. I agree that it's a bit weird that all checks have the parameter even though some of them don't use it. One perk of the dict is that we can check out all the fully strict checks in one single place.

rth · 2020-07-11T19:21:42Z

When we merge this, it might also be nice to add the author from #14929 as a co-author, as he proposed very similar changes in that PR.

jnothman

This looks good, except that some documentation should be updated.

NicolasHug · 2020-08-04T14:12:02Z

What docs @jnothman ?

Regarding the what's new entry, I think it will be easier to write once we actually settle on which checks we want to make strict / partially strict.

rth · 2020-08-06T10:01:04Z

Let's merge now. I agree that it would be good to add a note on backward compatibility of check_estimator, but that applies even without this PR. We can do that in a follow-up PR, and we would need to make one anyway to use this feature more in common tests. I'm just looking forward to closing those 5 linked PRs :)

Thanks for your work on this!

NicolasHug · 2020-08-06T11:34:17Z

Thanks everyone for the reviews and feedback

jnothman · 2020-08-06T12:55:49Z

I meant developer docs where we talk about check_estimator and/or tags

jnothman · 2020-08-06T13:12:10Z

This is also missing from what's new.

I'm not really sure what the hurry was for merge without developer docs etc. It really needs to be in https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.

NicolasHug · 2020-08-06T13:18:59Z

Don't worry, I'm not forgetting the docs. As suggested in my message above I think they will be easier to write once we have a clearer idea of what checks are actually strict.

rth · 2020-08-06T13:29:14Z

Right now this PR doesn't do much as far as users are concerned. This is mostly an internal refactoring in the common test mechanism. Yes, we can document strict=False in "rolling-your-own-estimator" but currently it has mostly no impact and this description will have to be adjusted in any case once we actually start implementing it in common checks.

> I'm not really sure what the hurry was for merge without developer docs etc

It's been 3 months since prototype 1 of this PR and we have been discussing this ever since. If the technical implementation is sound, I see no downside to merging it so we can actually start using this in various common checks. We can't do that until this PR is merged.

jnothman · 2020-08-06T21:42:54Z

Okay. I just thought it unusual to merge a new feature without docs. Thanks for explaining that you're waiting on more checks to be designated as strict.

NicolasHug added 12 commits May 16, 2020 10:49

treat strict checks as xfail checks

8775c1e

Merge branch 'master' of github.com:scikit-learn/scikit-learn into strict_mode_using_xfails_tag

8c07ca6

different names

664552f

Comments

ecff04c

Merge branch 'master' of github.com:scikit-learn/scikit-learn into strict_mode_using_xfails_tag

8e66d47

some clearning

c7c5c8d

Merge branch 'master' of github.com:scikit-learn/scikit-learn into strict_mode_using_xfails_tag

e7d5f7c

Merge branch 'master' of github.com:scikit-learn/scikit-learn into strict_mode_xfails_partially_strict_checks

ff09e3a

This is hard

a2b5bf5

put back reasons

2e2bebd

comments and cleaning

0a61d69

Merge branch 'master' of github.com:scikit-learn/scikit-learn into strict_mode_xfails_partially_strict_checks

e1f1761

github-actions bot added the module:utils label May 27, 2020

NicolasHug changed the title ~~[WIP] Prototype 4 for strict check_estimator mode~~ [MRG] Prototype 4 for strict check_estimator mode May 27, 2020

typo

9c9757b

thomasjpfan reviewed Jun 1, 2020

check name in xfail message

92b45e1

rth reviewed Jun 4, 2020

rth mentioned this pull request Jun 6, 2020

All tests in check_estimator are disabled if X_types does not include "2darray" #14057

Open

rth added 3 commits July 10, 2020 11:21

Merge branch 'master' into strict_mode_xfails_partially_strict_checks

6c8af6a

Lint

a92320d

Lint

82584dc

rth mentioned this pull request Jul 17, 2020

TST Add sample order invariance to estimator_checks #17598

Closed

jnothman reviewed Aug 4, 2020

rth merged commit 1dedc7e into scikit-learn:master Aug 6, 2020

rth mentioned this pull request Aug 6, 2020

Add non-strict mode to check_estimator #13969

Closed

NicolasHug mentioned this pull request Sep 16, 2020

check_estimator is stricter than what is stated in the Estimator API doc #16241

Open

NicolasHug mentioned this pull request Oct 9, 2020

[MRG] MNT Api only mode in check_estimator #18582

Closed

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Oct 15, 2020

update mode in more-tags (reflecting scikit-learn#17361)

116ee39

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

[MRG] Prototype 4 for strict check_estimator mode (scikit-learn#17361)

a2cc10c

glemaitre mentioned this pull request Nov 25, 2020

MNT minimally revert strict_mode #18905

Merged

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 28, 2020

update mode in more-tags (reflecting scikit-learn#17361)

2b08f6b

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Dec 12, 2020

update mode in more-tags (reflecting scikit-learn#17361)

b2a2c2e

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Feb 25, 2021

update mode in more-tags (reflecting scikit-learn#17361)

8d0e409

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Feb 25, 2021

update mode in more-tags (reflecting scikit-learn#17361)

c923c30

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Mar 27, 2021

update mode in more-tags (reflecting scikit-learn#17361)

d17d100

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 15, 2021

update mode in more-tags (reflecting scikit-learn#17361)

944a66f

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jul 23, 2021

update mode in more-tags (reflecting scikit-learn#17361)

e8a0525

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

glemaitre mentioned this pull request Jul 26, 2021

TST split API checks from other checks #20608

Draft

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 10, 2021

update mode in more-tags (reflecting scikit-learn#17361)

1d6fdb5

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request May 15, 2022

update mode in more-tags (reflecting scikit-learn#17361)

174cb0c

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 14, 2022

update mode in more-tags (reflecting scikit-learn#17361)

378456b

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Aug 29, 2022

update mode in more-tags (reflecting scikit-learn#17361)

a658659

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Sep 5, 2022

update mode in more-tags (reflecting scikit-learn#17361)

0e015e0

see-also cross-reference in ocSVM and SVDD (reflecting scikit-learn#18332)

adrinjalali mentioned this pull request Aug 21, 2024

TST allow categorisation of tests into API and legacy #29699

Merged
