[WIP] Prototype 2 for strict check_estimator mode #16890

NicolasHug · 2020-04-10T16:23:32Z

Closes #14929
Fixes #13969
Fixes #16241

Alternative to #16882 and #17252

Here we require every single check to have a strict_mode parameter, even if it's not used

ping @rth :)

NicolasHug · 2020-04-10T17:38:00Z

Souunds like this doesn't play well with tests marked with a _xfail_test tag.

I honestly don't understand how that works, help welcome @thomasjpfan

rth · 2020-04-10T18:00:22Z

sklearn/utils/estimator_checks.py

+        if strict_mode:
+            return check(*args, **kwargs)
+        else:
+            raise SkipTest(f"Skipping {check} since strict mode is off")


I think the issue may be that this decorator shouldn't raise skip test (which might not be handled well in a pytest session), but rather only mark the test as skipped in some way. And we can skip it in the same place we currently mark tests as xfail.

I'll try to have a more detailed look in a couple of days..

NicolasHug · 2020-04-10T22:59:51Z

The issue comes from _set_check_estimator_ids() which will return check_methods_subset_invariance(strict_mode=True), but the tag in (e.g.) BernoulliRBM is 'check_methods_subset_invariance' so they don't exactly match now.

So the fix is to change the tag value, for every single estimator.

Or is there a way we can change _mark_xfail_checks() to make it more robust?

thomasjpfan · 2020-04-14T17:58:52Z

The following is one way help with the keyword issue:

--- a/sklearn/utils/estimator_checks.py
+++ b/sklearn/utils/estimator_checks.py
@@ -290,7 +290,7 @@ def _yield_all_checks(name, estimator):
         yield check_fit_non_negative
 
 
-def _set_check_estimator_ids(obj):
+def _set_check_estimator_ids(obj, include_keywords=True):
     """Create pytest ids for checks.
 
     When `obj` is an estimator, this returns the pprint version of the
@@ -306,6 +306,9 @@ def _set_check_estimator_ids(obj):
     obj : estimator or function
         Items generated by `check_estimator`
 
+    include_keywords : bool, default=True
+        Includes keywords in id
+
     Returns
     -------
     id : string or None
@@ -318,7 +321,7 @@ def _set_check_estimator_ids(obj):
         if not isinstance(obj, partial):
             return obj.__name__
 
-        if not obj.keywords:
+        if not obj.keywords or not include_keywords:
             return obj.func.__name__
 
         kwstring = ",".join(["{}={}".format(k, v)
@@ -378,7 +381,7 @@ def _mark_xfail_checks(estimator, check, pytest):
     if not xfail_checks:
         return estimator, check
 
-    check_name = _set_check_estimator_ids(check)
+    check_name = _set_check_estimator_ids(check, include_keywords=False)
     msg = xfail_checks.get(check_name, None)
 
     if msg is None:

…rict_mode_function_level

NicolasHug · 2020-04-14T19:04:07Z

Thanks for the suggestion @thomasjpfan .

This seems to have fixed the previous issue, but I'm concerned that this patch will prevent the selection of xfail tests based on a parameter (especially the strict_mode parameter), e.g.:

    def _more_tags(self):
        return {
            '_xfail_test': {
                'some_check(strict_mode=true)':
                'the strict case fails but the non-strict one should be just fine'
            }
        }

thomasjpfan · 2020-04-15T17:54:55Z

Ah yes, then we would need to update _mark_xfail_checks to be a little smarter.

NicolasHug · 2020-04-17T12:07:08Z

thoughts @rth @jnothman ?

NicolasHug · 2020-04-17T12:16:26Z

And @rth @thomasjpfan, would it be better for the xfail tag logic to rely on partial functions instead of string representations? I.e. we would specify partial application of a check in the tag, instead of its test id.

rth · 2020-04-17T15:39:40Z

Thanks for working on this @NicolasHug . Sorry for slow review.

would it be better for the xfail tag logic to rely on partial functions instead of string representations? I.e. we would specify partial application of a check in the tag, instead of its test id.

Could you give an example? Do you mean, e.g.

          '_xfail_test': {
                partial(check_methods_subset_invariance, some_param=3), 
                'fails for the predict method'
            }

?
I guess technically partial functions are hashable and can serve as dict keys,

>>> def f(x, a=2):
...     return x + 1
... 
>>> from functools import partial
>>> partial(f, a=3)
functools.partial(<function f at 0x7f2366b2bdc0>, a=3)
>>> hash(partial(f, a=3))
8736876476520

I still find it a bit convoluted to use it that, particularly if check_.. can one day become decorated by pytest with some marks (and then I'm not sure if there is case where it's not hashable). But as a tuple (func, message) why not, though we would need to iterate though all of the items manually then. maybe comparison would be more reliable indeed, and it would ensure that the skipped check does indeed exist.
Though for contrib project that use this it means that we need to be really backward compatible for the check_ functions as that would break their imports.

So overall I have mixed feeling about it, possibly with +0.5. Not sure if it helps :)

rth · 2020-04-17T15:46:14Z

The issue comes from _set_check_estimator_ids() which will return check_methods_subset_invariance(strict_mode=True), but the tag in (e.g.) BernoulliRBM is 'check_methods_subset_invariance' so they don't exactly match now.

Just read the previous conversation in more details. If strict check is unpleasant to make work with xfail_checks as strings, then +1 to for partial functions.
As long as we can mitigate above concerns about hashability (i.e. maybe not use dict) and backward compatibility of check_* functions..

NicolasHug · 2020-04-18T15:02:47Z

BTW as @adrinjalali noted we should aim for a feature freeze soon. Considering that i) the solution for the strict mode will likely break backward compat of the xfail logic and that ii) we won't reach a solution within the timeframe, we should probably make very clear in the UG that the xfail_checks tag should basically not be used at all since it will change almost certainly. (I know tags are experimental, but still).

…rict_mode_function_level

rth · 2020-04-18T15:31:29Z

we won't reach a solution within the timeframe

You think we won't? Disabling stict checks is likely the no 1 ask from maintainers of contrib projects. If you need help on this I'd be happy to help.

Do we have a date for the feature freeze?

NicolasHug · 2020-04-18T15:37:38Z

Thanks for the help! I'll try to make it work with partial functions and will report back. If it's too tricky I guess relying on the generated check id would be fine.

Feature freeze is planned for the end of next week apparently.
But this PR is only the first part, i.e. implementing the strict mode mechanism. Once it's merged, we'll still have to do part 2, i.e. bikeshed on which tests should or should not be strict ;)

rth · 2020-04-18T15:39:39Z

Feature freeze is planned for the end of next week apparently.

Yes, saw the message on gitter. Will respond there.

But this PR is only the first part, i.e. implementing the strict mode mechanism. Once it's merged, we'll still have to do part 2, i.e. bikeshed on which tests should or should not be strict ;)

I know, but at least we'll have the basic mechanism.

jnothman

Is this much better than using an estimator tag to indicate strict mode?

NicolasHug · 2020-04-19T12:28:37Z

You mean an additional tag on top of xfail_checks?

With #16958, I'm starting to believe we need to rethink the xfail_checks tag to make it more general and also support a strict mode

jnothman · 2020-04-19T14:16:26Z

I suppose on top of xfail_checks, yes. strict is there to delineate design standards expected in scikit-learn from minimal expectations for any estimator. xfail_checks is intended for estimators that are necessarily unconventional in some way.

NicolasHug · 2020-04-19T14:25:09Z

Could you please help me understand with an example of how the tag would be used? Would it be something like

def _more_tags(self):
	return {
		'dont_execute_in_strict_mode': [check_1, check_2, ...]
	}

?

In any case, note that the changes proposed so far in this PR are orthogonal to the addition of a new tag.

Also, question for you: do you think we should allow checks to be partially strict? I.e. allow a check to have a strict section, and a non-strict section?

adrinjalali · 2020-04-20T13:20:41Z

Also, question for you: do you think we should allow checks to be partially strict? I.e. allow a check to have a strict section, and a non-strict section?

If that's the case for a test, should they not be two tests?

This also fixes #13969, doesn't it?

Putting this in 0.24 (although I'd be happy to have this in a minor release as well).

NicolasHug · 2020-04-20T13:22:57Z

I think we should not release xfail_tags in its current form in 0.23.

NicolasHug · 2020-04-20T13:34:25Z

@adrinjalali

If that's the case for a test, should they not be two tests?

I personally tend to agree with you. For context, the reason I'm asking is because if we don't need to have partially strict checks, then #16882 is enough. @rth seems to think it's worth having #16882 (comment). But I'd like to hear @jnothman's opinion too.

adrinjalali · 2020-04-20T13:42:47Z

I personally would be much happier with this solution, especially having #16756 in mind.

NicolasHug · 2020-04-20T13:50:09Z

Sorry, what do you mean by "this" solution?

rth · 2020-04-20T13:56:20Z

I agree we should avoid over-engineering this. We can apply strict at the check level, but this means e.g. that we cant assert error message in common tests. Basically all assert_raises_regex and assert_raise_message would then need to be removed, unless they assert messages from public helper functions (e.g. check_array or check_is_fitted).

I think we should not release xfail_tags in its current form in 0.23.

Just a reminder that "The estimator tags are experimental and the API is subject to change.".

Currently xfail_tags does address a valid use case, both internally and for contrib projects. Yes, API could potentially change in the future for xfail_tags due to interaction with strict=True which may be not ideal but I don't see a reason not to release it.

rth · 2020-04-20T14:00:47Z

We can apply strict at the check level,

Just to be clear, I'm not opposed to that solution, as long as we break checks in smaller pieces when needed. Technically they might be semi-public API though (or at least have been around enough time for contrib projects to build testing setups around them e.g. https://github.com/rtavenar/tslearn/blob/f87744a4d62578f4c82aead6f3ee4e9d51fd2a54/tslearn/tests/test_estimators.py#L40)

adrinjalali · 2020-04-20T14:20:28Z

Sorry, what do you mean by "this" solution?

Sorry, I meant as opposed to #16882

jnothman · 2020-04-27T13:19:06Z

Could you please help me understand with an example of how the tag would be used?

I just meant:

def _more_tags(self):
	return { 'strict': True }

Then all assertions which are for a standard we hold in sklearn but not required for general compatibility would skip if 'strict' is not.

NicolasHug · 2020-05-19T12:57:06Z

@jnothman I think that tuning on the strict mode should be a parameter to check_estimator, not a property of the estimator. One may want to run both the strict and the non-strict suite on the same instance.

Also please see the latest prototype at #17252

jnothman · 2020-05-20T12:23:49Z

One may want to run both the strict and the non-strict suite on the same instance.

In what case?

NicolasHug · 2020-05-20T12:25:29Z

in the case where we know an estimator doesn't pass the strict suite, but we still want to know which of the strict checks fail.

NicolasHug · 2020-05-20T12:28:15Z

Also note that changing the tag of an instance is basically unsupported right now. We only support setting tags on classes, as far as I can tell.

Having strict_mode as a tag would thus prevent us from using the strict suite with Est(some_param='a') and the non-strict suite with Est(some_param='b).

jnothman · 2020-05-20T12:50:16Z

Okay. I'm alright with strict not being a tag

Also note that changing the tag of an instance is basically unsupported right now. We only support setting tags on classes, as far as I can tell.

Untrue.

$ git grep -A3 more_tags | grep 'self\.'
sklearn/feature_extraction/_hash.py-        return {'X_types': [self.input_type]}
sklearn/feature_selection/_from_model.py-        estimator_tags = self.estimator._get_tags()
sklearn/feature_selection/_rfe.py-        estimator_tags = self.estimator._get_tags()
sklearn/impute/_base.py-        return {'allow_nan': is_scalar_nan(self.missing_values)}
sklearn/linear_model/_glm/glm.py-            _family_instance = self._family_instance
sklearn/preprocessing/_function_transformer.py-        return {'no_validation': not self.validate,

NicolasHug · 2020-05-20T12:59:49Z

I meant setting tags on the fly on an instance i.e. some_instance.set_tags(...), but agreed, supporting init-parameter-dependent tags would be enough.

NicolasHug added 4 commits April 9, 2020 09:53

prototype for strict mode

f7074fb

did LR raise a warning?

93956cb

different temporary check

baf9142

allow partially strict checks

31f698a

github-actions bot added the module:utils label Apr 10, 2020

rth reviewed Apr 10, 2020

View reviewed changes

NicolasHug added 2 commits April 14, 2020 14:35

Merge branch 'master' of github.com:scikit-learn/scikit-learn into st…

bfa4f16

…rict_mode_function_level

Trying Thomas' suggestion

f54bf8c

NicolasHug mentioned this pull request Apr 14, 2020

[WIP] Prototype 1 for strict check_estimator mode #16882

Closed

NicolasHug mentioned this pull request Apr 17, 2020

MNT simplify xfail check marking logic #16949

Merged

NicolasHug added 2 commits April 18, 2020 11:26

Merge branch 'master' of github.com:scikit-learn/scikit-learn into st…

f5d9324

…rict_mode_function_level

bad merge

f7e2700

use functions... WIP

62661e2

NicolasHug mentioned this pull request Apr 18, 2020

xfail_checks tag only works with parametrize_with_checks #16958

Closed

jnothman reviewed Apr 19, 2020

View reviewed changes

adrinjalali added this to the 0.24 milestone Apr 20, 2020

NicolasHug mentioned this pull request Apr 20, 2020

DOC details on the use of xfail_checks #16968

Merged

NicolasHug mentioned this pull request May 16, 2020

[MRG] Prototype 3 for strict check_estimator mode #17252

Closed

NicolasHug mentioned this pull request May 27, 2020

[MRG] Prototype 4 for strict check_estimator mode #17361

Merged

rth closed this in #17361 Aug 6, 2020

adrinjalali mentioned this pull request Aug 21, 2024

TST allow categorisation of tests into API and legacy #29699

Merged

Uh oh!

[WIP] Prototype 2 for strict check_estimator mode #16890

[WIP] Prototype 2 for strict check_estimator mode #16890

Uh oh!

Conversation

NicolasHug commented Apr 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasHug commented Apr 10, 2020

Uh oh!

rth Apr 10, 2020

Choose a reason for hiding this comment

Uh oh!

NicolasHug commented Apr 10, 2020

Uh oh!

thomasjpfan commented Apr 14, 2020 • edited by NicolasHug Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasHug commented Apr 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomasjpfan commented Apr 15, 2020

Uh oh!

NicolasHug commented Apr 17, 2020

Uh oh!

NicolasHug commented Apr 17, 2020

Uh oh!

rth commented Apr 17, 2020

Uh oh!

rth commented Apr 17, 2020

Uh oh!

NicolasHug commented Apr 18, 2020

Uh oh!

rth commented Apr 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasHug commented Apr 18, 2020

Uh oh!

rth commented Apr 18, 2020

Uh oh!

jnothman left a comment

Choose a reason for hiding this comment

Uh oh!

NicolasHug commented Apr 19, 2020

Uh oh!

jnothman commented Apr 19, 2020 via email

Uh oh!

NicolasHug commented Apr 19, 2020

Uh oh!

adrinjalali commented Apr 20, 2020

Uh oh!

NicolasHug commented Apr 20, 2020

Uh oh!

NicolasHug commented Apr 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adrinjalali commented Apr 20, 2020

Uh oh!

NicolasHug commented Apr 20, 2020

Uh oh!

rth commented Apr 20, 2020

Uh oh!

rth commented Apr 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adrinjalali commented Apr 20, 2020

Uh oh!

jnothman commented Apr 27, 2020

Uh oh!

NicolasHug commented May 19, 2020

Uh oh!

jnothman commented May 20, 2020

Uh oh!

NicolasHug commented May 20, 2020

Uh oh!

NicolasHug commented May 20, 2020

Uh oh!

jnothman commented May 20, 2020

Uh oh!

NicolasHug commented May 20, 2020

NicolasHug commented Apr 10, 2020 •

edited

Loading

thomasjpfan commented Apr 14, 2020 •

edited by NicolasHug

Loading

NicolasHug commented Apr 14, 2020 •

edited

Loading

rth commented Apr 18, 2020 •

edited

Loading

NicolasHug commented Apr 20, 2020 •

edited

Loading

rth commented Apr 20, 2020 •

edited

Loading